What is a clustered index?
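The key difference between clustered indexes and non clustered indexes is that the leaf level of the clustered index is the table. This has two implications.
1. The leaf pages of the clustered index contain an entry for every column in the table (either the value itself or a pointer to it), not just the key columns.
2. The clustered index is the primary copy of the table.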
Non clustered indexes can also do point 1 by using the INCLUDE clause (since SQL Server 2005) to explicitly include all non-key columns, but they are secondary representations and there is always another copy of the data around (the table itself).
The two indexes above will be nearly identical, with the upper-level index pages containing values for the key columns A, B and the leaf-level pages containing A, B, C, D.
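The definitions behind "the two indexes above" are missing from this copy. As a hedged sketch only, assuming a hypothetical table T with columns A, B, C, D (the index names ci and nci are also invented for illustration), a pair of indexes with that shape could be declared like this:

```sql
-- Hypothetical table; all names are assumptions for illustration only.
CREATE TABLE T
(
    A INT NOT NULL,
    B INT NOT NULL,
    C INT NOT NULL,
    D INT NOT NULL
);

-- Clustered index: its leaf level is the table itself, so it holds A, B, C, D.
CREATE UNIQUE CLUSTERED INDEX ci ON T (A, B);

-- Nonclustered index on the same key that INCLUDEs the remaining columns,
-- so its leaf level also holds A, B, C, D, but as a second copy of the data.
CREATE UNIQUE NONCLUSTERED INDEX nci ON T (A, B) INCLUDE (C, D);
```

Both structures are keyed on (A, B); the difference is that dropping nci leaves the data intact, while the clustered index is the table.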
The above quote from SQL Server books online causes much confusion
In my opinion, it would be much better phrased as.
The Books Online quote is not incorrect, but you should be clear that the "sorting" of both non clustered and clustered indices is logical, not physical. If you read the pages at leaf level by following the linked list and read the rows on each page in slot array order, then you will read the index rows in sorted order, but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.
This would be an absurd implementation. For example, if a row is inserted into the middle of a 4GB table SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row.
Instead, a page split occurs. Each page at the leaf level of both clustered and non clustered indexes has the address (File: Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.
e.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053
When a page split happens a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables or a non-empty uniform extent belonging to that object or a newly allocated uniform extent). This might not even be in the same file if the filegroup contains more than one.
The degree to which the logical order and contiguity differ from the idealized physical version is the degree of logical fragmentation.
In a newly created database with a single file, I ran the following.
Then checked the page layout with
The results were all over the place. The first row in key order (with value 1 - highlighted with an arrow below) was on nearly the last physical page.
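The author's scripts and result images did not survive in this copy. As a different, hedged way to observe the same effect, the documented DMV sys.dm_db_index_physical_stats reports logical fragmentation per index. The table name dbo.T below is a placeholder, not taken from the original test:

```sql
-- Assumes a table named dbo.T in the current database (hypothetical name).
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,  -- share of leaf pages out of logical order
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.T'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id;
```

A high avg_fragmentation_in_percent means that following the page chain in key order jumps around the file, which is the situation described above.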
Fragmentation can be reduced or removed by rebuilding or reorganizing an index to increase the correlation between logical order and physical order.
After running
I got the following
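The statement and its output are missing here. For illustration only, and assuming a hypothetical index ix on a table dbo.T, the two maintenance options mentioned above look like this:

```sql
-- Hypothetical names: index ix on table dbo.T.

-- REBUILD creates a fresh copy of the index.
ALTER INDEX ix ON dbo.T REBUILD;

-- REORGANIZE reorders the existing leaf pages in place (a lighter operation).
ALTER INDEX ix ON dbo.T REORGANIZE;

-- ALL targets every index on the table, including the clustered index.
ALTER INDEX ALL ON dbo.T REBUILD;
```

Which option is appropriate depends on how fragmented the index is and how much load the server can take while the operation runs.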
If the table has no clustered index it is called a heap.
Non clustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap, this is a physical row identifier (RID) and consists of three components (File:Page:Slot). In the case of a clustered index, the row locator is logical (the clustered index key).
For the latter case, if the non clustered index already naturally includes the CI key column(s), either as NCI key columns or INCLUDE-d columns, then nothing is added. Otherwise, the missing CI key column(s) silently get added to the NCI.
SQL Server always ensures that the key columns are unique for both types of indexes. The mechanism in which this is enforced for indexes not declared as unique differs between the two index types, however.
Indexes are used to speed up query processing in SQL Server, resulting in better performance. They are similar to textbook indexes. In a textbook, if you need to go to a particular chapter, you go to the index, find the page number of the chapter and go directly to that page. Without an index, finding your desired chapter would be very slow.
The same applies to indexes in databases. Without indexes, a DBMS has to go through all the records in the table in order to retrieve the desired results. This process is called a table scan and can be extremely slow on large tables. On the other hand, if you create indexes, the database goes to the index first and then retrieves the corresponding table records directly.
There are two types of indexes in SQL Server: clustered and non-clustered.
A clustered index defines the order in which a table's rows are stored. Table data can be sorted in only one way; therefore, there can be only one clustered index per table. In SQL Server, a primary key constraint creates a clustered index on that column by default, unless the table already has a clustered index or the constraint is declared NONCLUSTERED.
Let’s take a look. First, create a “student” table inside “schooldb” by executing the following script. If you are working against live data instead, make sure your database is fully backed up first:
Notice that in the “student” table we have set a primary key constraint on the “id” column. This automatically creates a clustered index on the “id” column. To see all the indexes on a particular table, execute the “sp_helpindex” stored procedure. This stored procedure accepts the name of the table as a parameter and retrieves all the indexes of the table. The following query retrieves the indexes created on the student table.
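The original scripts are missing from this copy. The sketch below is a reconstruction under assumptions: the column list and data types are guessed from the columns the text mentions (id, name, gender, DOB, total_score) and may not match the author's table.

```sql
-- Sketch only: column list and types are assumptions inferred from the text.
CREATE DATABASE schooldb;
GO
USE schooldb;
GO
CREATE TABLE student
(
    id INT PRIMARY KEY,            -- creates a clustered index by default
    name VARCHAR(50) NOT NULL,
    gender VARCHAR(50) NOT NULL,
    DOB DATETIME NOT NULL,
    total_score INT NOT NULL
);
GO
-- List the indexes that exist on the student table.
EXECUTE sp_helpindex 'student';
```

sp_helpindex returns one row per index with its name, a description and the key columns.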
The above query will return this result:
In the output you can see only one index. This is the index that was automatically created because of the primary key constraint on the “id” column.
Another way to view table indexes is by going to “Object Explorer-> Databases-> Database_Name-> Tables-> Table_Name -> Indexes”. Look at the following screenshot for reference.
This clustered index keeps the records of the student table in ascending order of “id”. Therefore, if an inserted record has an id of 5, it is placed after the records with ids 1 to 4 rather than at the position in which it was inserted. Similarly, if the fourth record inserted has an id of 3, it is placed third in key order instead of fourth. This is because the clustered index maintains the order of the stored records according to the indexed column, i.e. id. To see this ordering in action, execute the following script:
The above script inserts ten records in the student table. Notice that the records are inserted in random order of the values in the “id” column. Because of the default clustered index on the id column, however, the records are stored in ascending order of “id”. Execute the following SELECT statement to retrieve the records from the student table. (Without an ORDER BY clause SQL Server does not guarantee the order of the results, although on a small table like this one they typically come back in clustered index order.)
The records will be retrieved in the following order:
You can create your own custom index as well as the default clustered index. To create a new clustered index on a table you first have to delete the existing one.
To delete an index go to “Object Explorer-> Databases-> Database_Name-> Tables-> Table_Name -> Indexes”. Right click the index that you want to delete and select DELETE. See the below screenshot.
Now, to create a new clustered Index, execute the following script:
The process of creating a clustered index is similar to that of a normal index, with one exception: with a clustered index, you have to use the keyword “CLUSTERED” before “INDEX”.
The above script creates a clustered index named “IX_tblStudent_Gender_Score” on the student table. This index is created on the “gender” and “total_score” columns. An index that is created on more than one column is called “composite index”.
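The script itself is missing from this copy. Based on the index name and columns given in the text, and on the sort directions described for it (gender ascending, total_score descending), it would look roughly like the sketch below. It assumes a student table with those columns and no existing clustered index.

```sql
-- Reconstructed sketch: the index name and columns come from the text,
-- the sort directions are inferred from its description.
-- Assumes the previous clustered index on student has already been dropped.
CREATE CLUSTERED INDEX IX_tblStudent_Gender_Score
ON student (gender ASC, total_score DESC);
```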
The above index first sorts all the records in the ascending order of the gender. If gender is same for two or more records, the records are sorted in the descending order of the values in their “total_score” column. You can create a clustered index on a single column as well. Now if you select all the records from the student table, they will be retrieved in the following order:
A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored at one place and table data is stored in another place. This is similar to a textbook where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.
It is important to mention here that inside the table the data will be sorted by a clustered index. However, inside the non-clustered index data is stored in the specified order. The index contains column values on which the index is created and the address of the record that the column value belongs to.
When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch other column values. Because of this additional step, a non-clustered index can be slower than a clustered index when the query needs columns that are not stored in the index.
The syntax for creating a non-clustered index is similar to that of clustered index. However, in case of non-clustered index keyword “NONCLUSTERED” is used instead of “CLUSTERED”. Take a look at the following script.
The above script creates a non-clustered index on the “name” column of the student table. The index sorts by name in ascending order. As we said earlier, the table data and index will be stored in different places. The table records will be sorted by a clustered index if there is one. The index will be sorted according to its definition and will be stored separately from the table.
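The script referred to is missing here. Going by the index name used in the text (IX_tblStudent_Name), a minimal sketch would be the following, assuming a student table with a name column:

```sql
-- Reconstructed sketch: the index name is taken from the text,
-- the table and column names are assumed to match the student table described above.
CREATE NONCLUSTERED INDEX IX_tblStudent_Name
ON student (name ASC);
```

The NONCLUSTERED keyword is optional in SQL Server, since CREATE INDEX builds a non-clustered index by default; spelling it out makes the intent explicit.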
Student Table Data:
IX_tblStudent_Name Index Data
Notice, here in the index every row has a column that stores the address of the row to which the name belongs. So if a query is issued to retrieve the gender and DOB of the student named “Jon”, the database will first search the name “Jon” inside the index. It will then read the row address of “Jon” and will go directly to that row in the “student” table to fetch gender and DOB of Jon.
From the discussion we find the following differences between clustered and non-clustered indexes.
Database indexes work in the same way. Without indexes you would have to search the whole table in order to perform a specific database operation. With indexes, you do not have to scan through all the table records. The index points you directly to the record that you are searching for, significantly reducing your query execution time.
SQL Server indexes can be divided into two main types: clustered and non-clustered.
In this article, we will look at what clustered and non-clustered index are, how they are created and what the main differences between the two are. We will also look at when to use clustered or non-clustered indexes in SQL Server.
Let’s first start with a clustered index.
A clustered index is an index which defines the physical order in which table records are stored in a database. Since there can be only one way in which records are physically stored in a database table, there can be only one clustered index per table. By default a clustered index is created on a primary key column.
Let’s create a dummy table with primary key column to see the default clustered index. Execute the following script:
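CREATE DATABASE Hospital;
GO
-- Switch to the new database so the table is not created in the current one.
USE Hospital;
GO
CREATE TABLE Patients
(
    id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    gender VARCHAR(50) NOT NULL,
    age INT NOT NULL
);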
The above script creates a dummy database, Hospital, containing a Patients table. The table has 4 columns: id, name, gender, age. The id column is the primary key column. When the above script is executed, a clustered index is automatically created on the id column. To see all the indexes in a table, you can use the “sp_helpindex” stored procedure.
USE Hospital;
EXECUTE sp_helpindex Patients;
Here is the output:
You can see the index name, description and the column on which the index is created. If you add a new record to the Patients table, it will be stored in ascending order of the value in the id column. If the first record you insert has an id of three, it will still end up third in key order once records with ids of one and two are inserted, since the clustered index maintains the order of the records.
You can create your own clustered indexes. However, before you can do that you have to drop the existing clustered index. We have one clustered index due to the primary key column. If we remove the primary key constraint, the default clustered index will be removed. The following script removes the primary key constraint.
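USE Hospital;
GO
-- The constraint name is auto-generated, so it will differ on your system;
-- use the name reported by sp_helpindex.
ALTER TABLE Patients DROP CONSTRAINT PK__Patients__3213E83F3DFAFAAD;
GO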
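Because the primary key constraint name above is generated by SQL Server, it will be different on every installation. One way to avoid copying it by hand, sketched here under the assumption that Patients lives in the default dbo schema, is to look the name up in sys.key_constraints and build the statement dynamically:

```sql
USE Hospital;
GO
-- Look up the auto-generated primary key constraint name for Patients.
-- Assumes the table is in the dbo schema.
DECLARE @pk sysname =
(
    SELECT name
    FROM sys.key_constraints
    WHERE parent_object_id = OBJECT_ID(N'dbo.Patients')
      AND type = 'PK'
);

-- Build and run the DROP CONSTRAINT statement with the name that was found.
DECLARE @sql nvarchar(max) =
    N'ALTER TABLE dbo.Patients DROP CONSTRAINT ' + QUOTENAME(@pk) + N';';

EXEC sys.sp_executesql @sql;
```

Naming the constraint explicitly when the table is created (for example CONSTRAINT PK_Patients PRIMARY KEY) avoids the problem altogether.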
The following script creates a custom index “IX_tblPatient_Age” on the age column of the Patients table. Owing to this index, all the records in the Patients table will be stored in ascending order of the age.
USE Hospital;
CREATE CLUSTERED INDEX IX_tblPatient_Age
ON Patients (age ASC);
Let’s now add a few dummy records in the Patients table to see if they are actually inserted in the ascending order of age:
USE Hospital;
INSERT INTO Patients
VALUES
(1, 'Sara', 'Female', 34),
(2, 'Jon', 'Male', 20),
(3, 'Mike', 'Male', 54),
(4, 'Ana', 'Female', 10),
(5, 'Nick', 'Female', 29);
In the above script, we add 5 dummy records. Notice the values for the age column. They are not in any particular order. However, since we have created a clustered index, the records will actually be stored in ascending order of the value in the age column. You can verify this by selecting all the records from the Patients table.
SELECT * FROM Patients
Here is the output:
You can see that records are ordered in the ascending order of the values in the age column.
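One caveat worth illustrating: a SELECT without ORDER BY usually returns rows in clustered index order on a small table like this, but SQL Server does not guarantee it. When the order of the results matters, a sketch like the following states it explicitly (the clustered index on age then lets the engine avoid a separate sort):

```sql
USE Hospital;
GO
-- Without ORDER BY, the order of returned rows is not guaranteed,
-- even when a clustered index exists on the table.
SELECT id, name, gender, age
FROM Patients
ORDER BY age ASC;
```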
A non-clustered index is also used to speed up search operations. Unlike a clustered index, a non-clustered index doesn’t physically define the order in which records are inserted into a table. In fact, a non-clustered index is stored in a separate location from the data table. A non-clustered index is like a book index, which is located separately from the main contents of the book. Since non-clustered indexes are located in a separate location, there can be multiple non-clustered indexes per table.
To create a non-clustered index, you have to use the “CREATE NONCLUSTERED INDEX” statement. The rest of the syntax remains the same as the syntax for creating a clustered index. The following script creates a non-clustered index “IX_tblPatient_Name” which sorts the index entries in ascending order of the name.
USE Hospital;
CREATE NONCLUSTERED INDEX IX_tblPatient_Name
ON Patients (name ASC);
The above script will create an index which contains the names of the patients and the address of their corresponding records as shown below:
Here, the “Record address” in each row is the reference to the actual table records for the Patients with corresponding names.
For example, if you want to retrieve the age and gender of the patient named “Mike”, the database will first search for “Mike” in the non-clustered index “IX_tblPatient_Name”. From the non-clustered index it will fetch the record reference and use that to return the actual age and gender of the patient named “Mike”.
Since the database has to make two searches, first in the non-clustered index and then in the actual table, non-clustered indexes can be slower for search operations. However, for INSERT and UPDATE operations, maintaining a non-clustered index can be cheaper, since only the order of the entries in the index needs to be updated and not the order of the records in the actual table.
Now that you know the differences between a clustered and a non-clustered index, let’s see the different scenarios for using each of them.
This is pretty obvious. If you need to create multiple indexes on your database, go for non-clustered index since there can be only one clustered index.
If you want to select only the value of the column that is used to create an index, non-clustered indexes are faster. For example, if you have created an index on the “name” column and you want to select only the name, the non-clustered index will quickly return the name.
However, if you want to select other column values such as age, gender using the name index, the SELECT operation will be slower since first the name will be searched from the index and then the reference to the actual table record will be used to search the age and gender.
On the other hand, with clustered indexes since all the records are already sorted, the SELECT operation is faster if the data is being selected from columns other than the column with clustered index.
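The extra lookup described above can often be avoided by storing the additional columns in the non-clustered index itself with the INCLUDE clause. The following is a sketch against the Patients table from this article; the index name is made up for illustration:

```sql
USE Hospital;
GO
-- Hypothetical index name. INCLUDE stores age and gender at the leaf level
-- of the index, next to each name, without making them part of the key.
CREATE NONCLUSTERED INDEX IX_tblPatient_Name_Covering
ON Patients (name ASC)
INCLUDE (age, gender);

-- All three columns can now be read from the index alone,
-- so no second search in the table is needed for this query.
SELECT name, age, gender
FROM Patients
WHERE name = 'Mike';
```

The trade-off is a larger index and more work on every INSERT and UPDATE that touches the included columns.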
The INSERT and UPDATE operations can be faster with non-clustered indexes, since the actual records do not need to be re-sorted when an INSERT or UPDATE operation is performed; only the non-clustered index needs updating.
Since non-clustered indexes are stored in a separate location from the original table, they consume additional disk space. If disk space is a problem, use a clustered index.
Applies to: SQL Server, Azure SQL Database, Azure SQL Managed Instance
An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.
A table or view can contain the following types of indexes: clustered and nonclustered.
Both clustered and nonclustered indexes can be unique. This means no two rows can have the same value for the index key. Otherwise, the index is not unique and multiple rows can share the same key value. For more information, see Create unique indexes.
Indexes are automatically maintained for a table or view whenever the table data is modified.
See Indexes for additional types of special purpose indexes.
Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table with a UNIQUE constraint, Database Engine automatically creates a nonclustered index. If you configure a PRIMARY KEY, Database Engine automatically creates a clustered index, unless a clustered index already exists. When you try to enforce a PRIMARY KEY constraint on an existing table and a clustered index already exists on that table, SQL Server enforces the primary key using a nonclustered index.
For more information, see Create primary keys and Create unique constraints.
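To make the defaults described above concrete, here is a small sketch. The table, column and constraint names are hypothetical; the catalog view sys.indexes is used only to show which index type each constraint produced.

```sql
-- Hypothetical table and constraint names, for illustration only.
CREATE TABLE dbo.Orders
(
    order_id   INT         NOT NULL,
    order_no   VARCHAR(20) NOT NULL,
    created_at DATETIME    NOT NULL,
    -- Explicitly NONCLUSTERED, so the clustered index stays available.
    CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (order_id),
    -- A UNIQUE constraint creates a nonclustered index by default.
    CONSTRAINT UQ_Orders_order_no UNIQUE (order_no)
);

-- The single clustered index can then be placed on a different column.
CREATE CLUSTERED INDEX IX_Orders_created_at ON dbo.Orders (created_at);

-- Inspect the resulting indexes and the constraints that created them.
SELECT name, type_desc, is_primary_key, is_unique_constraint
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Orders');
```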
Well-designed indexes can reduce disk I/O operations and consume fewer system resources, thereby improving query performance. Indexes can be helpful for a variety of queries that contain SELECT, UPDATE, DELETE, or MERGE statements. Consider the query SELECT Title, HireDate FROM HumanResources.Employee WHERE EmployeeID = 250 in the AdventureWorks2019 database. When this query is executed, the query optimizer evaluates each available method for retrieving the data and selects the most efficient method. The method may be a table scan, or may be scanning one or more indexes if they exist.
When performing a table scan, the query optimizer reads all the rows in the table, and extracts the rows that meet the criteria of the query. A table scan generates many disk I/O operations and can be resource intensive. However, a table scan could be the most efficient method if, for example, the result set of the query is a high percentage of rows from the table.
When the query optimizer uses an index, it searches the index key columns, finds the storage location of the rows needed by the query and extracts the matching rows from that location. Generally, searching the index is much faster than searching the table because unlike a table, an index frequently contains very few columns per row and the rows are in sorted order.
1. Clustered Index : A clustered index is created only when both of the following conditions are satisfied –
Whenever you apply clustered indexing to a table, the sorting is performed on that table itself. You can create only one clustered index per table, just as a table can have only one primary key. A clustered index is like a dictionary, where the data is arranged in alphabetical order.
In a clustered index, the index contains a pointer to the block, not to the data directly.
Example of Clustered Index – If you apply a primary key to a column, that column automatically becomes the clustered index.
In this example, Roll no is the primary key, so it automatically acts as the clustered index. The output of this code is produced in increasing order of roll no.
You can have only one clustered index on a table, but that one clustered index can cover multiple columns; this type of index is called a composite index.
2. Non-clustered Index : A non-clustered index is similar to the index of a book. The index of a book consists of chapter names and page numbers; if you want to read a topic or chapter, you can go directly to that page by using the index. There is no need to go through each and every page of the book.
The data is stored in one place, and the index is stored in another place. Since the data and the non-clustered index are stored separately, you can have multiple non-clustered indexes on a table.
In a non-clustered index, the index contains a pointer to the data.
Example of Non-clustered Index –
Here, roll no is the primary key, hence there is automatically a clustered index. If we want to apply a non-clustered index to the NAME column (in ascending order), a new, separate structure will be created for that column.
Output before applying non-clustered index :
Output after applying non-clustered index :
Row address is used because, if someone wants to search the data for sudhir, then by using the row address he/she will directly go to that row address and can fetch the data directly.
Difference between Clustered and Non-clustered index :
Clustered indexes are indexes in which the order of the rows in the data pages corresponds to the order of the rows in the index. This is why only one clustered index can exist on any table, whereas many non-clustered indexes can exist on the same table.