When you work as either a SQL Server database administrator or developer, you hear a lot about indexes. So what is an index and what is it purpose? An index is created on a data table or view and is used to quickly locate data without having to search every row within a data table.
While they can improve the amount of time it takes to search through your data, an index does require its own storage space. When an index is created on a table, the data table is not changed. Instead, a new data structure that refers to the original table is created. This requires additional storage space so users need to consider storage vs speed when adding new indexes.
An index is made up of a set of pages that are organized in a B-tree structure. This is a hierarchical structure that contains three levels: Root, Intermediate, and Leaf. All of the data is hosted on the Leaf level. The key values and pointers are contained in the Root and Intermediate levels.
An index can include one or more columns for the specific table that you are working with. Without an index, a query search goes through all of the records in the database to retrieve only those records that are needed. With an index, the search is on the index first to find the desired records and the corresponding records are retrieved.
An index can contain several columns, but not all data types can be used within an index. Indexes can be created on most columns within a table or view except for those that are of image, ntext, or text data type. Columns of varchar(max), nvarchar(max), varbinary(max) and xml data type are only allowed in a non-clustered index as a non-key index column.
There are two basic types of SQL Server indexes: clustered and non-clustered. A clustered index defines the order in which the data is sorted within the table. Since the data can only be sorted one way, there can only be one clustered index per table. If there is a primary key in your table, it automatically creates a clustered index on that particular column.
Unlike the clustered index, the non-clustered index does not sort the data within the table. Since the non-clustered index does not define the sort order, there can be more than one non-clustered index on a table. Depending on the version of SQL Server you are using, you can have 999 non-clustered indexes in SQL Server. This is true if you are using SQL Server 2008 R2 or greater, which most people are now using
Let’s look at the impact that an index can have on a relatively small, simple table. For this example, I have made a copy of the Purchasing. PurchaseOrderLines table in the WideWorldImporters database named PurchaseOrderLine_TestIndexing. We will run the following query against the table with different indexes on the StockItemID field and look at the results.
SELECT StockItemID FROM [Purchasing].[PurchaseOrderLine_TestIndexing] WHERE StockItemID = 77
For the first run of the query, there are no indexes on the table. Because we have no indexes, a table scan was completed to find the requested data. That means resources were used to go through the entire table to find the records where StockItemID = 77. This table does not have a lot of records so the impact was not as intensive as a table with a lot more records, but it is a consideration. The Estimated Subtree Cost is 0.131745 and it took 0.121s. The Estimated Subtree Cost is the SQL Server cost to execute the query. The lower the Estimated Subtree Cost number, the less resource intensive the query is for SQL Server.
For the next example, we added a non-clustered index to the StockItemID field. This time, an Index Seek was completed against the table instead of a Table Scan, the Estimated Subtree Cost was reduced to 0.005226, and the amount of time it took to run the query was reduced to 0.101s. Since the query searched through the Index instead of the entire table, it was a more efficient use of resources to find the associated records.
For the third run of the query, we removed the non-clustered index and added a clustered index to the StockItemID field. This time a Clustered Index Seek was done, the estimated Subtree Cost and time spent to run the query was close but just slightly higher than with a non-clustered index on the same field.
The clustered and non-clustered index provided similar results for our second and third examples. You are allowed only one clustered index per table, this might not be the appropriate field for a clustered index. For the PurchaseOrderLine_TestIndexing table, a non-clustered index on the StockItemID field is the most appropriate type of index.
I hope this gives you a good overview of the basics of SQL Server indexes. Over the next few weeks we will explore the different types of indexes in SQL Server and the uses for each type.