Document: DS-00006-BF

Database Index Terminology

This primer will guide you down the path to understanding database index terminology
(when you're ready to trudge off into more-technical weeds, Wikipedia offers: Database Indexes).

What a Database Index is

  • A list of key words that you might refer to when trying to search a database table
  • Often, database indexes are organized much like an index at the back of an ordinary text book

    • These indexes are not an actual part of the table
    • When you search for and find a key word, it tells you on which page (or pages) of the table that key word will be found
    • This sort of index is called a non-clustered index

  • Sometimes the table itself is the index, as in the case of a phone book

    • These indexes are an actual part of the table
    • They alphabetize the actual table data in some specific order
    • This sort of index is called a clustered index
    • Tables that don't have a clustered index are known as heap tables

  • As with books, some database tables may have more than one index

    • Geographical names index
    • Biographical names index
    • Legal terms index
    • Medical terms index

What a Database Key is

  • A fancy name for the individual values stored within a unique index
  • This term implies different things depending on the context in which it is spoken:

    • Most of the time the word key is misused as a synonym for the whole unique index
    • Technically, the word key implies a specific value within a unique index—better said, key value (to avoid confusion)

Categorizing Database Indexes/Keys by Type

We categorize indexes much like we do all things in life; some categorizations are relatively official and universally used, others are more abstract or brand specific: for example, a Chevrolet is a specific brand of car (that comes in many models or variations), where a big car is widely open to interpretation. So it is with index terminology.

Terminology we use to describe (categorize) indexes:

  • Primary key

    • Natural primary key
    • Synthetic or contrived primary key* (Often achieved with and AutoNumber or Identity field)
      *Sometimes called a surrogate key, but the term surrogate has several interpretations, see Wikipedia Surrogate Key

  • Alternate key
  • Candidate key
  • Secondary index
  • Compound index
  • Covering index
  • Unique index
  • Clustered index

Sometimes indexes can be categorized as more than one type at once, much like a 1973 Chevrolet Impala is a big car, and so many adjectives might describe a single index:

  • Primary key
  • Unique index
  • Compound index
  • Covering index
  • Clustered index

Unique Indexes

  • Simply stated, unique indexes do not allow duplicate values
  • Sometimes unique indexes do allow null values

    • Some datasbases (SQL Server) allow only one null value—otherwise the index wouldn't be unique
    • Other databases (Oracle) tolerate duplicate null values

  • You might choose to put some sort of unique index on any field where you hoped to prevent duplicate values, for example:

    Social Security Number is arguably a good example of a field to uniquely index

Clustered vs. Non-Clustered Indexes

  • Some database environments (not all) allow you to control whether the data itself is the index, or whether the index is build separately at the end of the table
  • We call indexes that are created by way of reordering the raw table data clustered indexes
  • All other indexes are non-clustered indexes
  • Tables that have data entered in no particular order are called heap tables

Natural Primary Keys (aren't generally used as primary keys)

A natural primary key is the field (or fields) of data in your table that uniquely identify every row in the table, for example, in the Employees table:

  • SocialSecurityNumber is a natural primary key (two employees are unlikely to have the same one)
  • The combination of FirstName, LastName, and HomePhone is also a natural primary key (two employees within the same home are unlikely to have the name first and last name)

As with the above example, tables can (and often do) have more than one natural primary key

  • You should always try to identify the natural primary keys of a table, whether or not you actually choose to build indexes on these fields (going through this mental exercise will bear out flaws in your table design)
  • Every table should have at least one natural primary key
  • Generally, you would build unique indexes on these fields, but would not use them as your real primary key
  • Use of Natural Primary keys as primary keys is generally a bad idea for several reasons:

    • Natural keys tend to take up a lot of bytes of data (Social Security Number is 11 characters long), an so the balloon the size of your database unnecessarily, and the larger the database, the slower it is—it takes more time to pull 11 bytes of data off your hard disk than it does to extract 4 bytes).
    • Technically, use of natural primary keys violates the rules of normalization in that you must duplicate primary key data to join any tables involved in one-to-many relationships

Synthetic or Contrived Primary Keys (Sometimes also called Surrogate Keys)

  • Generally achieved by using an Identity or AutoNumber field that increases in value by a known increment every time a new record is added (they usually increment by one, but some databases will allow you to control this)
  • Some database applications manage assignment of key values by automated means of their own design, rather than relying on the database to assign a unique value

Primary Keys

A primary key is an index based on the field (or column) of data that you choose to use primarily to join (or connect) records from one table to another—it should never contain data that would be presented to an end user.

Specifically, a primary key is

  • The non-nullable unique index (candidate key) that you selected to be primary
  • Usually a contrived key (as opposed to a natural key)
  • Can be any data type that can be indexed:

    • byte, short (tiny integer), integer, long integer, numeric, decimal, etc.
    • character, text, varchar, string, etc.
    • date, datetime, smalldatetime

  • Can not be a data type that can not be indexed

    • Some databases will not index boolean (true/false) data types
    • Many databases will not index very large text or binary image data types

  • Can be compound (an index comprised of more than one field of data)

Candidate Keys

  • These are non-nullable unique indexes that arguably could be used as the primary key
  • Any natural primary key (a concept) that is actually implemented (created as a non-nullable unique index) is a candidate key

Alternate Keys

Any candidate keys that were not selected as the primary key are known as alternate keys.

Compound Indexes

  • Any index that is comprised of more than one field is called a compound index
    —Indexing LastName, FirstName as a single index would be an example of this
  • The use of compound indexes is usually (but not always) a bad idea
    —This is especially true when the compound index is not a unique index
    —The noted exception is when the compound index is also a covering index
  • Compound indexes do not have to be unique indexes, but may be

Covering (aka Covered) Indexes

  • A covering index is a concept regarding how you use of compound indexes—we don't actually create indexes of covering type
  • A covering index is one where the fields in the index cover all of your data needs, for example:
  • Consider that we have a compound index on FirstName, LastName, and PhoneNumber:

    • If you write a query that extracts only FirstName and PhoneNumber, then the index covered your need
    • If you write a query that extracts FirstName, Department, and PhoneNumber, then the index did not cover your need as the Department field is not carried in the index
    • Why you care: When an index covers your need, the actual table is never queried, only the index—this is lightning fast

  • Remember, we don't actually create covering indexes—how you use a compound index determines whether or not it covers your need, hence the name covering index

Secondary Indexes

  • A secondary index is just a fancy name for any index that you actually create that isn't the primary key
  • Secondary indexes can be compound indexes, or not
  • Secondary indexes can be unique, or not
  • They can allow nulls, or not

[Home]
611 Potomac Place • Suite 101 • Smyrna, Tennessee 37167 • 615.223.6789 • ContactUs@DataSchenk.com
Classes • Training • Nashville • © 1998 - 2010 DataSchenk, Inc.  All Rights Reserved 
We Offer • SQL Server Training • Exchange Server Training • C Sharp Training • VB.NET Training • Javascript Training
 • SSAS Training • SSIS Training • SSRS Training • Project Training • SharePoint Training • Access Training • Excel Training

We have Customers in • Decatur, AL • Hampton Cove, AL • Harrison, AR • San Bernadino, CA • San Diego, CA • Patrick AFB, FL
• Kennesaw, GA • Lawrenceville, GA • Carbondale, IL • Taylorville, IL • Lafayette, IN • Lenexa, KS • Campbellsville, KY
• Paducah, KY • Lafayette, LA • Jackson, MI • Jackson, MS • Cashiers, NC • Fairview, NC • Raleigh, NC • Washington, NC
• Keene, NH • Carlsbad, NM • Reno, NV • Canby, OR • Kennett Square, PA • San German, PR • Carthage, TN
• Gallatin, TN • Hendersonville, TN • Jackson, TN • La Vergne, TN • LaVergne, TN • Lawrenceburg, TN • Lebanon, TN
• Madison, TN • Manchester, TN • Martin, TN • McMinnville, TN • Memphis, TN • Nashville, TN • Oak Ridge, TN
• Paris, TN • Parsons, TN • Tazewell, TN • Watertown, TN • San Antonio, TX • Manchester Center, VT • Racine, WI

KILROY - Database-Indexes