Document: DS-00006-BF
Database Index Terminology
This primer will guide you down the path to understanding database index terminology
(when you're ready to trudge off into more-technical weeds, Wikipedia offers:
Database Indexes).
What a Database Index is
- A list of key words that you might refer to when trying to search a
database table
- Often, database indexes are organized much like an index at the back of an
ordinary text book
- These indexes are not an actual part of the table
- When you search for and find a key word, it tells you on which page (or pages)
of the table that key word will be found
- This sort of index is called a non-clustered index
- Sometimes the table itself is the index, as in the case of a phone book
- These indexes are an actual part of the table
- They alphabetize the actual table data in some specific order
- This sort of index is called a clustered index
- Tables that don't have a clustered index are known as heap tables
- As with books, some database tables may have more than one index
- Geographical names index
- Biographical names index
- Legal terms index
- Medical terms index
What a Database Key is
- A fancy name for the individual values stored within a unique index
- This term implies different things depending on the context in which it is
spoken:
- Most of the time the word key is misused as a synonym for the whole unique index
- Technically, the word key implies a specific value within a
unique index—better said, key value (to avoid confusion)
Categorizing Database Indexes/Keys by Type
We categorize indexes much like we do all things in life; some categorizations
are relatively official and universally used, others are more abstract or brand
specific: for example, a Chevrolet is a specific brand of car (that comes in
many models or variations), where a big car is widely open to
interpretation. So it is with index terminology.
Terminology we use to describe (categorize) indexes:
- Primary key
- Natural primary key
- Synthetic or contrived primary key* (Often achieved with and AutoNumber
or Identity field)
*Sometimes called a surrogate key, but the term surrogate
has several interpretations, see Wikipedia
Surrogate Key
- Alternate key
- Candidate key
- Secondary index
- Compound index
- Covering index
- Unique index
- Clustered index
Sometimes indexes can be categorized as more than one type at once, much like a
1973 Chevrolet Impala is a big car, and so many adjectives might describe a
single index:
- Primary key
- Unique index
- Compound index
- Covering index
- Clustered index
Unique Indexes
- Simply stated, unique indexes do not allow duplicate values
- Sometimes unique indexes do allow null values
- Some datasbases (SQL Server) allow only one null
value—otherwise the index wouldn't be unique
- Other databases (Oracle) tolerate duplicate null values
- You might choose to put some sort of unique index on any field where you hoped
to prevent duplicate values, for example:
Social Security Number is arguably a good example of a field to uniquely index
Clustered vs. Non-Clustered Indexes
- Some database environments (not all) allow you to control whether the data
itself is the index, or whether the index is build separately at the end of the
table
- We call indexes that are created by way of reordering the raw table data
clustered indexes
- All other indexes are non-clustered indexes
- Tables that have data entered in no particular order are called heap tables
Natural Primary Keys (aren't generally used as primary keys)
A natural primary key is the field (or fields) of data in your table that
uniquely identify every row in the table, for example, in the Employees table:
- SocialSecurityNumber is a natural primary key (two employees are unlikely to
have the same one)
- The combination of FirstName, LastName, and HomePhone is also a natural primary
key (two employees within the same home are unlikely to have the name first and
last name)
As with the above example, tables can (and often do) have more than one natural
primary key
- You should always try to identify the natural primary keys of a table, whether
or not you actually choose to build indexes on these fields (going through this
mental exercise will bear out flaws in your table design)
- Every table should have at least one natural primary key
- Generally, you would build unique indexes on these fields, but would not use
them as your real primary key
- Use of Natural Primary keys as primary keys is generally a bad idea for several
reasons:
- Natural keys tend to take up a lot of bytes of data (Social Security Number is
11 characters long), an so the balloon the size of your database unnecessarily,
and the larger the database, the slower it is—it takes more time to pull
11 bytes of data off your hard disk than it does to extract 4 bytes).
- Technically, use of natural primary keys violates the rules of normalization in
that you must duplicate primary key data to join any tables involved in
one-to-many relationships
Synthetic or Contrived Primary Keys (Sometimes also called Surrogate Keys)
- Generally achieved by using an Identity or AutoNumber field that
increases in value by a known increment every time a new record is added (they
usually increment by one, but some databases will allow you to control this)
- Some database applications manage assignment of key values by automated means of
their own design, rather than relying on the database to assign a unique value
Primary Keys
A primary key is an index based on the field (or column) of data that you choose
to use primarily to join (or connect) records from one table to another—it
should never contain data that would be presented to an end user.
Specifically, a primary key is
- The non-nullable unique index (candidate key) that you selected to be primary
- Usually a contrived key (as opposed to a natural key)
- Can be any data type that can be indexed:
- byte, short (tiny integer), integer, long integer, numeric, decimal, etc.
- character, text, varchar, string, etc.
- date, datetime, smalldatetime
- Can not be a data type that can not be indexed
- Some databases will not index boolean (true/false) data types
- Many databases will not index very large text or binary image data types
- Can be compound (an index comprised of more than one field of data)
Candidate Keys
- These are non-nullable unique indexes that arguably could be used as the primary
key
- Any natural primary key (a concept) that is actually implemented (created as a
non-nullable unique index) is a candidate key
Alternate Keys
Any candidate keys that were not selected as the primary key are known as
alternate keys.
Compound Indexes
- Any index that is comprised of more than one field is called a compound index
—Indexing LastName, FirstName as a single index would be an example of this
- The use of compound indexes is usually (but not always) a bad idea
—This is especially true when the compound index is not a unique index
—The noted exception is when the compound index is also a covering index
- Compound indexes do not have to be unique indexes, but may be
Covering (aka Covered) Indexes
- A covering index is a concept regarding how you use of compound indexes—we don't
actually create indexes of covering type
- A covering index is one where the fields in the index cover all of your data
needs, for example:
- Consider that we have a compound index on FirstName, LastName, and PhoneNumber:
- If you write a query that extracts only FirstName and PhoneNumber, then the
index covered your need
- If you write a query that extracts FirstName, Department, and PhoneNumber, then
the index did not cover your need as the Department field is not carried in the
index
- Why you care: When an index covers your need, the actual table is never queried,
only the index—this is lightning fast
- Remember, we don't actually create covering indexes—how you use a compound index
determines whether or not it covers your need, hence the name covering index
Secondary Indexes
- A secondary index is just a fancy name for any index that you actually create that isn't the primary key
- Secondary indexes can be compound indexes, or not
- Secondary indexes can be unique, or not
- They can allow nulls, or not