In the time we've been assisting people with hosting their web applications, the most common reason we've seen for performance problems has been missing or incorrectly used database indexes, so we hope this simple introduction can help people get much more out of their databases.
What are Indexes?
Every time your web application runs a database query containing a WHERE statement, the database server's job is to look through all the rows in your table to find those that match your request. As the table grows, an increasing number of rows need to be inspected each time.
Indexes solve this problem in exactly the same was as the index in a reference book, by taking data from a column in your table and storing it alphabetically in a separate location called an index. The same process can be applied to all data types, for example numeric data will be stored in numeric order and dates in date order.
By doing this, the database does not need to look through every row in your table, instead it can find the data you are searching for alphabetically, then skip immediately to look at the row(s) where the data is located.
Creating a Simple Index
Many web frameworks will assist with the creation of indexes, and it is worth researching how to create indexes using your framework, such as migrations in Ruby on Rails, but in this article I will provide the raw MySQL commands.
In this example, we will be working with a table of students at an extraordinarily large school. This is quite a large table as we have 100,000 students in our school. The first few rows of the table look like this:
|ID||First Name||Last Name||Class|
The table was created using the following query:
CREATE TABLE `students` ( `id` int(11) NOT NULL AUTO_INCREMENT, `first_name` varchar(255) DEFAULT NULL, `last_name` varchar(255) DEFAULT NULL, `class` varchar(255) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB
Our web application performs 3 queries against this table:
- Look up a student by ID
- Search for students by last name
- List all students in a class
Each of those queries searches for data from a single column, so we should ensure that each of these columns has its own index.
The first query
SELECT * FROM students WHERE id = 1 is a special case because it is looking up a row using its
PRIMARY KEY. Because we already know exactly which row we want, no further optimization is required. It is best to always look rows up in this way when possible, and almost all tables should have a unique column defined as the primary key in this way.
The second query
SELECT * FROM students WHERE last_name = 'Smith' will be searching a column which does not currently have an index and this is exactly the type of query that will search the whole table unless we do something about it. Lets go right ahead and create an index on that column:
CREATE INDEX by_last_name ON students (`last_name`);
This creates an index named
by_last_name on the table, which will contain an indexes copy the
last_name column, allowing us to look these up much more quickly.
Exactly the same principle can be applied to the
class column, allowing us to efficiently look up the students in a particular class. We should create a simple index on it as follows:
CREATE INDEX by_class ON students (`class`);
Querying Multiple Columns
What happens when our queries become a little more complicated and we are searching the database for multiple columns? In this case, we are searching for a student named Smith, and we know what class they are in.
SELECT * FROM students WHERE class = '6A' AND last_name = 'Smith'
MySQL will normally only use one index to query the table, so we will not benefit from using both of our existing indexes to perform this query. However we also do not need to make any more indexes at this point!
The database server will look at the table and determine that we have an index on
class and that each
class only contains about 20 students. It will use the
by_class index we have already created to locate all 20 students in the class, then check the
last_name of each row individually. Searching 20 rows is no trouble for the server, compared with searching all 100,000, and we've avoided wasting memory by creating any more indexes.
Indexing Multiple Columns
In our last example, we has small class sizes, but what if we actually had some classes that contained over 100 students? In this case, it would be a good idea to improve our
by_class index to also include the
last_name, using more memory but allowing us to efficiently search on both columns at the same time.
DROP INDEX by_class ON students; CREATE INDEX by_class_and_last_name ON students (class, last_name);
Why did we drop the
by_class index? Because our new index will allow us to search by class and by last name, but it will also allow us to search only by class, rendering the
by_class index redundant.
So why did we keep
by_last_name? Unfortunately, indexes can only be used starting at the beginning. You don't have to use the whole index, but the query has to use its components in order starting with the first.
It is very likely that our application will have some joins. In this example, our students will have some grades in a
grades table as follows:
When viewing a student's record, our web application will fetch all grades related to that student with the following query:
SELECT * FROM grades WHERE student_id = 1
We also have a feature to display all grades from a particular class:
SELECT * from students WHERE class = '6A' JOIN grades on grades.student_id = students.id
Both of these queries use the "foreign key" called
student_id. It is almost always a good idea to index foreign keys, and this is done in the same way as text fields.
CREATE INDEX by_student_id ON grades (student_id);
Choosing Columns to Index
Rather than looking at your queries, it is quite possible to index your database simply by looking through your tables and indexing columns that will obviously be searched on. Here are some examples of columns that should almost always be indexed:
- Columns that join with other tables (they usually end
- Usernames and/or email addresses, these are looked up every time a user logs in.
- Any field used in a URL. Many applications have a
uuidfield for this purpose. Remember that
idshould already be a
If you find your SELECT queries are taking a long time to run, you are most likely missing an index on your table. Look at which columns are searching on, and create an index on each one using the following syntax:
CREATE INDEX by_student_id ON grades (student_id);
It is always wise to start by indexing columns that join with other tables (they usually end
_id). After that, look for columns you know you will commonly search on. Focus on your largest tables.