We know that Index is really important when we develop an app with database. It can improve our query performance to provide a specific result that we want. A little note here, Indexes are entry points for tables and Indexes are stored in storage (stored separately from the table's main storage a.k.a Heap). PostgreSQL has several types of Index System that we can use, also, we need to know and select the best Index system for our query.
In this article, I'm going to describe some of the PostgreSQL Indexes and its examples, so let's start with the basic.
1. Create Index
First, we create a table and an index.
CREATE TABLE foo(id INTEGER, fullname TEXT, birthdate DATE, address TEXT, phone NUMERIC);
I assume you've provided some data in this table, just create a function or coding to generate a data seed maybe 10K records or even bigger. After that run the EXPLAIN ANALYZE
clause before and after an Index is set. For example:
EXPLAIN ANALYZE SELECT fullname FROM foo WHERE id = 589;
We'll get the result of the execution time, but what will happen after we set an index on that table and run the same query.
CREATE INDEX foo_idx ON foo(id);
Try to run the same explain ANALYZE query above and you will see the different. That's how an index works. For more info about create an index in PostgreSQL, please refer to the official website and do not forget to select the version based on your PostgreSQL version.
2. More About Index
PostgreSQL index has its own file on disk, as I mentioned above, indexes are stored in storage. We can check that by run the following query and command:
SELECT relfilenode FROM pg_class WHERE relname LIKE 'foo_idx';
ls -lrt $PGDATA/
You can see inside that directory for the result that we get from refilenode
value. The same value is the index on the table that we created.
When we create an index, PostgreSQL will lock the table by using CREATE INDEX ....
query, but in case we do not want lock the table, we can add an additional option to the query like CREATE INDEX CONCURRENTLY .....
.
PostgreSQL by default will use BTREE
to create an index. There is a HASH, BRIN, and GIN index. To use one of these index, we need to write it in the query.
Hash index
Hash indexes only handles equality operators =
. For example:
CREATE INDEX hash_idx ON foo USING HASH (fullname);
EXPLAIN ANALYZE SELECT * FROM foo WHERE fullname = 'Test&';
BRIN index (Block Range Index)
This index contains only 3 items: Max value, Min value of column and Page number. It is useful for query that have some connection or large sequentially data, for example querying date.
CREATE INDEX hash_idx ON foo USING BRIN (birthdate);
EXPLAIN ANALYZE SELECT * FROM foo WHERE birthdate > '1990-01-22' AND birthdate < '2000-10-12';
GIN index (Generalized Inverted Index)
I cannot remember when the last time I used this type of index. As far as I know, when we need to index composite values like jsonb
type data (documents and arrays), we use GIN index. For example if you have data:
SELECT name, detail FROM user;
("name": "John Doe", "detail": {"Main Street No. 123", "0888-210-444"}),
....
(50 rows)
And to create an index and anayze it:
CREATE INDEX gin_idx ON user USING GIN (name);
EXPLAIN ANALYZE SELECT * FROM user WHERE name @> '{"name": "John Doe"}';
Please check the official website for more detail.
GiST index (Generalized Search Tree)
This index is preferred text search index type, is it is used for full text search, even if GIN index also can be used to speed up full text searches. I myself never use this index, because I used another full-text search platform instead. But I really want to use this index, I will think of it later :D.
Again, please check the official website for the detail.
Expression index
Querying is not only for id
column, sometimes we also querying with some expressions. For example we want to retrieve the fullname
column only in Uppercase or Lowercase. This kind of expression are accepted by PostgreSQL to be an index.
EXPLAIN ANALYZE SELECT * FROM foo WHERE UPPER(fullname) LIKE 'Test';
CREATE INDEX exp_idx ON foo (UPPER(fullname));
You can also create an index for more complex expression though.
Partial index
Partial index is really useful for querying a specific number of record that we want to retrieve. For example we want to querying data with id
between 500 and 1000.
CREATE INDEX partial_idx ON foo(id) WHERE id >= 500 AND <= 1000;
The index will not selected for id less than 500 and greater than 1000.
Duplicate indexes
Sometimes it happens, we create multiple indexes. I found a query to check how many duplicate indexes in tables.
SELECT indrelid::regclass relname, indexrelid::regclass indexname, indkey
FROM pg_index
GROUP BY relname, indexname, indkey;
SELECT indrelid::regclass relname, indkey, amname,
FROM pg_index i, pg_opclass o, pg_am a
WHERE o.oid = ALL (indclass)
AND a.oid = o.opcmethod
GROUP BY relname, indclass, amname, indkey
HAVING count(*) > 1;
Unused indexes
You've might created an index but end up not using it and forgot to delete it. To see unused indexes, you can simply run the following query:
SELECT relname, indexrelname, idx_scan FROM pg_catalog.pg_stat_user_indexes;
Data types for index
For specific index has their own supported data types. To see that data types that supported by a specific index, run the following query:
SELECT amname, opfname FROM pg_opfamily, pg_am WHERE opfmethod = pg_am.oid AND amname = 'hash';
You can change the name of index on that query to see the list of supported data types for a particular indexes.
Retrieve index status
There's a view that contains information about indexes that we have in our database, check `pg_stat_user_indexes.
That's all of the things about PostgreSQL Indexes that i want to share in this article.
I hope it's useful for you to know and can help you to improve your query by using index. And it's important to remember that indexes are very costly operation and takes space on your system.