PostgreSQL & PHP Tutorials - Using Explain - The Syntax

PostgreSQL »  Using Explain

Posted By Chris Smith Posted on 05 Mar 2006, 07:31 AM
Viewing page 2 of 5
« | Back | Next | »

First, we'll check the syntax of explain commands:

indextest=# \h explain
Command: EXPLAIN
Description: show the execution plan of a statement
Syntax:
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement


Explain by itself will just show the plan that the query will use.

Explain Analyze will show the plan and also run the query so you can compare the plan to the results the query finds. This will help determine whether a statistical update is needed (that is, whether you need to run 'analyze' or 'vacuum analyze' on the tables in the query), or if the planner is making a bad decision.

Explain verbose is usually not needed, but it shows a lot more detail on how the planner decides which way to construct the query. You should only need to use this if asked on a mailing list.

First we'll look at the basic command:

indextest=# EXPLAIN SELECT * FROM indextest;
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on indextest (cost=0.00..86765.00 rows=5000000 width=25)
(1 row)

This looks complicated, but it's not. Let's have a look at each area.

Seq Scan means PostgreSQL has to do a sequential or full table scan to return the results. This makes sense - we're not returning a small result from the table, we're returning everything so every row needs to be looked at.

The cost=0.00..86765.00 message has two parts.

The first part (0.00) is the estimated time before a result can be output. If you add an order by to a query, this is the value that will change as we will see.

The second value is the cost to retrieve the rows or records we want. This doesn't take limiting results into account.

These costs are a measure of how many disk page fetches it will take to retrieve the data. A disk page fetch is when PostgreSQL looks at the data on disk, so in this case it will have to look at the disk approximately 86,765 times - this sounds like an awful lot but in reality it's not that much - remember it's examining 5 million rows too.

The rows=5000000 value is the estimated number of rows that will be output by this query.

The last value (width=25) is the average width of each row of data in bytes, or the total number of bytes each result contains.


Let's change our query and have another look at the explain output:

indextest=# EXPLAIN SELECT * FROM indextest WHERE id < 1000000;
QUERY PLAN
-------------------------------------------------------------------------------------------
Index Scan using unique_id_index on indextest (cost=0.00..22269.57 rows=966338 width=25)
Index Cond: (id < 1000000)
(2 rows)


The results are totally different.

This time, PostgreSQL will use the index (Index Scan) and it's going to use the unique_id_index on the table.

The cost of ordering the results is still 0.

The number of times PostgreSQL will look at the disk is approximately 22,269.57 times - significantly less than the 86,000 previously seen.

This query will return around 966,338 rows and each row contains 25 bytes of data.

Why isn't PostgreSQL returning 999,999 rows, it says there are only going to be 966,338 rows? These numbers are only estimates of the actual results PostgreSQL will find and use. The vacuum analyze command that was run on the table updated internal statistics that PostgreSQL uses to work out values for the planner. When PostgreSQL works out what sort of query plan to use and therefore the displayed figures for explain commands, it chooses a random set of data from this cache to work out whether an index is used, or if the whole table needs to be looked at.
Viewing page 2 of 5
« | Back | Next | »


Related Articles:
Avg Rating: 4
Vote Count: 4


              


Want to post a comment? Fill in the details below.

Your Name  : 
Your Email  : 
Your Website  : 
Spam Check! Please answer this question  : 4 + 5 =
Comment