Friday, February 22, 2019

Simple example on why LIMIT is not always the best thing for SQL Database

Since our production database grew to the size where querying data without indices became a suicide mission, we have learned that LIMIT does not always translate into faster query time and less work on the machine. Today at work, we ran into a beautifully simple example to illustrate this point, which was perceived as counter-intuitive by more junior teammates.

Here we have a query over a 8M-row table, the condition fits squarely into an index.

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on record  (cost=256890.59..3475874.26 rows=8448978 width=1740)
   Recheck Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
   ->  Bitmap Index Scan on not_found_records_idx  (cost=0.00..254778.35 rows=8448978 width=0)
         Index Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
(4 rows)


That was still lot of records, so we were hoping a LIMIT clause would shorten the execution time.

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') LIMIT 1;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.46 rows=1 width=1740)
   ->  Seq Scan on record  (cost=0.00..3923250.74 rows=8448978 width=1740)
         Filter: ((updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone) AND ((status)::text = 'not_found'::text))
(3 rows)


Only that it didn't. Adding the LIMIT clause changed the query plan from Bitmap Index Scan to a Sequence Scan, which is about the worst thing we want to do on a 8M-row table. It is a typical "optimization" of the planner, who thought that the query with LIMIT clause was too small.

Operation wise, an index scan is more expensive than a sequence scan. An index scan would require to read the index pages first and then read the data pages for relevant rows. The sequence scan only deal with data pages. And so when the LIMIT is small, combined with the table statistics, the planner is tempted to believe there are enough (random) rows that match the filter condition, which makes a sequence scan cheaper.

In fact, somehow this table statistics suggests that every LIMIT is too small till it is half of the table!

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') LIMIT 3000000;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1393038.57 rows=3000000 width=1740)
   ->  Seq Scan on record  (cost=0.00..3923250.74 rows=8448978 width=1740)
         Filter: ((updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone) AND ((status)::text = 'not_found'::text))
(3 rows)

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') LIMIT 4000000;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=257234.59..1781198.16 rows=4000000 width=1740)
   ->  Bitmap Heap Scan on record  (cost=257234.59..3476218.26 rows=8448978 width=1740)
         Recheck Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
         ->  Bitmap Index Scan on not_found_records_idx  (cost=0.00..255122.35 rows=8448978 width=0)
               Index Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
(5 rows)


Adding an ORDER BY clause into the query brings structure to the search and guide the planner to use the index

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') ORDER BY "record"."imported_date" LIMIT 1;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3518127.15..3518127.15 rows=1 width=1740)
   ->  Sort  (cost=3518127.15..3539249.59 rows=8448978 width=1740)
         Sort Key: imported_date
         ->  Bitmap Heap Scan on record  (cost=256898.59..3475882.26 rows=8448978 width=1740)
               Recheck Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
               ->  Bitmap Index Scan on not_found_records_idx  (cost=0.00..254786.35 rows=8448978 width=0)
                     Index Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
(7 rows)

Actually, any random ORDER BY would do

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') ORDER BY random() LIMIT 1;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3539253.59..3539253.60 rows=1 width=1748)
   ->  Sort  (cost=3539253.59..3560376.04 rows=8448978 width=1748)
         Sort Key: (random())
         ->  Bitmap Heap Scan on record  (cost=256902.59..3497008.70 rows=8448978 width=1748)
               Recheck Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
               ->  Bitmap Index Scan on not_found_records_idx  (cost=0.00..254790.35 rows=8448978 width=0)
                     Index Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
(7 rows)


However, that is still far from optimal. In order to return the first row, the database first needs to fetch all rows matching the index condition, and only after that, sorts. This creates strains on the machine memory and maybe even disk swap.

Index is actually ordered structure (BTree). The most optimal query is the one using one of the indexed columns for sorting.

EXPLAIN SELECT "record".*
FROM "record" WHERE ("record"."status" = 'not_found' AND "record"."updated_date" < '2018-12-23 15:48:23.008320+00:00') ORDER BY "record"."updated_date" LIMIT 1;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..1.90 rows=1 width=1740)
   ->  Index Scan using not_found_records_idx on record  (cost=0.56..11280384.47 rows=8448978 width=1740)
         Index Cond: (((status)::text = 'not_found'::text) AND (updated_date < '2018-12-23 15:48:23.00832+00'::timestamp with time zone))
(3 rows)

This, in fact, is even more effective that the original no-order query, with or without LIMIT.

3 comments:

  1. What's your PosgresSQL version? Could you please provide the table schema? I have an opposite result on my db.

    ReplyDelete
    Replies
    1. I was on Postgres 10.6

      The schema is following

      Column | Type | Modifiers
      -----------------------------------+--------------------------+---------------------------------------------------------
      id | integer | not null default
      updated_date | timestamp with time zone | not null
      status | character varying(100) | not null
      Indexes:
      "not_found_parcels_idx" btree (status, updated_date)

      Delete
  2. Hi anh, thanks for your post. I have some questions below:
    (1) Query with `Seq Scan`
    (2) Query with `Index Scan`
    - How do you know the query (2) is faster (or as "more effective" in the post) than query (1)? Because the number of rows output and rows' width are the same, but the cost in query(2) is much higher than query(1)
    - Do you have a version with `EXPLAIN ANALYZE` as it would have the execution time?
    - And did you VACUUM ANALYZE the table before explain query since it would make help update the table statistics (also clean up the index tree), hence the planner would make right decision.

    ReplyDelete