Sunday, June 19, 2011

GROUP BY and ORDER By with MySQL

A few weeks ago, I have some good (and at the same time bitter) experience with MySQL that I would like to share.

Requirement: given the auction and bid tables, select user's latest bid for each auction that she joined
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. mysql> desc auction;
  2. +-------------------+------------------+
  3. | Field             | Type             |
  4. +-------------------+------------------+
  5. | id                | int(11)          |
  6. | code              | varchar(64)      |
  7. | created           | timestamp        |
  8. | updated           | timestamp        |
  9. | expires           | timestamp        |
  10. | finished          | tinyint(1)       |
  11. | last_bid_id       | int(11)          |
  12. | starts            | timestamp        |
  13. | current_bid       | int(11)          |
  14. | last_bidder_id    | int(11)          |
  15. +-------------------+------------------+
  16.  
  17. mysql> desc bid;
  18. +------------+------------+
  19. | Field      | Type       |
  20. +------------+------------+
  21. | id         | int(11)    |
  22. | auction_id | int(11)    |
  23. | user_id    | int(11)    |
  24. | created    | timestamp  |
  25. +------------+------------+

Lets look into two auctions of my sample user whose id is 25
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. SELECT * FROM bid WHERE user_id = 25 AND (auction_id = 2012 OR auction_id = 2016);
  2. +--------+------------+---------+---------------------+
  3. | id     | auction_id | user_id | created             |
  4. +--------+------------+---------+---------------------+
  5. | 226458 |       2012 |      25 | 2011-05-02 08:01:32 |
  6. | 226465 |       2016 |      25 | 2011-05-03 06:39:44 |
  7. | 226467 |       2016 |      25 | 2011-05-03 06:39:47 |
  8. +--------+------------+---------+---------------------+

As the target is to get the latest bid for each auction, below is my desired result
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. +--------+------------+---------+---------------------+
  2. | id     | auction_id | user_id | created             |
  3. +--------+------------+---------+---------------------+
  4. | 226458 |       2012 |      25 | 2011-05-02 08:01:32 |
  5. | 226465 |       2016 |      25 | 2011-05-03 06:39:47 |
  6. +--------+------------+---------+---------------------+

And here is the result of my first attempt:
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. SELECT bid.id, bid.auction_id, bid.user_id, bid.created
  2. FROM auction JOIN bid
  3. ON bid.auction_id = auction.id
  4. WHERE bid.user_id =25 GROUP BY auction.id ORDER BY bid.created;
  5. +--------+------------+---------+---------------------+
  6. | id     | auction_id | user_id | created             |
  7. +--------+------------+---------+---------------------+
  8. | 226458 |       2012 |      25 | 2011-05-02 08:01:32 |
  9. | 226465 |       2016 |      25 | 2011-05-03 06:39:44 |
  10. +--------+------------+---------+---------------------+

Oops, something is wrong here (of course, otherwise why on earth am I writing this). The second row should have been the bid made in second 47 not 44. The reason for this error is that GROUP BY is performed before ORDER BY, in other words the ORDER BY doesnt have any effect here.

After googling for a while, I am pretty sure that there is no way to tell MySQL order first, group later. You will have to do the grouping and sorting manually. Below is my query:
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. SELECT b1.id, b1.auction_id, b1.user_id, b1.created
  2. FROM bid b1 LEFT JOIN bid b2
  3. ON b2.user_id = b1.user_id AND b1.auction_id = b2.auction_id AND b1.created < b2.created
  4. WHERE b2.auction_id IS NULL AND b1.user_id = 25;

That's a long and quite complicated query, however the result set is correct

A few days later a colleague looked at my code, saying that it was bad (definitely) and offered some help with this
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. Select bid.id, bid.auction_id, bid_user_id, max(bid.created)
  2. FROM auction JOIN bid
  3. ON bid.auction_id = auction.id
  4. WHERE bid.user_id = 25 GROUP BY auction.id;

However, that... is wrong. Although the processing time is reduced dramatically, this solution provides a wrong result set. You might be deceived if only concentrate on the datetime column. The id column is actually wrong, due to aggregation effect. But the query was successfully trigger a debate among us and finally illuminate a right solution
Data hosted with ♥ by Pastebin.com - Download Raw - See Original
  1. select id, b1.auction_id, created
  2. from bid b1
  3. join (
  4.     select auction_id, max(created) as ts from bid
  5.     where user_id=20 group by auction_id
  6. ) b2
  7. on b1.auction_id=b2.auction_id and b1.created=b2.ts

SHAME ON ME! Realizing that how ugly and ineffective my query is, is really painful. Anyway that is a valuable experience with group by and order by, who knows if later on it turns useful.

1 comment: