Full table scans for every page using seg2cat
Dear Low,
We have a fairly high-traffic EE site, and as sysadmin and maintenance developer I am doing everything I can to ease our substantial MySQL load.
One of the things I've done is add logging for queries that aren't using indexes to my MySQL slow query log. As soon as I did that I saw a large number of queries from low_seg2cat like this:
SELECT LOWER(cat_url_title) AS cat_url_title, 'cat_id', 'parent_id', 'group_id', 'cat_name', 'cat_description', 'cat_image'
FROM ('exp_categories')
WHERE 'site_id' = '1'
AND LOWER(cat_url_title) IN ('url-segment-1', 'url-segment-2', 'url-segment-3')
AND 'group_id' IN ('24', '17', '5', '7', '30', '3', '22', '28')
This was performing a full table scan for a large percentage of our requests.
I was able to get the query plan down to a narrow result set by adding an index on exp_categories.cat_url_title and changing both of the instances of "LOWER(cat_url_title)" to just cat_url_title.
I hacked my copy of seg2cat to take the LOWER() out of that query and immediately saw the bulk of our problem with full table scans be decimated.
I notice there's an explicit description of the problem in the comment:
// --------------------------------------
// Query database for these segments
// Use lowercase for case insensitive comparison,
// for when DB collation is case sensitive
// --------------------------------------
$this->EE->db->select('LOWER(cat_url_title) AS cat_url_title, '. implode(', ', array_keys($this->fields)))
->from('categories')
->where('site_id', $this->site_id)
->where_in('LOWER(cat_url_title)', $segment_array);
This seems like a very poor tradeoff. MySQL has a whole lot of engineering to support case insensitivity by default, and further, in the common case this will just work, as you have already used strtolower on the user-provided info. Instead of forcing all your users' databases into full table scans every time a user hits a page using seg2cat I suggest simply requiring a case-insensitive collation. Seg2cat will even work the vast majority of the time with a case-sensitive collation anyway!
If you really want to support this case I suggest adding a setting that detects case-sensitive collation and using this expensive query only for those users.
Replies
Low 27 Sep 2012 08:12
Thanks for the tip. The LOWER statement was in place because I got complaints from users where the extension wasn't working because of that. Agreed, those are egde cases. Will look into reverting.