Performance for channel with a lot of expired entries
Hi Low,
We're working on an elaborate faceted search setup powered by Low Search. The channel contains an ever-growing number of entries with a short lifespan, i.e. their expiration dates are set shortly after they're created. So while the number of "valid" entries may be in the hundreds, the channel itself will eventually fill up with thousands of entries.
Would we be wise to set up a cron job to periodically "archive" these entries into a different Channel? Or does Low Search only maintain an index of "valid" entries (when it comes to expiration dates)?
Cheers,
John
Replies
Low 22 Aug 2013 14:47
Hi John,
LS maintains the collection index for all entries, expired or not. If an entry exists in the DB, it will be indexed. However, having many entries in the index doesn't matter all that much; performance will still be speedy. My test dataset contains ~30k entries, and that performs well.
Currently, an entry is removed from the search index only when it is deleted via the CP. If you delete or move it manually, it will stay in the index, even after rebuilding the index.
johndwells 26 Aug 2013 13:03
Thanks for the reply, Low. When you say deletion via the CP, what if we used EE's API instead? Would that trigger a re-index?
What if, rather than deleting, we changed the entry's channel assignment? Does that trigger a re-index? The question applies to both the CP and the API.
It sounds like LS can handle the dataset volume we're working with, but it would still be helpful to know how best to approach maintenance plans down the road. If CP deletion is the only sure route then so be it.
Cheers,
John
Low 26 Aug 2013 13:24
If I recall correctly, all actions done via the CP and API trigger the same hooks: delete_entries_loop and entry_submission_end, so using either one has the same effect.
Entries are only deleted from the index when the delete_entries_loop hook is triggered (deleting entries from CP or API, or deleting a collection).
Entries are updated in the index when they are saved. Changing the channel is simply a 'save', so the entry will remain in the index.
There used to be a way to 'hard rebuild' the index for a collection, which deleted all of its entries from the index before rebuilding it. However, that was risky: with a large data set, searches could return no results while the site was rebuilding the index, so I removed the option.
Also, Low Search will automatically optimize the index table after each rebuild.
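As a rough illustration of the behaviour described above, here is a toy model in Python. It is not Low Search's actual code (the add-on itself is PHP): the ToySearchIndex class, its rebuild method and the dict standing in for the database are all made up for the example. Only the two hook names come from this thread.

```python
class ToySearchIndex:
    """Toy model of a search index keyed by entry ID (hypothetical, for illustration)."""

    def __init__(self):
        self.rows = {}  # entry_id -> (channel, indexed text)

    def entry_submission_end(self, entry_id, channel, text):
        # Saving an entry (a channel change counts as a save) adds or updates its row.
        self.rows[entry_id] = (channel, text)

    def delete_entries_loop(self, entry_id):
        # Deleting through the CP or the API removes the row from the index.
        self.rows.pop(entry_id, None)

    def rebuild(self, db):
        # Soft rebuild: re-index everything found in the DB, never purge existing rows.
        for entry_id, (channel, text) in db.items():
            self.rows[entry_id] = (channel, text)


# Stand-in for the entries table: entry_id -> (channel, text)
db = {
    1: ("listings", "red bike"),
    2: ("listings", "blue car"),
    3: ("listings", "green boat"),
}
index = ToySearchIndex()
index.rebuild(db)

# Entry 1 is deleted via CP/API: the hook fires, so the index row goes too.
del db[1]
index.delete_entries_loop(1)

# Entry 2 is removed manually (straight from the DB): no hook fires,
# and a soft rebuild does not notice that it is gone.
del db[2]
index.rebuild(db)

# Entry 3 is moved to another channel: that is just a save, so it stays indexed.
db[3] = ("archive", "green boat")
index.entry_submission_end(3, *db[3])

# Index rows that no longer have a matching entry in the DB.
orphans = sorted(set(index.rows) - set(db))
print(orphans)
```

In this sketch, only the manually removed entry is left behind as an orphan row, which is why going through the CP or the API is the safer route for any cleanup job.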
johndwells 26 Aug 2013 15:12
All very helpful, thanks Low. I'll make a note in the dev docs for future reference.
Cheers!
John