Build Index fails on only one collection, due to content, it seems....
I am able to build the indexes for all of my collections except for one. When I try, it gives me the error 'An error occurred building the index.' with no additional information.
I added a boatload of debugging log messages to find out where it was failing, and discovered it was happening on one certain entry, when it got to line 339 of low_search_words.php "$str = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);".
I tried, just on a whim, to remove all of the content from the main body field for that entry, and LOW and behold, the problem went away!
So... can you help me figure out why this specific content in my page body field was causing this to crash? Here is the exact content that was in that field:
http://pastebin.com/xDc6W7zW
Replies
Gary Reckard 6 Jun 2015 23:53
Here is how I have my collection configured, if that helps.
Gary Reckard 7 Jun 2015 00:02
OK, did a little more debugging, and I think I found the issue... or a clue.
I tried removing various chunks from that piece of content and re-trying the index, and found something interesting. The content that is breaking this thing has a lot of indentation. If I use a version of that same content with all of the indentation (tabs) removed first, things work!
So, the line "$str = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);" is crashing when it is passed a string with too many tabs... Is this a server/memory thing? Can it be avoided?
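One way to check whether this is environment related is to print the PCRE settings PHP is running with and to look at preg_last_error() after the call. This is only a sketch: the sample string is made up (it is not the entry from the pastebin), and a hard crash inside PCRE would kill the process before any error code could be read.

```php
<?php
// Print the PCRE-related settings that can differ between machines.
// ini_get() returns false for a setting this PHP build does not have.
foreach (['pcre.backtrack_limit', 'pcre.recursion_limit', 'pcre.jit'] as $key) {
    echo $key, ' = ', var_export(ini_get($key), true), "\n";
}
echo 'PCRE version = ', PCRE_VERSION, "\n";

// Made-up sample text with heavy tab indentation (not the real entry).
$str = "<p>Hello</p>\n" . str_repeat("\t", 50) . "<p>world</p>";
$str = strip_tags($str);
$result = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);

// preg_replace() returns null when PCRE reports an error.
$error = preg_last_error();
if ($result === null) {
    echo "preg_replace failed, preg_last_error() = $error\n";
} else {
    echo "OK: $result\n";
}
```

Comparing this output between the machine that fails and the one that does not may show which setting or PCRE version differs.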
Low 7 Jun 2015 08:49
So there's no error message shown? You can also try and trigger the rebuild of that entry itself by using the ACT url with entry_id=x in it, where x is the entry id. That might show an error message so we know what exactly is the problem.
Gary Reckard 7 Jun 2015 15:10
The only error message I get in the control panel is what you see in the attached image.
When I tried accessing the build method through the ACT url like you suggested for that one entry, I just get a browser error after a little bit of time. I have all error reporting turned on.
I then tried to make the simplest script I could to re-create the error.
http://pastebin.com/ZFQLZrGB
It seems these two lines are enough to cause my machine to have a heart attack, when that one string is passed to them.
$str = strip_tags($str);
$str = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);
This causes the same issue on my local machine... however, when I ran this on a random Nexcess server I have access to, it did NOT fail... So, the issue is not universal, but environment dependent.
BUT! I did find that adding this line:
$str = implode("\n", array_map('trim', explode("\n", $str)));
at, say, line 335 of low_search_words.php eliminates the issue on my computer.
So I'm going to just add that to your file and go on my way... but maybe you could add it as well, so if I upgrade, this one site doesn't break on my local machine? Thanks for your help, prompt replies, and awesome add-ons!
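For reference, here is a minimal sketch of that workaround in isolation. The function name clean_for_index is hypothetical and not part of Low Search; only the strip_tags, implode/trim and preg_replace lines come from this thread. The null check is an addition for illustration.

```php
<?php
// Hypothetical helper name; not an actual Low Search function.
function clean_for_index($str)
{
    $str = strip_tags($str);
    // Workaround: trim every line so leading and trailing runs of tabs
    // are gone before the regular expression sees them.
    $str = implode("\n", array_map('trim', explode("\n", $str)));
    $out = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);
    if ($out === null) {
        // preg_replace() returns null when PCRE reports an error.
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }
    return $out;
}

// Made-up sample markup with tab indentation.
echo clean_for_index("<ul>\n\t\t\t<li>one_two</li>\n\t\t\t<li>three</li>\n</ul>"), "\n";
```

Note that trimming only removes whitespace at the start and end of each line, so long runs of blank lines or tabs in the middle of a line would still reach the pattern.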
Low 8 Jun 2015 08:25
I thought it might be environmental. That's hard to debug. Adding that line might help but isn't ideal for in the core. I'd make a note of your fix to keep track of it. I might tweak the algorithm, which might help, but keep your code just to be sure.
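One possible tweak, purely as an illustration and not necessarily what ends up in Low Search: \n, \r and _ can all be covered by a single character class, which matches the same characters as the grouped alternation without repeating a capturing group. Repeated groups are a known source of deep recursion in older PCRE builds, so this may be easier on them, but whether it avoids the crash on a given machine is untested here. The sample string is made up.

```php
<?php
// Made-up sample with indentation (not the entry from this thread).
$str = "alpha_beta\r\n" . str_repeat("\t", 20) . "gamma, delta!";

// Pattern used in the thread: a repeated capturing group.
$a = preg_replace('/(\W|\n|\r|_)+/u', ' ', $str);

// Character class: \W already covers \n and \r, so only _ is added.
$b = preg_replace('/[\W_]+/u', ' ', $str);

// Both patterns should produce the same cleaned string.
var_dump($a === $b);
echo $b, "\n";
```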
Paul Cripps 23 Feb 2017 11:25
We're getting the same. Was this ever resolved?
...also, how would we find what entry is causing the issue? We've got 2K entries in our collection!
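A rough way to narrow this down with many entries is to run the same two calls over each entry's text and log the entry ID before processing it. If the process dies outright, the last ID in the log is the suspect; if preg_replace only returns null, the ID can be collected directly. The sketch below assumes you can load the entries into an array of entry ID => text yourself. The function name, the sample data and the log path are hypothetical and not part of Low Search.

```php
<?php
// Hypothetical helper: $entries maps entry ID => field text.
function find_failing_entries(array $entries, $logFile)
{
    $failed = [];
    foreach ($entries as $id => $text) {
        // Write the ID first, so it is on disk even if PHP crashes below.
        file_put_contents($logFile, "checking entry $id\n", FILE_APPEND | LOCK_EX);
        $str = strip_tags($text);
        if (preg_replace('/(\W|\n|\r|_)+/u', ' ', $str) === null) {
            // Record the PCRE error code for this entry.
            $failed[$id] = preg_last_error();
        }
    }
    return $failed;
}

// Made-up data; the second value is not valid UTF-8, which makes
// preg_replace() with the /u modifier return null.
$entries = [101 => "<p>fine</p>", 102 => "bad \xFF bytes"];
$log = sys_get_temp_dir() . '/index_debug.log';
print_r(find_failing_entries($entries, $log));
```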