Letter limit and Norwegian characters
I found a very strange bug (or unexpected behavior):
I use Tagger to tag entries with Norwegian municipalities (around 450!), and Low Search to allow searching on the tags. One entry may contain up to 10 municipality tags.
Problem 1: tags beginning with Æ, Ø or Å return a result ONLY when the search term is written exactly like the tag (Ålesund works, but not ålesund | Åmli works, but not åmli).
Problem 2: A lot of the tags are only two or three characters long, and none of the 2-3 character searches return any results. Where do I set limits for word length/letter limits?
Replies
Low 17 Sep 2012 10:16
Hi Atle,
LS uses the file /system/expressionengine/config/foreign_chars.php to "translate" those characters to regular (unaccented) characters. Any characters not in there will remain as is in the collection index. It mostly contains lowercase letters, as it's mainly used for URL titles.
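To make the translation step concrete, here is a minimal sketch, assuming a foreign_chars.php-style array keyed by decimal character codes. The three-entry array is a hypothetical subset and `transliterate()` is a hypothetical helper; this is not how Low Search or ExpressionEngine actually apply the file. It shows why "ålesund" and "Ålesund" end up as different strings when only the lowercase letter is mapped.

```php
<?php
// Hypothetical subset of a foreign_chars.php-style map:
// keys are decimal character codes, values are plain-ASCII replacements.
$foreign_characters = [
    '229' => "a",  // å (lowercase only; uppercase Å, code 197, is not mapped)
    '228' => "ae", // ä
    '246' => "oe", // ö
];

// Turn each decimal code into its UTF-8 character, then replace in one pass.
function transliterate(string $text, array $map): string
{
    $pairs = [];
    foreach ($map as $code => $replacement) {
        $char = html_entity_decode('&#' . $code . ';', ENT_QUOTES, 'UTF-8');
        $pairs[$char] = $replacement;
    }
    return strtr($text, $pairs);
}

// The indexed tag keeps its unmapped uppercase letter, while the lowercase
// search term loses its accent, so the two strings no longer match.
echo transliterate('Ålesund', $foreign_characters), "\n"; // Ålesund
echo transliterate('ålesund', $foreign_characters), "\n"; // alesund
```

strtr() with an array is used so each character is replaced once and replacements are never re-scanned.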
As for the second problem -- there is no real limit to word lengths or letter limit. A fallback search method is executed if the search term contains stop words or any of the words fall below the MySQL min-word-length threshold. If you turn on template debugging and the output profiler, you can see what LS is doing. Can you paste the lines starting with Low Search here? Then I can see what's going on behind the scenes.
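The decision described above could look roughly like the following sketch. The function name `needs_fallback()`, the stop-word list and the threshold of 4 (MySQL's default `ft_min_word_len` for MyISAM full-text indexes) are illustrative assumptions, not Low Search's actual code.

```php
<?php
// Hypothetical check: can the search run as a MySQL FULLTEXT query, or must it
// fall back to another method because a term is a stop word or too short?
function needs_fallback(string $query, int $min_word_len, array $stop_words): bool
{
    $terms = preg_split('/\s+/u', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        // Count characters, not bytes, so "Bø" has length 2.
        $length = preg_match_all('/./us', $term);
        // strtolower() only folds ASCII letters, which is enough for this demo.
        if ($length < $min_word_len || in_array(strtolower($term), $stop_words, true)) {
            return true;
        }
    }
    return false;
}

$stop_words = ['og', 'i', 'the'];
var_dump(needs_fallback('Lillehammer', 4, $stop_words));    // bool(false)
var_dump(needs_fallback('Os', 4, $stop_words));             // bool(true): shorter than 4
var_dump(needs_fallback('Bergen og Voss', 2, $stop_words)); // bool(true): "og" is a stop word
```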
atlelill 17 Sep 2012 11:50
Hi Low, thanks for the (ultra)quick reply!
I found the entity for "å" in foreign_chars.php, and when I commented out that line, it worked!
As for the output profiler:
(0.043885 / 7.41MB) Low Search: form tag - no or invalid query given
(0.075936 / 8.55MB) Low Search: form tag - no or invalid query given
(0.341384 / 15.96MB) Low Search: starting search (fulltext)
(0.341944 / 15.96MB) Low Search: Searched but found nothing. Returning no results.
I'm outputting the results on the same page as the search form. That's probably why the first line is duplicated.
Looks cool to me... All searches with 3 or 2 letters return nothing. As soon as there are 4 letters or more, I get a result.
In the settings I've deleted all the stop-words (in case that was the problem...) and set the "Minimum word length" to 1.
Low 17 Sep 2012 12:02
You actually might be better off adding a replacement for Å in foreign_chars.php; something like this:
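As an illustration (not necessarily the exact line intended here), assuming the convention of mapping each uppercase letter to the same value as its lowercase counterpart already in the file, entries for Å (code 197) and Æ (code 198) could be added inside the existing array like this:

```php
'197' => "a",  // Å, mirrors '229' => "a" for å
'198' => "ae", // Æ, mirrors '230' => "ae" for æ
```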
Same goes for any other uppercase characters that might need converting.
The minimum word length and stop words are settings in MySQL that I cannot change from the add-on; the add-on's settings should simply reflect them. You're better off leaving the defaults as they are. If you're not getting any results, something else might be in play.
atlelill 17 Sep 2012 12:07
So I should check the database's MySQL settings for the word lengths?
I tried to add the replacement for Å, but it didn't work, so I'll stick with commenting out "å". Seems to help.
Thanks again for the quick responses, am very happy with the addon!
Low 17 Sep 2012 12:28
Oh sorry, forgot to mention you should rebuild the indexes once you've made that change to foreign_chars.php.
If you're only going to get Norwegian searches, clearing out the stop words is okay. However, setting the minimum word length to a lower value than it actually is in MySQL could lead to finding no results when actually there are. The fallback search method takes care of that and is triggered when the search term contains a word that falls below that threshold.
If you're not getting results then (when you should), we should probably take a look at what's happening there. Unless you don't want to, of course. :)
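To see why the server value matters more than the add-on setting, here is a toy model (an assumption, far simpler than MySQL's real full-text parser) of an index that, like MyISAM, never stores words shorter than the server's minimum length. Lowering the length used at query time cannot bring back words that were never indexed; only changing the server value and rebuilding the index does. `build_index()` and `search()` are hypothetical names.

```php
<?php
// Toy full-text index: maps each lowercase word to the IDs of entries that
// contain it, skipping words shorter than $min_len (as MyISAM does).
function build_index(array $entries, int $min_len): array
{
    $index = [];
    foreach ($entries as $id => $text) {
        $words = preg_split('/\s+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (preg_match_all('/./us', $word) >= $min_len) {
                $index[$word][] = $id;
            }
        }
    }
    return $index;
}

// Exact-word lookup; returns the matching entry IDs (possibly none).
function search(array $index, string $term): array
{
    return $index[strtolower($term)] ?? [];
}

$entries = [1 => 'Os Bergen', 2 => 'Lillehammer Vik'];

$default = build_index($entries, 4); // server default of 4
$lowered = build_index($entries, 2); // after changing the server value and rebuilding

echo count(search($default, 'Os')), "\n";          // 0: "os" was never indexed
echo count(search($lowered, 'Os')), "\n";          // 1
echo count(search($default, 'Lillehammer')), "\n"; // 1
```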
atlelill 18 Sep 2012 11:15
Thanks so much for the replies, they've cleared up a lot for me. I put the replacement back into the foreign_chars file, rebuilt the index, and it works fine.
The other issue was with the MySQL settings on the server. I've got a gridserver on MediaTemple, and had to buy a service that lets me fiddle around in the my.cnf file to set the minimum word length. As soon as I had done that, rebuilt and rebooted my server, and built the Low Search index anew: WOOOOM! BANG! I can now search in all the cute, little Norwegian municipalities!
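For reference: the relevant my.cnf option on MySQL servers of that era is `ft_min_word_len` in the `[mysqld]` section (default 4, which matches the "4 letters or more" behaviour above). It applies to MyISAM FULLTEXT indexes; InnoDB full-text indexes use `innodb_ft_min_token_size` instead. After restarting MySQL, a sketch like the following (a hypothetical helper, assuming an existing PDO connection) can confirm the value the server is actually using before the index is rebuilt.

```php
<?php
// Hypothetical helper: ask a running MySQL server which minimum word length
// its MyISAM FULLTEXT indexes use. Returns null if the variable is not found.
function ft_min_word_len(PDO $db): ?int
{
    $stmt = $db->query("SHOW VARIABLES LIKE 'ft_min_word_len'");
    if ($stmt === false) {
        return null;
    }
    // SHOW VARIABLES returns rows of [Variable_name, Value].
    $row = $stmt->fetch(PDO::FETCH_NUM);
    return $row === false ? null : (int) $row[1];
}
```

Usage would be something like `echo ft_min_word_len(new PDO('mysql:host=localhost;dbname=your_db', $user, $pass));` with your own connection details.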
JohanHedin 20 Feb 2013 14:35
I have basically the same problem (Swedish). However, I can't get it to work. My problem is as follows:
If I search for Ängra or Ödevata I get a result (they are names, so they are spelled with capital letters). If I search for angra or odevata I get a result, but when I search for ängra or ödevata I get no results.
I have tried to add: '214' => 'O',
and '197' => 'A',
And I did rebuild the index, but it didn't work. Here are the characters in my foreign_chars file before I added anything:
'223' => "ss", // ß
'224' => "a",
'225' => "a",
'226' => "a",
'229' => "a",
'227' => "ae", // ã
'228' => "ae", // ä
'230' => "ae", // æ
'231' => "c",
'232' => "e", // è
'233' => "e", // é
'234' => "e", // ê
'235' => "e", // ë
'236' => "i",
'237' => "i",
'238' => "i",
'239' => "i",
'241' => "n",
'242' => "o",
'243' => "o",
'244' => "o",
'245' => "o",
'246' => "oe", // ö
'249' => "u",
'250' => "u",
'251' => "u",
'252' => "ue", // ü
'255' => "y",
'257' => "aa",
'269' => "ch",
'275' => "ee",
'291' => "gj",
'299' => "ii",
'311' => "kj",
'316' => "lj",
'326' => "nj",
'353' => "sh",
'363' => "uu",
'382' => "zh",
'256' => "aa",
'268' => "ch",
'274' => "ee",
'290' => "gj",
'298' => "ii",
'310' => "kj",
'315' => "lj",
'325' => "nj",
'352' => "sh",
'362' => "uu",
'381' => "zh",
Low 20 Feb 2013 14:56
Are you sure that's the entire file? The top of the file looks truncated by your paste.
JohanHedin 20 Feb 2013 15:12
It's not the entire file, but it is all the characters in it. The first character code is '223'. I removed the PHP part, as this commenting system didn't seem to like me posting it.
Low 20 Feb 2013 15:14
Try changing the translations "ae" and "oe" to "a" and "o", specifically for keys 228 and 246.
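In the file quoted above, that change would make the two lines read as follows (edited in place inside the existing array, followed by an index rebuild as with the earlier fix). With "ae", a search for ängra is turned into "aengra", which cannot match the tag; with "a" it becomes "angra".

```php
'228' => "a", // ä (was "ae")
'246' => "o", // ö (was "oe")
```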
JohanHedin 20 Feb 2013 15:27
It worked! Thank you!