Search for string inside "word"
Hi Low
I'm not sure if what I need now is possible, or whether I'm just missing something. I need to search for a string *inside* what would normally be considered a word, but really isn't.
Let me explain. I have to search LS collections that are written in Thai, which does not actually have spaces between words but only between sentences. So a Thai "word" is actually a string inside a longer string that includes other words, and what might normally be defined as a word (namely, a string with a space at either end) is actually a sentence. Does that make sense?
For example, this is the word for a city called Chiang Saen: เชียงแสน
And the sentence "this land is 10km from Chiang Saen" is written as:
ที่ดินห่างจากอําเภอเชียงแสนเพียง10กม
If you're still with me, the trouble I'm having in LS is that I can't actually get LS to return that word/substring. The "loose_ends" option doesn't seem to work, and as I read the documentation, it only matches when the search term is at the beginning of a string, not at any possible point within it. Is that right?
If that all makes sense, do you have any ideas on how I might be able to get this kind of search working?
Cheers
Andrew
Replies
Low 24 Jul 2012 06:28
Hi Andrew,
I don't really have a solution for this. I've thought about it -- Chinese and Japanese have the same issue. But MySQL's fulltext index (which is the whole foundation of the add-on) is based on *words*, and words need separators to differentiate them. That's why LS currently only supports words or partial words (foo*), but not substrings.
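To make the distinction concrete, here is a minimal Python sketch using the Thai example from the question. It is a toy model only: splitting on whitespace stands in for word-based indexing and is not how MySQL's fulltext parser actually works, and the function names are made up for illustration.

```python
# Toy model of word-based vs. substring matching (not MySQL's real tokenizer).
term = "เชียงแสน"  # "Chiang Saen"
# The example sentence from the question, built around the term so the
# substring relationship is explicit. It contains no spaces at all.
sentence = "ที่ดินห่างจากอําเภอ" + term + "เพียง10กม"


def word_match(text, term):
    """Match only whole whitespace-separated tokens."""
    return term in text.split()


def prefix_match(text, term):
    """Match tokens that start with the term, like a foo* search."""
    return any(token.startswith(term) for token in text.split())


def substring_match(text, term):
    """Match the term anywhere inside the text."""
    return term in text


# The whole sentence is a single "word", so only the substring test succeeds.
results = {
    "word": word_match(sentence, term),
    "prefix": prefix_match(sentence, term),
    "substring": substring_match(sentence, term),
}
```

Because the sentence has no separators, it is treated as one long token, and the city name sits in the middle of it. Neither a whole-word nor a prefix comparison can find it.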
As I'm typing this, I thought of one thing that might work, although it's not implemented in LS right now. LS can also perform a fallback search when the search terms contain fulltext stop words or if a word is smaller than the minimum word length in the DB. It might be possible to add another option to force the fallback search (which is a regular LIKE query). I'd have to think about it for a while... Is there a scenario where, in Thai, you'd enter multiple words with spaces, or will they always enter a "space-less" term?
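As a rough sketch of what a forced LIKE fallback could look like, the Python snippet below runs a substring query against an in-memory SQLite database. SQLite is only a stand-in for MySQL here, and the `entries` table, its columns, the sample rows, and the `like_search` helper are all hypothetical; none of this is taken from the LS code.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO entries (id, body) VALUES (?, ?)",
    [
        (1, "ที่ดินห่างจากอําเภอ" + "เชียงแสน" + "เพียง10กม"),
        (2, "A plot of land near the river"),
    ],
)


def like_search(conn, term):
    """Return ids of entries whose body contains term anywhere."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT id FROM entries WHERE body LIKE ? ESCAPE '\\' ORDER BY id",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]


matches = like_search(conn, "เชียงแสน")
```

A pattern with a leading wildcard generally cannot use an ordinary index, so a query like this tends to scan every row. That is one reason to treat it as a fallback rather than the default search path.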
Andrew Mac 24 Jul 2012 07:24
Hi Low
Thanks once again for the really quick reply, although it's a pity that this looks like it's going to be a bit of an issue for LS ... and the answer to your question is that the scenario for searching in Thai would always be a "space-less" term.
So I think that for this site I might just have to switch over to Super Search, which works okay with the Thai way of doing things. If you come up with a solid solution then I'd certainly look at moving back, but I wouldn't make it a priority. If it's already been on your mind, then it's probably best left there until you've given it all the thought you need to make sure that any possible solution would be the best one available, and fits your plans for LS. For now I'll be okay with Super Search :-}
Cheers
Andrew
Low 24 Jul 2012 07:42
If you have an hour or two, I can send you a copy to try with the above solution implemented. I'm looking at it now, and it should be fairly straightforward.
Andrew Mac 24 Jul 2012 07:56
It must be said, you're very very quick ;-)
I'd be happy to take a look at anything you send
I'll pop an email over to you @gotolow ...
Cheers
Andrew
TheJae 4 Sep 2012 08:49
Hello Low,
Do you mind sending me a copy? I'm having the same problem with Chinese words.
James
Low 4 Sep 2012 08:51
Hi James,
The latest version has this implemented. Use loose_ends="both" for substring searches, which should work better with Chinese.
TheJae 4 Sep 2012 08:56
Hi Low,
Thanks for your prompt reply. I just tried it but it's not working. I added loose_ends="both" to my search field and the search results page.
Will loose_ends search through the entry title?
James
Low 4 Sep 2012 08:58
Do you mind creating a new thread for this issue? We probably need to debug this more thoroughly.
TheJae 4 Sep 2012 09:10
Done! https://getsatisfaction.com/low/topic...