html_entity_decode() error after importing XML w/ Datagrab and Low Search
After importing an XML file using Datagrab, I'm receiving two PHP errors for each entry imported (over 1000 in this case).
A PHP Error was encountered
Severity: Warning
Message: html_entity_decode(): charset 'ANSI_X3.4-1968' not supported, assuming iso-8859-1
Filename: helpers/low_search_helper.php
Line Number: 236
A PHP Error was encountered
Severity: Warning
Message: html_entity_decode(): charset 'ANSI_X3.4-1968' not supported, assuming iso-8859-1
Filename: helpers/low_search_helper.php
Line Number: 166
I have EE 2.5.5, Datagrab 1.7.7 and Low Search 2.3.1 installed on a LAMP stack. I'm not sure whether this is a Datagrab issue or a Low Search issue. Do you know of a fix? Is there any other information I can provide?
Replies
Low 7 May 2013 22:00
Hmm, must be an encoding issue. Is the data UTF-8 encoded? And is your DB UTF-8 as well?
bwlng 7 May 2013 22:47
My DB is UTF-8, but the XML file is ISO-8859-1. When I change the first line of the XML file to
<?xml version="1.0" encoding="UTF-8"?>
the import fails completely. Is there anything we can do short of a new XML format?
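One plausible reason the relabelled file fails: editing the declaration changes only the label, not the bytes underneath it. The sketch below illustrates this with a made-up one-element document (the sample content is an assumption, not taken from the actual import file). An ISO-8859-1 "é" is the single byte 0xE9, which is not a valid sequence in UTF-8, so a parser that trusts the new declaration has good reason to reject the file.

```python
# Hypothetical sample: a tiny document containing "é", stored as ISO-8859-1.
latin1_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><name>Café</name>'
).encode("iso-8859-1")

# Relabel only: swap the declared encoding, leave the body bytes untouched.
relabelled = latin1_bytes.replace(
    b'encoding="ISO-8859-1"', b'encoding="UTF-8"'
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The declaration now claims UTF-8, but the body still holds the Latin-1 byte.
print(is_valid_utf8(relabelled))  # False
```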
Low 8 May 2013 06:50
You can try and hard-code the latin1 charset for the import.
Open up helpers/low_search_helper.php and change this:
to this:
on lines 166 and 236. Then change it back once the import is done.
bwlng 8 May 2013 14:53
Thanks Low, that worked perfectly. The only remaining issue is that we need to run the import nightly. Is there a more permanent workaround or is the underlying issue more of a Datagrab or XML problem?
Low 8 May 2013 15:17
Well, EE currently relies on the content being UTF-8, so Low Search does, too. To come up with a more permanent solution, I'd need to know exactly where the problem lies. It's most likely the source material, which isn't UTF-8 but is being put into a UTF-8 database.
If you can somehow generate a valid UTF-8 XML file, or if Datagrab can convert the file to UTF-8, that will probably help.
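For a nightly import, the conversion could be scripted as a step that runs before Datagrab picks up the file. What follows is a minimal sketch, not a tested fix for this setup: it assumes the source really is ISO-8859-1 (a Windows-1252 file would need a different source codec), and the function names and file paths are hypothetical. It re-encodes the bytes and updates the declaration in the same pass, so label and content agree.

```python
import re
from pathlib import Path

# Matches the encoding pseudo-attribute of an XML declaration at the file start.
_DECL_ENCODING = re.compile(rb"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def latin1_xml_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 XML bytes as UTF-8 and update the declaration."""
    text = data.decode("iso-8859-1")   # every byte maps to one code point
    utf8 = text.encode("utf-8")        # re-encode the whole document
    # Rewrite the declared encoding so it matches the new bytes.
    return _DECL_ENCODING.sub(rb"\g<1>\g<2>UTF-8\g<2>", utf8, count=1)


def convert_file(src: str, dst: str) -> None:
    """Read src as ISO-8859-1 XML and write a UTF-8 copy to dst."""
    Path(dst).write_bytes(latin1_xml_to_utf8(Path(src).read_bytes()))
```

A cron job could then call something like convert_file("feed.xml", "feed-utf8.xml") ahead of the import, with both file names standing in for the real paths.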
bwlng 8 May 2013 16:47
I tried converting the XML file to UTF-8 using xmllint, which seemed to produce a valid file, but I continued to get the original error from low_search_helper when importing it through Datagrab after reverting the changes to lines 166 and 236.
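Before handing a converted file to Datagrab, it may be worth confirming that the declaration and the bytes actually agree. The helper below is a hypothetical sketch (the function name and report keys are made up for illustration): it reads the declared encoding, checks whether the whole file decodes as UTF-8, and counts non-ASCII bytes. If the declaration says UTF-8 and the file decodes cleanly, the file itself is probably not the cause.

```python
import re


def describe_xml_encoding(data: bytes) -> dict:
    """Report the declared encoding and whether the bytes decode as UTF-8."""
    match = re.match(rb"""<\?xml[^>]*?encoding=["']([^"']+)["']""", data)
    declared = match.group(1).decode("ascii") if match else None
    try:
        data.decode("utf-8")
        decodes = True
    except UnicodeDecodeError:
        decodes = False
    # Bytes above 0x7F are where Latin-1 and UTF-8 disagree.
    non_ascii = sum(1 for byte in data if byte > 0x7F)
    return {
        "declared": declared,
        "decodes_as_utf8": decodes,
        "non_ascii_bytes": non_ascii,
    }
```

It can be run on the raw file contents, for example describe_xml_encoding(open("feed-utf8.xml", "rb").read()), where the file name is a placeholder.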
I'll reach out to Brand New Box and see if there is anything on the Datagrab side of things that might help.
Thanks for your guidance!