Today I had a little problem removing a byte-order mark (BOM) from a UTF-8 encoded CSV file. The reason for the BOM was a stupid application written in Microsoft Visual Basic, which is not able to do a simple export of a data spreadsheet without a byte-order mark.
I would like to use the CSV file, which stores employee information for my company, to create a sortable HTML table with PHP 5. So before the file's contents could be used as a string, I had to remove the byte-order mark. For this little task I wrote a small function:
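    function rmBOM($string) {
        // The UTF-8 BOM is the three bytes EF BB BF at the very start of the data.
        if (substr($string, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
            // Drop the first three bytes and keep the rest.
            $string = substr($string, 3);
        }
        return $string;
    }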
The arguments after the "CCC" format string in the pack() call are the three bytes (0xEF, 0xBB, 0xBF) that make up the BOM in a UTF-8 encoded file. To simply cut out the BOM, I read the file into a string, remove the byte-order mark and write the string back to the file:
    $string = file_get_contents('/full/path/to/utf8-file.csv');
    $string = rmBOM($string);
    file_put_contents('/full/path/to/utf8-file.csv', $string);
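If rewriting the file on disk is not wanted, the BOM can also be skipped while reading. The following is only a minimal sketch under a few assumptions: the helper name readCsvSkippingBom() is made up for this example, the sample rows are invented, and the CSV is assumed to be comma-separated with double-quote enclosures. The literal "\xEF\xBB\xBF" holds the same three bytes that pack("CCC", 0xef, 0xbb, 0xbf) produces.

```php
<?php
// Hypothetical helper (not part of the original function above): reads a CSV
// file into an array of rows and skips a leading UTF-8 BOM if there is one.
function readCsvSkippingBom($path)
{
    $rows = array();
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return $rows; // file could not be opened
    }
    // Peek at the first three bytes; go back to the start if they are not the BOM.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    // Assumed CSV dialect: comma delimiter, double-quote enclosure, backslash escape.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

// Demo with a temporary file that starts with a BOM (the sample data is made up).
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBF" . "name,department\nAlice,IT\n");
$rows = readCsvSkippingBom($tmp);
unlink($tmp);
```

Because the file handle is simply positioned after the BOM, the first field of the first row comes back clean and the original file stays untouched.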
PHP 6 should come with Unicode support and will handle UTF-8 encoded files with a BOM correctly (PHP Bug #22108).