I am reading input from a file with the following code:

$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh), " \t\n\r"))));

I had also previously tried these while troubleshooting:

$jap= str_replace("\n","",addslashes(strtolower(trim(fgets($fh)))));
$jap= addslashes(strtolower(trim(fgets($fh), " \t\n\r")));

If I echo $jap it looks fine. Later in the code, without any other changes to $jap, it is inserted into the DB. However, I noticed that a comparison that checks whether this jap is already in the DB returned false, even though I can plainly see a seemingly identical entry in the DB. When I copy the inserted jap entry from phpMyAdmin, or from my site where it is displayed, and paste it into Notepad, it pastes like this (this is an exact paste between the quotes below):

"

バスにのって、うみへ行きました"

Obviously I need it without that white space, line breaks, or whatever it is.

So as far as I can tell, trim is not doing what it says it will do, or I'm missing something here. If so, what is it?

UPDATE: with regard to Jack's answer

The preg_replace did not help, but here is what I did. I used bin2hex() to find out what the unwanted part is: I ran $jap through str_replace to remove the Japanese text I expected to find, and passed whatever was left to bin2hex():

echo bin2hex(str_replace("どちらがあなたの本ですか","",$jap));

The output was efbbbf, but what is it? Can I use str_replace to remove it somehow?

user1397417
  • What's that `addslashes` for? And please don't say it's for the database. – mario Jun 02 '12 at 01:25
  • Did you try just `trim($str)`, without the second parameter? As per the [manual](http://php.net/trim), your version won't remove NUL bytes and vertical tabs (whatever a "vertical tab" is...) – bfavaretto Jun 02 '12 at 01:28
  • possible duplicate of [Trim unicode whitespace in PHP 5.2](http://stackoverflow.com/questions/4166896/trim-unicode-whitespace-in-php-5-2) – mario Jun 02 '12 at 01:28
  • bfavaretto, yes, as you can see above, the second line of code I tried only passes the first parameter. mario, the addslashes is there mainly to escape entries that contain a "'", because some of mine will. – user1397417 Jun 02 '12 at 01:32
  • Did you check if it's a non-breaking space, as the "possible duplicate" link suggests? – bfavaretto Jun 02 '12 at 01:36
  • I just tried all the examples in the link mario posted, but I am still getting the same result as I posted. – user1397417 Jun 02 '12 at 01:44

1 Answer

The trim function works on bytes and, by default, only strips ASCII whitespace and NUL, so it doesn't know about Unicode white space. You could try this:

preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);

As taken from: Trim unicode whitespace in PHP 5.2
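
To illustrate the difference, here is a minimal sketch. It assumes the source file is saved as UTF-8 and uses U+3000 (ideographic space) as a hypothetical example of Unicode white space; the sample string is not taken from the question's data.

```php
<?php
// U+3000 IDEOGRAPHIC SPACE, written as its UTF-8 bytes (e3 80 80),
// placed on both sides of a sample string (hypothetical data).
$str = "\xe3\x80\x80バスにのって\xe3\x80\x80";

// trim() with its default character list leaves these bytes alone.
$trimmed = trim($str);

// With the /u modifier the pattern and subject are treated as UTF-8,
// and \p{Z} matches Unicode separator characters such as U+3000.
$clean = preg_replace('/^\p{Z}+|\p{Z}+$/u', '', $str);

echo ($trimmed === $str) ? "trim(): unchanged\n" : "trim(): changed\n";
echo bin2hex($clean), "\n";
```

Note that \p{Z} only covers separator characters. Bytes that belong to another Unicode category, such as a byte order mark, are not matched by this pattern.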

Otherwise, you can do a bin2hex() to find out what characters are being added at the front.
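
As a rough sketch of that check, the snippet below dumps the first bytes of a line as hex. The variable `$line` and its contents are made up for illustration; in the real script it would be the value returned by fgets($fh).

```php
<?php
// Hypothetical sample standing in for the first line read with fgets():
// three unknown leading bytes followed by ordinary text.
$line = "\xef\xbb\xbfabc\n";

// Show the first three bytes as hexadecimal so invisible bytes become visible.
$prefix = bin2hex(substr($line, 0, 3));

echo $prefix, "\n"; // efbbbf
```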

Update

Your file starts with a UTF-8 byte order mark (BOM), the three bytes ef bb bf. To skip it when reading:

$f = fopen("file.txt", "r");
$s = fread($f, 3);
if ($s !== "\xef\xbb\xbf") {
    // bom not found, rewind file
    fseek($f, 0, SEEK_SET);
}
// continue reading here
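
If the text has already been read into a string, the same bytes can be removed there instead. The helper below is only a sketch; `stripUtf8Bom` is a hypothetical name and not part of PHP or of the original answer.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a string, if present.
function stripUtf8Bom($s)
{
    $bom = "\xef\xbb\xbf";
    // Compare only the first three bytes so text elsewhere is never touched.
    if (strncmp($s, $bom, 3) === 0) {
        // The cast covers older PHP versions, where substr() can return false.
        return (string) substr($s, 3);
    }
    return $s;
}

// Sample input (made up for illustration): BOM followed by a short line.
$withBom = "\xef\xbb\xbfabc";
$withoutBom = stripUtf8Bom($withBom);

echo bin2hex($withoutBom), "\n"; // 616263
```

Only the first line of the file can carry the BOM, so it should be enough to apply this to the first value returned by fgets($fh) before trimming it.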
Ja͢ck
  • The preg_replace did not help, but here is what I did. I used bin2hex() to find out that the unwanted part is efbbbf: I ran $jap through str_replace to remove the Japanese text I expected to find, and passed whatever was left to bin2hex(). The result was "efbbbf". echo bin2hex(str_replace("どちらがあなたの本ですか","",$jap)); – user1397417 Jun 02 '12 at 01:58
  • @user1397417 think I found it. your file contains a UTF8 BOM header. Updated my answer. – Ja͢ck Jun 02 '12 at 02:04
  • Your updated solution looks like it has solved my problem, thanks! I would have clicked "this answer was useful" for you, but it said that requires "15 reputation", sorry. – user1397417 Jun 02 '12 at 02:15
  • @user1397417 no worries, I've given you +5 by up voting your question ;-) welcome to SO! – Ja͢ck Jun 02 '12 at 02:19