0

Using PHP to do a file_get_contents on a *.txt file. Then inserting the data into MySQL. Values are inserting null. The null is caused by a non breaking space and a replacement charcter from an excel export. I figured that by copying the characters from the text file into a unicode inspector .
Did the same with the replacement character. Copied the text and pasted it here to confirm.

Tried many str_replace and preg_replace but no luck. Tried nearly everything on this SO question and found this worked.

$some_text_with_non_breaking_spaces = "Christ  O'Connory";
$clean_text = hex2bin(str_replace('c2a0', '20', bin2hex($some_text_with_non_breaking_spaces)));
echo $clean_text;

BUT it didn't when I put it inline with the file_get_contents() method.

Any idea how to resolve the null value with preg_replace, str_replace or other methods?

Here's all the versions I've tried:

$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);
$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('~\xc2\xa0~', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);
$name = str_replace(' ', ' ',$nbsp);
$name = hex2bin(str_replace('c2a0', '20', bin2hex($nbsp)));  // this did work but not when putting inline with original code.
$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);
$name = preg_replace('\x{00a0}', ' ', $nbsp);
$name = preg_replace('~\x00\xa0~', ' ', $nbsp);
$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);
$name = preg_replace('\s\s+', ' ', $nbsp);
$name = preg_replace('/\s+/', ' ',  $nbsp);
$name = preg_replace('~\x{c2a0}~siu', ' ',  $nbsp);
$name = preg_replace('/\s/u', ' ',  $nbsp);
$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);

Here is a snippet of data from the file I was attempting to do a file_get_contents on.

SupervisorGivenName SupervisorSurName   row_date    logid   item_name   acdcalls    AHT AvgHoldTime transferred CntOBCalls  calls
        Ders    Schmid  09/02/2015  5054589 Christ  O'Connory   26  420 112 4   0   0
        Nic Flemg   09/02/2015  5054596 Mica  Wit   28  543 32  6   0   0



    Insert statement:

        $bb_query = "INSERT INTO `tier1_bb_agent_daily` (`date`,`loginID`,`empID`,`firstname`,`lastname`,`supID`, `supName`,`acd_calls`,`paetec_acd_calls`,`aht`,`avg_hold_time`,`transferred`,`outbound_call_count`)
                        VALUES ('{$row['date']}','{$row['loginID']}','{$empID}','{$firstname}','{$lastname}','{$supid}','{$newSupName}',{$row['acd_calls']},{$row['paetec_acd_calls']},{$row['aht']},{$row['avg_hold_time']},{$row['transferred']},{$row['outbound_call_count']})
                        ON DUPLICATE KEY UPDATE firstname = '{$firstname}', lastname = '{$lastname}',empID = '{$empID}', supID = '{$supid}', supName = '{$newSupName}',acd_calls = {$row['acd_calls']}, aht = {$row['aht']}, paetec_acd_calls = {$row['paetec_acd_calls']}, avg_hold_time = {$row['avg_hold_time']}, transferred = {$row['transferred']}, outbound_call_count = {$row['outbound_call_count']}";
                $db->query($bb_query);
Community
  • 1
  • 1
MattJamison
  • 311
  • 3
  • 11
  • 5
    and `$nbsp` is defined as? plus file content is? – Funk Forty Niner Oct 06 '15 at 20:18
  • @Fred-ii- it's a non-breaking space – MattJamison Oct 06 '15 at 20:32
  • @MattJamison the question you linked has an answer which uses `~\x{00a0}~siu` while in your version you didn't add the modifiers. Working with the `u` modifier is crucial when working with unicode. – HamZa Oct 06 '15 at 20:35
  • @HamZa that didn't work either. – MattJamison Oct 06 '15 at 20:40
  • You are using **$nbsp** and **$name** (two variables); If you want to replace it in the string, you have to use the same variable name for the next replace. Hope that helps. @OP: did check printing the result just after file_get_contents and just before inserting itt into DB? – Pixel Maker Oct 06 '15 at 20:46
  • Does anybody understand this question? I feel very stupid... Explain me please what is the problem OP faced with? and what is his expected result? The question looks like a set of words with no sense to me – Alex Oct 06 '15 at 20:46
  • @Alex: this is how I understood, OP is getting some text from some text file and inserting it into a database, record getting inserted but with null. – Pixel Maker Oct 06 '15 at 20:49
  • Is it double encoded? `$name = str_replace(' ', ' ',$nbsp);` – chris85 Oct 06 '15 at 20:50
  • @PixelMaker but where is his query then? where is table structure? where is the php code that realize that functionality? – Alex Oct 06 '15 at 20:50
  • @MattJamison: did you check printing the result after file_get_contents and just before inserting it into DB? – Pixel Maker Oct 06 '15 at 20:53
  • No one can answer the question without knowing what `$nbsp` is. – chris85 Oct 06 '15 at 20:54
  • @chris85 according to `mysql` tag, I would try to answer without knowing `$nbsp` because the problem is some `null` appeared somewhere on mysql side ;-) :-) – Alex Oct 06 '15 at 20:57
  • @PixelMaker I did not check right after file_get_contents. I will though. Just found out the nbsp came from an Excel sheet. The data provider is going to clean it up so I don't have to code around it. So not sure I'll ever get the answer. – MattJamison Oct 07 '15 at 02:08
  • @MattJamison you never provided a reproducible example of the issue here so we won't be able to help you... – chris85 Oct 07 '15 at 04:27
  • @chris85 I would have but I had no where to upload the text file that I was doing a file_get_contents. :( for the down vote. – MattJamison Oct 07 '15 at 11:51
  • Okay so that sample bit is `$nbsp`? Can you also include the code you're inserting with? – chris85 Oct 07 '15 at 13:05
  • @chris85 add the insert statement. – MattJamison Oct 07 '15 at 18:47
  • You need to use prepared statements. As your sample data shows `O'Connory` will break your query because the `'` ends the SQL string encapsulation. `Connory` and all other string values are all offset from being quoted. – chris85 Oct 07 '15 at 21:03
  • @chris85 I'm cleaning the data before insert to account for apostrophe's hyphens etc. before inserting. I just left that code out of the question. – MattJamison Oct 08 '15 at 01:02
  • It is very difficult to know what you are doing.. – chris85 Oct 08 '15 at 01:16
  • `UPDATE tbl SET col = REPLACE(col, ' ', ' ');` should handle all cases in one column of one table. – Rick James Oct 26 '15 at 02:39

2 Answers2

4

Why most of your attempts failed

$name = str_replace('\A0\00', ' ', $nbsp);
$name = str_replace('c2a0', '20', $nbsp);

Wrong escape sequences.

$name = str_replace('~\xc2\xa0~', ' ', $nbsp);

The delimiters are needed for regexes, not for simple string replacement.

$name = str_replace('\xc2\xa0', ' ', $nbsp);
$name = str_replace('\xC2\xA0', ' ',$nbsp);

Correct escape sequences, but you need double-quoted strings for escape sequences to work.

$name = str_replace(' ', ' ',$nbsp);

Only works for HTML entities.

$name = preg_replace('#[A-Za-z\,\.\'\-\_]#', ' ', $nbsp);

Why would you want to replace A-Z with space?

$name = preg_replace('\x{00a0}', ' ', $nbsp);

Missing delimiters and Unicode modifier.

$name = preg_replace('~\x00\xa0~', ' ', $nbsp);

Tries to match NUL characters, missing Unicode modifier.

$name = preg_replace('~\xc2\xa0~', ' ', $nbsp);

This one should have worked for UTF-8. It's equivalent to the bin2hex hack.

$name = preg_replace('\s\s+', ' ', $nbsp);

Missing regex delimiters.

$name = preg_replace('/\s+/', ' ',  $nbsp);

Missing Unicode modifier.

$name = preg_replace('~\x{c2a0}~siu', ' ',  $nbsp);

Wrong escape sequence.

$name = preg_replace('/\s/u', ' ',  $nbsp);

This one should work but replaces every whitespace character with space.

$name = preg_replace('/[^\w\d\p{L}]/u', ' ',$nbsp);

Should work but also replaces punctuation with space.

How to replace non-breaking space with normal space

If your input is encoded as UTF-8 (which it probably is if the bin2hex hack worked):

$result = str_replace("\xC2\xA0", ' ', $src); # or
$result = preg_replace('/\xC2\xA0/', ' ', $src); # or
$result = preg_replace('/\xA0/u', ' ', $src);

If your input is encoded as ISO-8859-1:

$result = str_replace("\xA0", ' ', $src); # or
$result = preg_replace('/\xA0/', ' ', $src);

The str_replace versions are preferred for performance reasons.

nwellnhof
  • 32,319
  • 7
  • 89
  • 113
-2

Just debug your code print the file contents after file_get_contents

$file_contents** = file_get_contents(......);
echo 'File Contents '."\n\n";
print_r($file_contents);
$modified = str_replace('\A0\00', ' ', $file_contents);
$modified = str_replace('c2a0', '20', $modified);
$modified = str_replace('\xc2\xa0', ' ', **$modified);

..................
echo 'Before DB'."\n\n";
print_r($modified);
//your insert here
Pixel Maker
  • 119
  • 6