I am trying to work out a regular expression, in PHP, to fix a broken CSV file. The format of the CSV file should be like this:
id, text field1, text field2
1,some text,another text
2,some text,another text
3,some text,another text
The problem is that "text field1" sometimes contains a newline character, so the content of the CSV file ends up like this:
id, text field1, text field2
1,some text,another text
2,some
text,another text
3,some text,another text
As a result, when I read the file line by line, or split it on newlines with explode() to fill an array, I get invalid records. I fixed the above problem using this code:
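<?php
// Read the whole file into one string.
$c = file_get_contents($myFile);
// Mark every newline that is followed by a non-digit character with ~~.
$c = preg_replace('/\n([^0-9])/is', "\n~~$1", $c);
// Replace each marked newline with a single space.
$c = str_replace("\n~~", " ", $c);
?>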
The regex above checks whether the character after a newline is non-numeric and, if so, inserts a ~~ marker there. I then replace the newline plus the ~~ marker with a space.
The problem is that "text field1" sometimes contains numbers, and the newline appears just before them, like this:
id, text field1, text field2
1,some text,another text
2,some
1999-06-21 text,another text
3,some text,another text
My regex does not insert the ~~ marker in this case, so I get a broken CSV file.
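To make the failure easy to reproduce, here is a minimal sketch that runs the same two replacements on inline sample strings instead of a file. The function name `mergeNonNumericLines` and the sample data are made up for illustration only.

```php
<?php
// Hypothetical helper wrapping the two replacements described above.
function mergeNonNumericLines(string $c): string
{
    // Mark every newline that is followed by a non-digit character.
    $c = preg_replace('/\n([^0-9])/', "\n~~$1", $c);
    // Turn each marked newline into a single space.
    return str_replace("\n~~", " ", $c);
}

// Sample data (assumed, modelled on the examples above).
$ok  = "1,some text,another text\n2,some\ntext,another text\n3,some text,another text\n";
$bad = "1,some text,another text\n2,some\n1999-06-21 text,another text\n3,some text,another text\n";

echo mergeNonNumericLines($ok);   // record 2 is joined back onto one line
echo mergeNonNumericLines($bad);  // record 2 stays split: the continuation starts with a digit
```

The second call returns its input unchanged, which is exactly the broken case.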
How can I modify the regex above to fix this problem? I was hoping for a regex that checks for "a non-numeric value (of any length) followed by a comma". Please note that the CSV file has over 100,000 records, so if a regex is used, it must be able to check for a non-numeric value of any length.
Something like this:
$c= preg_replace( '/\n([^0-9]*\,)/is', "\n~~$1", $c);
But the regex above does not work, and I don't know enough about regex to get it working.
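One direction that might work, offered as a sketch rather than a verified fix: instead of looking for a non-numeric value after the newline, treat a newline as a real record separator only when it is followed by digits and a comma (the id column), and replace every other newline with a space. This uses a negative lookahead. It assumes the ids are plain integers and that the file uses "\n" line endings, and it will misfire if a continuation line itself starts with digits followed by a comma. The function name `joinBrokenRecords` is made up.

```php
<?php
// Hypothetical helper: join continuation lines back onto their record.
function joinBrokenRecords(string $c): string
{
    // Replace a newline with a space unless it is followed by
    // "<digits>," (the start of a new record) or by the end of the string.
    return preg_replace('/\n(?!\d+,|\z)/', ' ', $c);
}

// Sample data (assumed, modelled on the broken example above).
$c = "id, text field1, text field2\n"
   . "1,some text,another text\n"
   . "2,some\n"
   . "1999-06-21 text,another text\n"
   . "3,some text,another text\n";

echo joinBrokenRecords($c);
```

Because the lookahead requires the comma right after the digits, a line beginning with 1999-06-21 is not mistaken for a new record.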
If there is any other way of fixing this CSV file, that is fine too. It need not be a regex solution.
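As an example of a non-regex approach, here is a rough sketch that rebuilds records by counting commas: lines are appended to a buffer until the buffer holds the expected number of fields. It assumes every record has exactly three fields, that no field contains a comma, and that stray newlines only occur before the last field, as in the examples above. The function name `repairByFieldCount` is made up.

```php
<?php
// Hypothetical helper: merge physical lines until a record is complete.
function repairByFieldCount(array $lines, int $fieldCount = 3): array
{
    $records = [];
    $buffer  = '';
    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '' && $buffer === '') {
            continue; // skip blank lines between records
        }
        // Join a continuation line to the pending record with a space.
        $buffer = ($buffer === '') ? $line : $buffer . ' ' . $line;
        // A record with N fields contains N - 1 commas.
        if (substr_count($buffer, ',') >= $fieldCount - 1) {
            $records[] = $buffer;
            $buffer = '';
        }
    }
    if ($buffer !== '') {
        $records[] = $buffer; // incomplete trailing record, kept as-is
    }
    return $records;
}

// Sample data (assumed, modelled on the broken example above).
$lines = explode("\n", "1,some text,another text\n2,some\n1999-06-21 text,another text\n3,some text,another text\n");
print_r(repairByFieldCount($lines));
```

For a large file, the same loop could be fed from fgets() one line at a time instead of loading everything into an array first.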
Thanks