I have a CSV file in which I need to replace every occurrence of a double quote followed by a line feed with a string, e.g. "XXXX".
I've tried the following:
LC_CTYPE=C && LANG=C && sed 's/\"\n/XXXX/g' < input_file.csv > output_file.csv
and
LC_CTYPE=C && LANG=C && sed 's/\"\n\r/XXXX/g' < input_file.csv > output_file.csv
also tried
sed 's/\"\n\r/XXXX/g' < input_file.csv > output_file.csv
In each case, sed does not seem to match the combination of a double quote followed by \n in the file.
It works if I look for just the double quote:
sed 's/\"/XXXX/g' < input_file.csv > output_file.csv
and if I look for just the line feed:
sed 's/\n\r/XXXX/g' < input_file.csv > output_file.csv
But no luck with the find-and-replace when the two are combined into one pattern.
Any guidance would be most appreciated.
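A likely explanation for the attempts above, as far as I can tell: sed reads its input one line at a time and strips the trailing newline before applying the expression, so a pattern that needs a quote and the newline after it never has both to look at. The sketch below is only an illustration on a made-up two-record file (the `demo_dir` path and file names are hypothetical, and LF line endings are assumed):

```shell
# Hypothetical two-record sample; the first record ends with a double quote.
demo_dir=$(mktemp -d)
printf 'data,data"\nnext,row\n' > "$demo_dir/in.csv"

# sed holds one line at a time, minus its trailing newline, so a quote
# followed by \n never matches and the file passes through unchanged.
sed 's/"\n/XXXX/g' "$demo_dir/in.csv" > "$demo_dir/unchanged.csv"

# Anchoring on end-of-line ($) does hit the quote, but sed still writes
# the newline back, so the line break itself is not removed.
sed 's/"$/XXXX/' "$demo_dir/in.csv" > "$demo_dir/anchored.csv"

cat "$demo_dir/unchanged.csv" "$demo_dir/anchored.csv"
```

If only the quote needs replacing and the line break can stay, the `$`-anchored form may already be enough; removing the line break as well needs a tool or mode that sees more than one line at once.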
Adding simplified sample data
Sample input data (header row and two example records):
column1,column2
data,data<cr>
data,data"<cr>
Sample output:
column1,column2
data,data<cr>
data,dataXXXX
Update: I'm having some luck using perl commands in bash (macOS) to get this done:
perl -pe 's/\"/XXXX/' input.csv > output1.csv
then
perl -pe 's/\n/YYYY/' output1.csv > output2.csv
This results in XXXXYYYY at the end of each record.
I'm sure there is an easier way, but this seems to do the trick on a test file I've been using. I'm trying it there before running it on the original 200K-line CSV file.