Unable to remove carriage returns and line feeds in columns enclosed in double quotes

Question

I want to remove any non-printable newline characters in the column data.

I have enclosed every column in double quotes so that newline characters inside a column are easy to delete, while the record delimiter at the end of each line is kept.

Say I have 4 columns in a text file, separated by commas and enclosed in double quotes. I'm trying to remove \n and \r characters only when they appear between the double quotes.

I currently use tr, but it deletes every line break and leaves one long line with no record separator:

tr -d '\n\r' < in.txt > out.txt

Sample data (each \n marks an actual line break in the file):

"1","test\n
Sample","data","col4"\n
"2\n
","Test","Sample","data" \n
"3","Sam\n
ple","te\n
st","data"\n

Expected Output:

"1","testSample","data","col4"\n
"2","Test","Sample","data" \n
"3","Sample","test","data"\n

Any suggestions? Thanks in advance.

That problem description makes no sense. Why does your "expected output" add a comma between `"3"` and `"Sample"`? What are those `\n` things? Are there literal backslashes in your data? — melpomene, Sep 23 '17 at 11:05
Hi, the comma in between is just a typo; please ignore it. And '\n' just marks the end of a line; the \n character at the end of each record must not be removed. — lfc_07, Sep 23 '17 at 11:08
@melpomene A newline should be removed only if it is present in the column data, that is, between double quotes ("). — lfc_07, Sep 23 '17 at 11:10
Your problem description doesn't match your sample input. None of the fields in your sample input contain embedded newlines. — melpomene, Sep 23 '17 at 11:26
Those newlines are special (control) characters that are not visible unless you open the file in an editor such as vi. I was not sure how to indicate them here. — lfc_07, Sep 23 '17 at 11:28
@lfc_07: please use code tags for your sample Input_file and expected output too; without them it is very difficult for us to know the exact sample Input_file and expected output. Also take some time to write your question, which will save all of us time too. — RavinderSingh13, Sep 23 '17 at 11:32
@melpomene I think I have explained it better now. Please check. — lfc_07, Sep 23 '17 at 11:56
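One way to make the invisible characters discussed in these comments visible, without opening an editor, is to dump the bytes with `od -c`, which prints a carriage return as `\r` and a line feed as `\n`. The sketch below wraps it in a small helper; the name `show_bytes` is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): dump every byte of
# the input so that control characters become visible.
# od -c is specified by POSIX; it prints CR as \r and LF as \n.
show_bytes() {
  od -c "$@"
}

# Usage: show_bytes in.txt | head
```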

score 0 · Answer 1 · answered Sep 23 '17 at 11:22


With GNU sed

sed ':a;N;$!ba;s/\("[^\n\r]*\)[\n\r]*\([^\n\r]*"\)/\1\2/g' file

See this post for the plain newline replacement, without the enclosing-quote condition.
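The `:a;N;$!ba` prefix in the command above is a common GNU sed idiom: it keeps appending lines to the pattern space until the last line, so the following `s///` can see and match the embedded newlines. A minimal demonstration, assuming GNU sed and using a hypothetical helper name, that joins all lines with commas:

```shell
#!/bin/sh
# Hypothetical demo of the slurp idiom (assumes GNU sed):
#   :a      define a label named a
#   N       append a newline and the next input line to the pattern space
#   $!ba    unless this is the last line, branch back to label a
# After the loop the whole input is a single pattern space, so s/\n/,/g
# replaces every newline between the original lines.
join_all_lines() {
  sed ':a;N;$!ba;s/\n/,/g' "$@"
}

# Usage: printf 'one\ntwo\nthree\n' | join_all_lines   # prints one,two,three
```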

answered Sep 23 '17 at 11:22 by LeBlue

RavinderSingh13 · Answer 2 · answered Sep 23 '17 at 12:01

Could you please try the following awk solution and let me know if it helps?

awk '{gsub(/\r/,"");printf("%s%s",$0,$0~/,$/?"":RS)}'  Input_file

Output will be as follows.

"1","test","Sample","data"\n
"2","Test" \n
"3","Sample"

Explanation: printf prints each line using two %s specifiers (%s prints a string in printf). The first %s prints the current line; the second checks whether the line ends with a comma (,): if it does, nothing is printed, otherwise a newline (RS) is printed, so a line ending in a comma is joined with the next one. The gsub(/\r/,"") before printf removes carriage returns; keep it if you want them removed and want the expected output shown in the question.

EDIT: Since your post title asks about removing carriage returns, if you only want to remove carriage returns you could try the following. You should state your problem clearly, though.

tr -d '\r' < Input_file > temp_file && mv temp_file  Input_file

The above removes the carriage-return characters from Input_file and saves the result back into Input_file.

You're right, but only if it exists between double quotes. Please check my edited question now :) — lfc_07, Sep 23 '17 at 11:57
Could you please try my awk solution with the gsub added (as mentioned in the comments)? It removes all carriage returns, not only the ones inside quotes; if you need to keep the others, we will have to think of another option. Let me know how it goes. — RavinderSingh13, Sep 23 '17 at 12:01

score 0 · Answer 3 · answered Sep 23 '17 at 12:03

Here's a possible solution:

perl -pe 'if (tr/"// % 2 && !eof) { s/\r?\n\z//; $_ .= <>; redo; }' file

If the current line has unbalanced quotes (i.e. an odd number of "), it must end in the middle of a field, so we strip its line ending (\n or \r\n), append the next input line, and restart the loop. The !eof check prevents an endless loop if the last line is left unbalanced.
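The same quote-parity idea can be sketched in POSIX awk for systems without Perl. This is an illustration under stated assumptions, not part of the original answer: the helper name `join_quoted_fields` is hypothetical, a literal double quote inside a field is assumed to be escaped by doubling it (`""`, as in CSV) so the count stays even, and a `\r` at the end of each physical line is dropped, which also turns CRLF record endings into plain LF.

```shell
#!/bin/sh
# Hypothetical sketch: join physical lines until the record holds an even
# number of double quotes, dropping the line breaks that fell inside a
# quoted field. Assumes embedded quotes are doubled ("") as in CSV.
join_quoted_fields() {
  awk '
    {
      sub(/\r$/, "")              # drop a CR that precedes the LF
      buf = buf $0                # append this physical line to the record
      n = gsub(/"/, "&", buf)     # count quotes seen so far in the record
      if (n % 2 == 0) {           # balanced: the record is complete
        print buf
        buf = ""
      }
    }
    END { if (buf != "") print buf }   # flush an unterminated final record
  ' "$@"
}
```

Usage: `join_quoted_fields in.txt > out.txt`. On the sample data from the question it prints the three records of the expected output. Counting quotes over the whole buffered record keeps the logic simple; for very long multi-line records, tracking the count per appended line would be cheaper.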
