1

EDIT: Thank you all for answering the question, all the answers work. Stack Overflow indeed has a great community.

I receive a flat file as a source. In one of the fields, the value is split across several lines, and I need to remove those newlines so that the value ends up on a single line.

For example, the file looks like this:

PO,MISC,yes,"This
is
an
example"
PO,MISC,yes,"This
is
another
example"

In the above example, the data is read as 8 lines, but each record needs to be read as a single line, as shown below:

PO, MISC, yes, "This is an example"
PO, MISC, yes, "This is another example"

I tried the command below, but it did not work. Is there any way to achieve this? I also need to write the result to another file.

Command:

awk -v RS='([^,]+\\,){4}[^,]+\n' '{gsub(/\n/,"",RT); print RT}' sample_attachments.csv > test.csv
Chirag Raj
  • `tr '\n' ' '` – dawg Sep 19 '22 at 14:03
  • 2
    hi there, thanks for answering my question. there are multiple records in that file. trying this deletes all the new line characters - I need to remove all new line characters within quotation marks. – Chirag Raj Sep 19 '22 at 14:13
  • 1
    If you don't REALLY want a blank after every `,` in your output then don't show a blank after every `,` in your expected output, [edit] your question to fix or explain that. – Ed Morton Sep 19 '22 at 17:28

7 Answers

3

With your shown samples only, please try the following awk, written and tested in GNU awk. In short: set RS to \"\n (a double quote followed by a newline) and the field separator to ,. In the main block, globally substitute spaces for the newlines in $NF, then use printf to print the current record followed by the value of RT (the text that matched RS).

awk -v RS="\"\n"  'BEGIN{FS=OFS=","} {gsub(/\n/," ",$NF);printf("%s",$0 RT)}' Input_file
RavinderSingh13
2
awk -F"," '
   BEGIN{ getline; n=NF; print}
   { split($0,a,FS); 
     while(length(a)<=n){ 
       s=$0; 
       getline; 
       $0=s " " $0; 
       split($0,a,FS); 
     } 
     print $0 }' sample_attachements.txt
  • BEGIN{...} reads the first line, stores its number of fields in the variable n, and prints it.
  • While the number of fields (the length of array a) is not greater than n, read another line and append it to the input.
  • print $0 finally prints the (modified) input line.
Luuk
2

You may use this gnu-awk solution:

awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", RT)} 1' file

field1, field2, field3, field4
PO,MISC,yes,"This is an example"
PO,MISC,no,"This is  another example"

Where input file is this:

cat file
field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"
PO,MISC,no,"This
is
another
example"

For the updated question use this awk:

awk -F, -v OFS=", " -v RS='"[^"]*"|\n' '{
   ORS = gensub(/\n(.)/, " \\1", "g", RT)
   $1 = $1
} 1' file

field1,  field2,  field3,  field4
PO, MISC, yes, "This is an example"
PO, MISC, no, "This is  another example"
anubhava
  • Hi There, thanks for answering my question. when I print this into another file, the file shows blank. There are actually no headers in the source file. I used the unix cmd something like this awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", print RT)} 1' /pat/sample_attachment.csv > /path/test.csv – Chirag Raj Sep 19 '22 at 14:28
  • 1
    For me `awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", RT)} 1' file > output` is working fine and showing all the output. – anubhava Sep 19 '22 at 14:43
  • 1
    Here is working demo: https://ideone.com/WnlA3A – anubhava Sep 19 '22 at 14:45
  • Hi anubhava, the updated logic throws errors in PuTTY. Is it not executable in PuTTY? – Chirag Raj Sep 21 '22 at 02:52
  • putty is just a terminal emulator. What is your awk version and what error are you getting? – anubhava Sep 21 '22 at 02:55
  • I have put the cmd in a shell script and tried to execute the same to write it into another file. But the file has 0kb data. This is the cmd i have added awk -v RS='"[^"]*"' '{ORS = gensub(/\n/, " ", "g", RT)} 1' random_path/sample_file > test.csv The test.csv file is empty. – Chirag Raj Sep 21 '22 at 06:44
  • I asked you to check awk version using `awk --version` command. – anubhava Sep 21 '22 at 06:49
  • GNU Awk 4.0.2 is the AWK version – Chirag Raj Sep 21 '22 at 07:46
  • 1
    Anubhava, thanks for your input. There was a miss at my end , and now it works fine. – Chirag Raj Sep 21 '22 at 08:56
  • But just out of curiosity which awk you ended up using finally? I thought [this one](https://stackoverflow.com/a/73775888/548225) worked for you. – anubhava Sep 21 '22 at 11:15
2

I would use GNU AWK for this task in the following way. Let the content of file.txt be

field1, field2, field3, field4
PO,MISC,yes,"This
is
an
example"

then

awk 'BEGIN{RS="";FPAT=".";OFS=""}{for(i=1;i<=NF;i+=1){cnt+=($i=="\"");if($i=="\n"&&cnt%2){$i=" "}};print}' file.txt

gives output

field1, field2, field3, field4
PO,MISC,yes,"This is an example"

Assumptions: there is never more than one newline in succession, and " characters are never nested. Explanation: I tell GNU AWK to enter paragraph mode (RS=""), that is, to treat everything between blank lines as one record. The field pattern . makes every character a field, and the output field separator is the empty string. Then I iterate over the characters. Whenever I encounter " I increase cnt by 1, which I use for dead reckoning of whether I am outside "..." or inside "...". When I encounter a newline character and cnt is odd, I am inside, so I swap it for a space character. After all characters are processed, I print them.

(tested in gawk 4.2.1)
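
The same quote-parity idea can also be sketched line by line, without FPAT or paragraph mode, so it should run in any POSIX awk. This is only an illustration, not part of the tested answer above. The function name join_by_parity is made up here, and the sketch assumes that quotes are balanced and never escaped.

```shell
#!/bin/sh
# join_by_parity: hypothetical helper (not from the answer above).
# Buffers physical lines until the running count of double quotes is even,
# then prints the buffered lines joined with single spaces.
join_by_parity() {
    awk '
        {
            buf = (buf == "" ? $0 : buf " " $0)   # append this line to the pending record
            tmp = $0
            quotes += gsub(/"/, "", tmp)          # gsub returns how many quotes were removed
            if (quotes % 2 == 0) {                # even count: we are outside any quoted field
                print buf
                buf = ""
            }
        }
        END { if (buf != "") print buf }          # unbalanced input: flush what is left
    ' "$@"
}

# Example with the sample record from the question:
printf 'PO,MISC,yes,"This\nis\nan\nexample"\n' | join_by_parity
```

Unlike the paragraph-mode version, this keeps only one logical record in memory at a time, which may matter for large files.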

Daweo
2

Using any awk:

$ awk -v RS='"' -v ORS= '!(NR%2){gsub(/\n/,OFS); $0="\"" $0 "\""} 1' file
PO,MISC,yes,"This is an example"
PO,MISC,yes,"This is another example"

For anything else, see "What's the most robust way to efficiently parse CSV using awk?".
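
For anyone wondering why splitting on the quote works: with RS set to a double quote, the even-numbered records are exactly the contents of the quoted fields, so only those records need their newlines replaced. Below is a small self-contained reproduction of that idea. The wrapper function join_quoted is made up for illustration, the sample data is generated inline with printf, and the sketch assumes quotes are balanced and never escaped.

```shell
#!/bin/sh
# join_quoted: hypothetical wrapper for illustration only.
# Replaces newlines inside double-quoted fields with a space.
join_quoted() {
    awk -v RS='"' -v ORS= '
        # Even-numbered records sit between a pair of quotes.
        !(NR % 2) { gsub(/\n/, " "); $0 = "\"" $0 "\"" }
        1
    ' "$@"
}

# Sample data from the question, generated inline:
printf 'PO,MISC,yes,"This\nis\nan\nexample"\nPO,MISC,yes,"This\nis\nanother\nexample"\n' |
    join_quoted
```

To write the result to another file, redirect the output, for example `join_quoted sample_attachments.csv > test.csv`.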

Ed Morton
1

Your input file name is assumed to be "file" here, and output is "newfile."

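#!/bin/sh -x

# Assumes every record spans exactly 4 lines, as in the sample input.
cp file stack
cat > ed1 <<EOF
1,4w f1
1,4d
wq
EOF

# Process the next record while input remains; otherwise clean up once.
next () {
if [ -s stack ]; then main; else end; fi
}

# Move the first 4 lines of stack into f1, then append them to newfile as one line.
main () {
ed -s stack < ed1
paste -s -d ' ' f1 >> newfile
next
}

end () {
rm -v ./ed1
rm -v ./f1
rm -v ./stack
}

next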
petrus4
1

If sed is allowed, then

sed ':a
     /^[^"]*\("[^"]*"[^"]*\)*$/b
     N
     s/\n/ /
     ba
' file
M. Nejat Aydin