0

I have a big CSV file that contains paragraphs like this:

first line1  
second line1  
third line1  
fourth line1  
first line2  
second line2  
third line2  
fourth line2

After processing I would like to be translated into this:

first line1,second line1,third line1,fourth line1  
first line2,second line2,third line2,fourth line2

Note: first line, second line, etc contains special characters like . , " :

I'm thinking that an option could be to find "second" word from second line1 and replace the "enter" before it with a comma, this way the second line1 will be on the right side of the first line1.

How can I do this?

Indeed, it's very possible that above example is not the real ACTUAL data, here it is:

137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  
MML Result:Successful.  
",2016-07-25 23:19:05 DST  
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  
MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  
MML Result:Successful.  
",2016-07-25 22:18:05 DST

The CSV file contains a lot of this kind of paragraphs.

The output should be (one paragraph in one line):

137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20""; MML Result:Successful.  ",2016-07-25 23:19:05 DST    
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21""; MML Result:Successful.  ",2016-07-25 22:18:05 DST

Thanks a lot for your help!

I tried your solutions and it almost works, but is not the expected result. It seems that the example I give you is slight different from the source file because of the posting format here.

Please find below the real source CSV file (only a couple of lines as the complete file contains more than one million)

https://www.wetransfer.com/downloads/637b36b2148550ad090c22c9e8297a9c20160804081835/48b90b

Sorry for the misunderstanding and thank again!

Bogdan
  • 19
  • 2
  • Are there always 4 lines to a set? Please show the ACTUAL data so we can see placement of special characters, etc. – Gary_W Aug 03 '16 at 13:50
  • This is a duplicate of http://stackoverflow.com/q/38731863/1745001 but I can't close it as a dup since the answer there hasn't been upvoted or accepted yet. – Ed Morton Aug 03 '16 at 15:01

5 Answers5

3

another alternative

$ awk '{ORS=NR%4?",":RS}1' file

reset the output record separator every four lines and print.

karakfa
  • 66,216
  • 7
  • 41
  • 56
1

You can use paste for this, for example:

$ paste -d, - - - - < file
first line1,second line1,third line1,fourth line1
first line2,second line2,third line2,fourth line2

- means standard input, when you specify N of them (N=4 in this example), paste will form a line from N lines from standard input.

-d is to specify the column separator, in this example a comma.

janos
  • 120,954
  • 29
  • 226
  • 236
0

try this;

 awk -v patt="first" 'BEGIN{ORS=","}$0 ~ patt {gsub(patt, "\n"patt)}1'  CSVfile
Mustafa DOGRU
  • 3,994
  • 1
  • 16
  • 24
0
$ awk 'NR==1 {prev=$0; next} {printf "%s", prev; printf "%s", $0~/^[0-9]{9}/ ?"\n":","; prev=$0} END{print prev}' test.in
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  ,MML Result:Successful.  ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  ,MML Result:Successful.  ,",2016-07-25 22:18:05 DST

When a new record starts with a bunch of numbers, it's time to newline. Next time, please post the correct data right in the beginning.

James Brown
  • 36,089
  • 7
  • 43
  • 59
0

Using GNU awk for FPAT and making no assumptions about how many lines or fields your input has or what field values occur at the start/end of records:

$ cat decsv.awk
BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS="," }
{
    # create strings that cannot exist in the input to map escaped quotes to
    gsub(/a/,"aA")
    gsub(/\\"/,"aB")
    gsub(/""/,"aC")

    # prepend previous incomplete record segment if any
    $0 = prev $0
    numq = gsub(/"/,"&")
    if ( numq % 2 ) {
        # this is inside double quotes so incomplete record
        prev = $0 OFS
        #prev = $0 RT   # uncomment to retain newlines in the record
        next
    }
    prev = ""

    for (i=1;i<=NF;i++) {
        # map the replacement strings back to their original values
        gsub(/aC/,"\"\"",$i)
        gsub(/aB/,"\\\"",$i)
        gsub(/aA/,"a",$i)
    }

    print

    #printf "Record %d:\n", ++recNr
    #for (i=0;i<=NF;i++) {
        #printf "\t$%d=<%s>\n", i, $i
    #}
    #print "#######"
}

.

$ awk -f decsv.awk file
137822118,user,User,192.168.100.20,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ABCD_BD,Succeeded,"NE Name:B12345-BXL_ABCD_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.20"";  ,MML Result:Successful.  ,",2016-07-25 23:19:05 DST
137821234,user,User,192.168.100.21,2016-07-25 23:19:05 DST,iScript,iScript send MML command,B12345-BXL_ASDF_BD,Succeeded,"NE Name:B12345-BXL_ASDF_BD  ,MML Command:LST DEVIP:OPONEMS=""user"", IPOFEMSWS=""192.168.100.21"";  ,MML Result:Successful.  ,",2016-07-25 22:18:05 DST

See Awk to get .csv column containing commas and newlines for more info.

Community
  • 1
  • 1
Ed Morton
  • 188,023
  • 17
  • 78
  • 185