
I'm trying to manipulate a dataset with sed so that I can process the files in a batch, since all the datasets have the same structure.

I have a dataset with two rows like this (the first line in this example is row 7 of the file):

Enginenumber; ABX 105;Productionnumber.;01 2345 67-
"",,8-9012

What I want:

Enginenumber; ABX 105;Productionnumber.;01 2345 67-8-9012

In other words, the numbers on the second line (8-9012) should be appended to the end of the first line, because those numbers belong together.

What I've tried:

sed '8s/7s/' file.csv

But that doesn't work, and I think it would just replace the whole of row 7 anyway. The 8-9012 part is on row 8 of the file, and I want it appended to row 7. Any ideas? Is this possible?

Donald

2 Answers


Note: In the question's current form, a sed solution is feasible. That was not the case originally, when the last ;-separated field of the joined lines needed to be transformed as a whole, which is what prompted the awk solution below.

Joining lines 7 and 8 as-is, merely by removing the line break between them (so the leading "",, of line 8 is kept), can be achieved with this simple sed command:

sed '7 { N; s/\n//; }' file.csv
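
If the goal is the exact output shown in the question, with the leading "",, of line 8 dropped as well, the substitution can simply include that prefix. A minimal sketch, assuming the pair is always rows 7 and 8 and that row 8 always starts with "",,; the function name join_rows_7_8 is made up for illustration:

```shell
# Hypothetical helper: reads the CSV text on stdin, writes the result to stdout.
join_rows_7_8() {
  # On line 7 only: append line 8 to the pattern space (N), then delete the
  # newline together with line 8's leading "",, so the two parts meet directly.
  sed '7 { N; s/\n"",,//; }'
}

# Usage: join_rows_7_8 < file.csv > joined.csv
```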

awk solution:

awk '
 BEGIN { FS = OFS = ";" }
 NR==7 { r = $0; getline; sub(/^"",,/, ""); $0 = r $0 }
 1
' file.csv

Judging by the OP's comments, an additional problem is the presence of CRLF line endings in the input. With GNU Awk or Mawk, adding RS = "\r\n" to the BEGIN block is sufficient to deal with this (or RS = ORS = "\r\n", if the output should have CRLF line endings too), but with BSD Awk, which only supports single-character input record separators, more work is needed.

  • BEGIN { FS = OFS = ";" } tells Awk to split the input lines into fields by ; and to also use ; on output (when rebuilding the line).

  • Pattern NR==7 matches input line 7, and executes the associated action ({...}) with it.

  • r = $0; getline stores line 7 ($0 contains the input line at hand) in variable r, then reads the next line (getline), at which point $0 contains line 8.

  • sub(/^"",,/, "") then removes substring "",, from the start of line 8, leaving just 8-9012.

  • $0 = r $0 joins line 7 and the modified line 8 and assigns the result back to $0, which makes Awk re-split it into fields by ;. (Assigning to $0 does not by itself rebuild the record with OFS; that only happens when a field such as $1 is assigned. Since FS and OFS are both ; here, the output is the same either way.)

  • Pattern 1 is a common shorthand that simply prints the (possibly modified) record at hand.
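
For the CRLF problem noted above, with an Awk that only accepts a single-character RS, one common workaround is to strip a trailing \r from every record instead of changing RS. A minimal sketch of the answer's program extended that way (join_rows_7_8_crlf is a hypothetical name); it relies only on POSIX sub() and the \r regex escape, has not been checked against every BSD Awk version, and its output has plain LF line endings:

```shell
# Hypothetical helper: the awk program above, extended to tolerate CRLF input.
join_rows_7_8_crlf() {
  awk '
    BEGIN { FS = OFS = ";" }
    { sub(/\r$/, "") }          # drop the trailing CR of the current line
    NR==7 {
      r = $0                    # line 7, already without its CR
      getline                   # $0 is now line 8, still ending in CR
      sub(/\r$/, "")            # so strip that CR as well
      sub(/^"",,/, "")          # remove the leading "",, prefix
      $0 = r $0                 # join the two lines
    }
    1
  '
}

# Usage: join_rows_7_8_crlf < file.csv > joined.csv
```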

mklement0
  • Hey, thanks for your reply! It almost works! It also adds XXXXXXX. It's also okay if it just gets added to the 7th row and the spaces and the '-' character remain; I can fix that with sed. – Donald Jan 12 '17 at 13:56
  • Thanks for your comprehensive answer! I've updated my question and I hope it is clearer now. The numbers in the second row should be added at the end of the first row, because those numbers belong together. – Donald Jan 12 '17 at 14:45
  • I've tried your new awk code but the result is that 8-9012 replaces enginenumber. Now I see '8-9012umber; ABX 105;Productionnumber.;01 2345 67-' – Donald Jan 12 '17 at 15:03
  • @Donald: It sounds like you have Windows-style CRLF line breaks in your input. – mklement0 Jan 12 '17 at 15:13
  • Thanks for your effort. Unfortunately it does not do the trick yet. Line 8 is still inserted at the beginning of line 7 instead of the end. Basically I just want to join line 7 and 8. Like concat in SQL. – Donald Jan 12 '17 at 15:35
  • If you need help with the CRLF problem, see [this question](http://stackoverflow.com/q/21640902/45375). If you want to verify that your input indeed has CRLF line endings, run `cat -e file.csv`: if you see `^M` at the end of the lines (before the `$`), you have CRLF line endings. – mklement0 Jan 12 '17 at 15:46
  • I have `^M$`. Does that mean I don't have CRLF? – Donald Jan 12 '17 at 16:09
  • It means that you DO have them, so you either need to remove the CRs up front as in the answers to the linked question, or to do it as part of your Sed or Awk command (as I've described in my answer). – mklement0 Jan 12 '17 at 16:17
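
Following those comments, one way to remove the CRs up front is to run the file through tr -d '\r' before joining. A minimal sketch, assuming a POSIX shell with grep, tr and sed available and the same two-line layout as in the question; has_crlf and strip_cr_and_join are hypothetical names:

```shell
# Hypothetical helper: succeeds if the file contains any carriage return (CR).
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Hypothetical helper: delete every CR first, then join lines 7 and 8,
# dropping the "",, prefix of line 8.
strip_cr_and_join() {
  tr -d '\r' < "$1" | sed '7 { N; s/\n"",,//; }'
}

# Usage:
#   has_crlf file.csv && echo "file.csv has CRLF line endings"
#   strip_cr_and_join file.csv > joined.csv
```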

With sed:

sed '/^[^"]/{N;s/\n.*,//;}' file
  • /^[^"]/: matches lines that start with a character other than " (so empty lines don't match), and for each such line:
  • N: appends the next line to the pattern space
  • s/\n.*,//: removes the newline plus the second line's text up to and including its last ,
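
Because /^[^"]/ appends the next line to every line that starts with something other than ", other rows of a full file get paired up as well, which can matter if those rows contain commas. A different, commonly used sed idiom keeps a two-line window ($!N, then P and D) and joins only when the following line starts with "",,. This is a swapped-in technique rather than the answer's own; a sketch with a hypothetical function name:

```shell
# Hypothetical helper: joins a line with the next one whenever the next line
# starts with "",, (that prefix is dropped); all other lines pass through.
join_continuations() {
  sed '
    $!N
    s/\n"",,//
    P
    D
  '
}
# $!N  appends the next input line, except when already on the last line.
# s    removes the newline and the "",, prefix if the second line has it.
# P    prints the pattern space up to the first newline.
# D    deletes that part and restarts the cycle with whatever is left
#      (or reads a fresh line if nothing is left).

# Usage: join_continuations < file.csv > joined.csv
```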
SLePort
  • Consider `/pattern/!{N}` using the bang for not-matching – stevesliva Jan 12 '17 at 15:09
  • @stevesliva `sed '/^"/!{N;s/\n.*,//}'` is not equivalent to `sed '/^[^"]/N;s/\n.*,//'`. – SLePort Jan 12 '17 at 15:21
  • I meant `/"/!{N};s/\n.*,//`. I'm unsure whether the braces are needed. I just happened to see you working on this answer, and when `/[^"]/` appeared, `/"/!` was my thought. +1 regardless – stevesliva Jan 12 '17 at 15:52
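
One concrete difference between the two addresses: /^[^"]/ requires at least one character, so it never matches an empty line, while /^"/! matches empty lines too. A quick check on made-up input that contains an empty line:

```shell
# Made-up input: an empty line, then a pair of lines to be joined.
input=$(printf '\nA\n"",,B')

# /^[^"]/ skips the empty line, so A is joined with "",,B.
printf '%s\n' "$input" | sed '/^[^"]/{N;s/\n.*,//;}'
# output: an empty line, then AB

# /^"/! also matches the empty line, so A becomes its partner instead;
# the substitution finds no comma and both pairs come out unchanged.
printf '%s\n' "$input" | sed '/^"/!{N;s/\n.*,//;}'
# output: an empty line, then A, then "",,B
```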