I am trying to scrub some lists into a properly formatted CSV file for database import.

My starting file looks something like this. What should be a single record spans multiple lines:

Mr. John Doe
Exclusively Stuff, 186 
Caravelle Drive, Ponte Vedra
33487. 

I created a sed script that cleans up the file (there's lots of "dirty" formatting like double spaces and spaces before/after commas). The problem is the Zip followed by a period. I would like to replace that period with a newline, but I cannot get it to work.

The command that I use is:

sed -E -f scrub.sed test.txt

and the scrub.sed script is as follows:

:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\n |g
$!ba

What I get is

Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487n 

I figured that the Zip+.(period) would be a great "delimiter" to use the substitution on, and while I can find it, I can't seem to tell sed to put a newline there.

Most of the things I found online are about replacing the newline with something else (usually deleting them), but not much on replacing with a newline. I did find this, but it didn't work: How to insert newline character after comma in `),(` with sed?

Is there something I am missing?

Update:

I edited my scrub.sed file, putting in the literal newline as instructed. It still doesn't work:

:a
N
s|[[:space:]][[:space:]]| |g
s|,[[:space:]]|,|g
s|[[:space:]],|,|g
s|\n| |g
s|[[:space:]]([0-9]{5})\.|,FL,\1\
|g
$!ba

What I get is (everything on one line):

Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487 Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907  

My expected output should be:

Mr. John Doe,Exclusively Stuff,186 Caravelle Drive,Ponte Vedra,FL,33487
Mrs. Jane Smith,Props and Stuff,123 Main Drive,Jacksonville,FL,336907  
HoldOffHunger
Allan
  • It seems to work just fine here. With the same file and script I get the data reformatted with a newline at the end. Putting `\n\n` in the penultimate line of your sed script gives me two newlines as expected. (I don't get a comma between "Doe" and "Exclusively" however). I'm using (GNU sed) 4.4 on Linux. Which version/platform are you using? – Bart Van Loon Sep 06 '17 at 18:57
  • I am using `sed` on FreeBSD – Allan Sep 06 '17 at 18:59
  • Ah, I see. Perhaps try with GNU sed, if that's an option? I also see that I am getting the ",FL," in the output as you seem to want from the script. Are you sure the output you're providing is coming from the script you're providing? – Bart Van Loon Sep 06 '17 at 19:04
  • It's correct. I inadvertently deleted the FL when trying to obfuscate the real data. – Allan Sep 06 '17 at 19:07
  • Consider not removing the newline after the zip code. Change `s|\n| |g` to `s|\([^[:space:]]\)\n\([^[:space:]]\)|\1 \2|g` so that only newlines with a character after them are transformed. Then you don't have to reinstate what you didn't remove. – Jonathan Leffler Sep 06 '17 at 19:18
  • You said `I can't seem to tell it to put a newline there.` and asked for help to do so. I posted [an answer](https://stackoverflow.com/a/46082547/1745001) showing the portable way to do so and you said `it doesn't answer my question`. So, please [edit] your question to make it clear what your real question is so no-one else wastes time answering the wrong question. – Ed Morton Sep 06 '17 at 19:25
  • Or, since you seem to be using extended regexes, then you can use fewer backslashes: `s|([^[:space:]])\n([^[:space:]])|\1 \2|g`. Since you're working with a script file (probably sensible -- I like using script files too) instead of command-line arguments to `sed`, you can add the backslash newline mentioned in the answer by putting a backslash at the end of a line and the rest of the material on the next line. It would be trickier if you were working with `-e '...'` options on the command line. – Jonathan Leffler Sep 06 '17 at 19:26
  • @jww - I *literally* referenced that post in my question and stated that it **didn't work.** – Allan Nov 02 '18 at 13:31

4 Answers

BSD sed does not treat \n in the replacement text as a newline; it emits a literal n instead:

$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\n next line/'
123n next line

GNU sed does support the \n representation:

$ echo "123." | gsed -E 's/([[:digit:]]*)\./\1\nnext line/'
123
next line

Alternatives are:

Insert a single-character placeholder, then use tr to translate it into a newline:

$ echo "123." | sed -E 's/([[:digit:]]*)\./\1|next line/' | tr '|' '\n'
123
next line

Or use an escaped literal new line in your sed script:

$ echo "123." | sed -E 's/([[:digit:]]*)\./\1\
next line/'
123
next line

Or define a new line:

POSIX:

nl='
'

BASH / zsh / others that support ANSI C quoting:

nl=$'\n'

And then use sed with appropriate quoting and escapes to insert the literal newline (a backslash followed by the contents of nl):

$ echo "123." | sed 's/\./'"\\${nl}"'next line/'
123
next line

Or use awk:

$ echo "123." | awk '/^[[:digit:]]+\./{sub(/\./,"\nnext line")} 1'
123
next line

Or use GNU sed which supports \n

dawg
  • I am attempting to use the escaped literal in my sed script as shown but for whatever reason, it's not working. However, re: what you said about BSD not supporting `\n`, I will shift my strategy to incorporate `tr`. I never would have guessed it wasn't supported. Thanks! – Allan Sep 06 '17 at 20:44
  • *I am attempting to use the escaped literal in my sed script as shown but for whatever reason, it's not working.* It is hard to do in a `sed` script vs a one line `sed`. You can also use a multi character delimiter (say `<!!>`) and then use `awk` to change that into a `\n`. To be honest, POSIX `sed` is best used for single line changes only. – dawg Sep 06 '17 at 20:52
  • Adding a literal newline in a script is no harder than on the command line - the syntax doesn't change. Obviously you'd never REALLY insert a character or string and then convert it to newlines later by a pipe to some other command as that's just unnecessarily fragile and inefficient. The OPs remaining problem has nothing to do with this though, [what I suggested](https://stackoverflow.com/a/46082547/1745001) works just fine for the question he asked and his problem now is with another part of his script (the `s|,[[:space:]]|,|g` in his loop is removing the newline after it was added). – Ed Morton Sep 06 '17 at 21:25
  • @EdMorton: I guess I did not try and debug his script with the literal new line. I do remember (as a BSD user) head scratching times where I felt it *should* work but did not. – dawg Sep 06 '17 at 21:28
  • Yeah and Solaris sed is even worse. If it's not simply `s/old/new/` then you're into different combinations of jumbled runes involving every punctuation mark, single letter and the batman symbol, with the meaning of each changing on a sed-by-sed, box-by-box basis. Hence awk.... :-). – Ed Morton Sep 06 '17 at 21:31
  • So that it works replace `gsed` with `sed`. Easy one. – Timo Nov 04 '20 at 18:55
  • `sed '/ags=\[/a \n' $cv` Append a new line after ags=\[ results in `n` on a new line. I use WSL2 Debian – Timo Nov 04 '20 at 19:00
  • @EdMorton can't wait to see a sed example with batman symbols – wtj Nov 21 '21 at 11:40
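
The remaining failure discussed in these comments comes from the loop structure: every substitution runs again on each pass over the growing pattern space, so a newline inserted after one Zip is matched by `s|\n| |g` on the next pass and turned back into a space. A minimal sketch of one way around that, assuming it is acceptable to read the whole file into the pattern space first and that every record ends with a five-digit Zip followed by a period. The function name `scrub` and the sample data are made up for illustration:

```shell
# Hypothetical helper: same cleanup idea as scrub.sed, but the
# substitutions run once, after the whole input has been collected.
scrub() {
  sed -E '
# Slurp every input line into the pattern space.
:a
$!{
N
ba
}
# Squeeze runs of whitespace (including newlines) to one space.
s|[[:space:]]+| |g
# Drop spaces around commas.
s| ?, ?|,|g
# Zip plus period ends a record: add the state and a literal newline.
s| ([0-9]{5})\. ?|,FL,\1\
|g
# Remove the extra newline left after the last record.
s|\n$||
'
}

printf '%s\n' 'Mr. John Doe' 'Exclusively Stuff, 186 ' \
  'Caravelle Drive, Ponte Vedra' '33487. ' \
  'Mrs. Jane Smith' 'Props and Stuff, 123 ' \
  'Main Drive, Jacksonville' '33690. ' | scrub
```

The same script lines could live in a file and be run with `sed -E -f`. Because the newline is written as a backslash followed by a real line break, this form does not depend on `\n` being understood in the replacement.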

The portable way to get a newline in sed is a backslash followed by a literal newline:

$ echo 'foo' | sed 's/foo/foo\
bar/'
foo
bar

I guarantee there's a far simpler solution to your whole problem by using awk rather than sed though.
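
For illustration only, here is a rough sketch of what an awk approach to the whole task might look like. It is not taken from the answer itself. It assumes each record ends with a line containing a five-digit Zip followed by a period, and that the state is always FL as in the question's script. The function name `scrub_awk` is hypothetical:

```shell
# Hypothetical helper: join each multi-line record and emit one CSV line.
scrub_awk() {
  awk '
    # Collect the physical lines of the current record, space-separated.
    { rec = (rec == "" ? $0 : rec " " $0) }

    # A five-digit Zip followed by a period marks the end of a record.
    /[0-9][0-9][0-9][0-9][0-9]\.[ \t]*$/ {
      gsub(/[ \t]+/, " ", rec)     # squeeze runs of blanks
      gsub(/ ?, ?/, ",", rec)      # drop blanks around commas
      sub(/\. ?$/, "", rec)        # drop the period after the Zip
      n = match(rec, / [0-9]+$/)   # position of the blank before the Zip
      print substr(rec, 1, n - 1) ",FL," substr(rec, n + 1)
      rec = ""
    }
  '
}

printf '%s\n' 'Mr. John Doe' 'Exclusively Stuff, 186 ' \
  'Caravelle Drive, Ponte Vedra' '33487. ' | scrub_awk
```

Since awk prints each finished record with its own trailing newline, no newline ever has to be written into a replacement string, which sidesteps the BSD/GNU difference entirely.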

Ed Morton

The following works on Oracle Linux, x86_64:

$ echo 'foobar' | sed 's/foo/foo\n/'
foo
bar

If you need it to match more than once per line, you'll need to place a g at the end, as in:

$ echo 'foobarfoobaz' | sed 's/foo/foo\n/g'
foo
barfoo
baz
interestedparty333

Add a line after a match

The "a" command tells sed to append a line of text after every line that matches the pattern.

sed '/unix/ a "Add a new line"' file.txt

unix is great os. unix is opensource. unix is free os.

"Add a new line"

learn operating system.

unixlinux which one you choose.

"Add a new line"

Add a line before a match

The "i" command tells sed to insert a line of text before every line that matches the pattern.

sed '/unix/ i "Add a new line"' file.txt

"Add a new line"

unix is great os. unix is opensource. unix is free os.

learn operating system.

"Add a new line"

unixlinux which one you choose.
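
The one-line forms above, with the text on the same line as "a" or "i", are a GNU sed extension. Since the question concerns BSD sed, here is a sketch of the POSIX form, where the command is followed by a backslash and the text goes on the next line. The function names and sample input are made up for illustration:

```shell
# Append a line after every line matching /unix/ (POSIX "a\" form).
append_after_unix() {
  sed '/unix/a\
Add a new line
'
}

# Insert a line before every line matching /unix/ (POSIX "i\" form).
insert_before_unix() {
  sed '/unix/i\
Add a new line
'
}

printf '%s\n' 'unix is great os.' 'learn operating system.' | append_after_unix
printf '%s\n' 'unix is great os.' 'learn operating system.' | insert_before_unix
```

Note that "a" and "i" add whole lines around a matching line; they do not split a line at the match, which is what the substitution with a backslash-newline does.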
Mohasin Ali