1

Hi I have a file with data in the following format:

262353824192
Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing
http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192



301870324112
TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye
http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112



141948187203
NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl
http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203

I would like to replace the single newlines with a pipe, but leave the double newlines as they are. I have tried:

tr '\n' '|' < text.txt

But this replaces all new lines with | so the separate products are no longer on different lines. I basically want a | delimiter between the product number, title and url, but each separate product on a different line. How can I achieve this?

neilH

6 Answers

1

Use tr and a little bit of sed:

tr "\n" "|" < text.txt | sed 's/||\+/\n/g'
Walter A
0

You could use awk to do this:

awk '/^$/ { print } /./ { printf("%s|", $0) } END { print "" }' text.txt

This will find any blank line and just print it as-is. If it finds any value on the line, it will use printf and stick a pipe after it. At the end of processing it prints a newline character to finish up. Note that each record therefore ends with a trailing pipe.
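If the trailing pipe is unwanted, the same line-by-line idea can be sketched with an accumulator variable instead of printf. This is only a minimal sketch under a few assumptions: sample.txt and its contents are made-up stand-ins for the real file, rec is just a variable name picked for illustration, and runs of blank lines are collapsed so that each record ends up on its own line with no blank lines in between.

```shell
# Hypothetical sample data: two records separated by two blank lines
printf '%s\n' '111' 'Title one' 'http://example.com/1' '' '' \
              '222' 'Title two' 'http://example.com/2' > sample.txt

out=$(awk '
  # Non-blank line: append it to the current record, pipe-separated
  /./ { rec = (rec == "" ? $0 : rec "|" $0); next }
  # Blank line: emit the record collected so far (if any) and reset
  rec != "" { print rec; rec = "" }
  # End of input: emit the last record if the file did not end with a blank line
  END { if (rec != "") print rec }
' sample.txt)

printf '%s\n' "$out"
```

Because the pipe is only added between lines, no cleanup step is needed afterwards.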

JNevill
0

This has already been partially answered HERE, but not completely.

I would add an additional transform to change double newlines to some placeholder character that does not occur in the data (a hash in this case). Then, after changing the single newlines to pipes, replace the hashes with a newline (or two if you want to go back to the original formatting of those).

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n\n/#/g' -e 's/\n/|/g' -e 's/#/\n/g'

This gives the output:

262353824192|Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing|http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192

301870324112|TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye|http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112

141948187203|NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl|http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203
Community
Neil Twist
0

awk to the rescue!

awk -F'\n' -v RS= -v OFS='|' '{$1=$1;printf "%s", $0 RT}' file

This preserves the spacing between paragraphs (3 lines, as in the original file). Note that RT is a GNU awk extension, so this needs gawk.

karakfa
0

Just use sed:

sergey@x50n:~> cat in.txt | tr '\n' '|' | sed -e 's/||\+/\n\n/g; s/|$/\n/'
262353824192|Motley Crue Too Fast For Love Vinyl LP Leathur Records LR123 rare 3rd pressing|http://www.ebay.co.uk/itm/Motley-Crue-Too-Fast-Love-Vinyl-LP-Leathur-Records-LR123-rare-3rd-pressing-/262353824192

301870324112|TRAFFIC Same UK 1st press vinyl LP in gatefold / booklet sleeve Island pink eye|http://www.ebay.co.uk/itm/TRAFFIC-Same-UK-1st-press-vinyl-LP-gatefold-booklet-sleeve-Island-pink-eye-/301870324112

141948187203|NOW That's What I Call Music LP'S Joblot 2-14 MINT CONDITION Vinyl|http://www.ebay.co.uk/itm/NOW-Thats-Call-Music-LPS-Joblot-2-14-MINT-CONDITION-Vinyl-/141948187203

First we replace all newlines with a pipe using tr as in your example.

Then the first expression in the sed command (i.e. s/||\+/\n\n/g;) replaces every run of two or more pipes with two newlines. You may also replace them with a single newline if you do not want blank lines between the lines of output. The second expression replaces the trailing pipe with a newline to produce more readable output (or a more "conventional" newline at the end of the file).

Also note that \+ in a sed regex is a GNU extension. Thus, if you are using a non-GNU implementation of sed (FreeBSD, AIX or so), use the standard syntax: |||* instead of ||\+.
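For reference, here is a rough sketch of the same pipeline written with the standard |||* pattern. It assumes a made-up file sample.txt, uses a backslash followed by a literal newline in the replacement instead of \n, emits one newline per record rather than two, and simply drops the trailing pipe. I have not checked it against every sed implementation, and how sed treats input without a final newline can differ between them, so treat it as a starting point rather than a guaranteed-portable command.

```shell
# Hypothetical sample data: two records separated by two blank lines
printf '%s\n' 'a1' 'b1' 'c1' '' '' 'a2' 'b2' 'c2' > sample.txt

# 1) tr turns every newline into a pipe
# 2) the first sed expression turns each run of two or more pipes into a newline
#    (the replacement is a backslash followed by a literal newline)
# 3) the second sed expression removes the pipe left at the very end
out=$(tr '\n' '|' < sample.txt | sed -e 's/|||*/\
/g' -e 's/|$//')

printf '%s\n' "$out"
```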

Sergey
0

I made a very specific solution to your problem with awk (specific because it assumes you always have the same number of new lines between the groups of records).

awk 'BEGIN {RS="\n\n\n"; FS="\n"; OFS="|"} {print $1,$2,$3}' < text.txt

It sets the record separator to 3 newlines, the field separator to one newline, and the output field separator to a pipe. Then for each record (every block separated by 3 newlines), it prints the first 3 fields (which are separated by one newline), and on output it separates them with a pipe.
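If the number of blank lines between records is not always the same, a variant using awk's paragraph mode (an empty RS) might be more forgiving, since any run of blank lines then ends a record. It also avoids a multi-character RS, which not every awk supports. The sketch below assumes a made-up file sample.txt with three lines per record; the `$1 = $1` assignment is there only to make awk rebuild the record with OFS.

```shell
# Hypothetical sample data: two records separated by three blank lines
printf '%s\n' '111' 'Title one' 'http://example.com/1' '' '' '' \
              '222' 'Title two' 'http://example.com/2' > sample.txt

# RS=""   : paragraph mode, records are separated by one or more blank lines
# FS="\n" : each line of a record is one field
# OFS="|" : fields are joined with a pipe when the record is rebuilt
out=$(awk 'BEGIN { RS = ""; FS = "\n"; OFS = "|" } { $1 = $1; print }' sample.txt)

printf '%s\n' "$out"
```

This prints every field of a record rather than only the first three, so records with extra lines are kept whole.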

Quentin