0

I have a flat file in which all records sit on a single line, because the file contains no newline characters. Example: Name, Age, Band, Address, Name, Age, Band, Address, Name, Age, Band, Address. Ideally this would be 3 records, but my ETL tool reads them all as a single record. I found something on the site which was similar to my problem and had this solution:

sed 's/\([^,]*,[^,]*\),/\1\n/g'1)

I have not tried it yet, but I am going to. However, I don't understand anything after the sed 's/ . Can someone please help me understand what each of the characters after 's/ is doing?

Also, does anyone have another way to split this long line of columns, which is being read as a single record, into rows?

Thanks,

Rajni

giusti
Rajni
  • This might help: [The Stack Overflow Regular Expressions FAQ](http://stackoverflow.com/a/22944075/3776858) – Cyrus Nov 27 '16 at 16:47

2 Answers

1

Assuming that your file is called input.txt, you could try something like this:

xargs -a input.txt -n4 -d"," printf "%s,%s,%s,%s\n"
Michael Vehrs
0

Assuming that the trailing 1) in the question is a typo,

sed 's/\([^,]*,[^,]*\),/\1\n/g'

will replace every second comma with a newline (if you have a sed which honors \n in the replacement string; not all implementations do). The \( and \) start and end a group, respectively. The [^,]* matches the longest possible string of non-comma characters, and the , matches a single comma. The / is the delimiter of the s command, and the trailing g makes it global, so the command replaces every occurrence of the pattern (two strings separated by a comma, followed by another comma) with the group (\1) and a newline.
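To see the effect on data shaped like the question's, here is a small sketch. It assumes GNU sed (for \n in the replacement), and the sample line and the split_pairs name are only illustrative; the spaces after the commas are dropped to keep the output easy to read.

```shell
# Sample line modelled on the question (spaces after commas omitted).
line='Name,Age,Band,Address,Name,Age,Band,Address'

# \( ... \)  group 1, here "field,field"
# [^,]*      a run of non-comma characters; inside [...] a leading ^ negates the set
# \1         in the replacement, the text that group 1 matched
# g          apply the substitution to every match on the line
split_pairs() {
    printf '%s\n' "$1" | sed 's/\([^,]*,[^,]*\),/\1\n/g'
}

split_pairs "$line"
```

Each output line holds two fields, which shows that the command breaks the line after every second comma rather than after every fourth.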

That is clearly not what you want. To replace every 4th comma with a newline, you can do (with GNU sed):

sed -n ':a; s/,/\n/4; t b; :b; {P; D}; b a;'

There are better ways (e.g., perl) to do this, but since the purpose of the question seems to be to understand sed more than to actually filter the data, this is a fun solution to examine.
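As a rough walk-through of the every-4th-comma command, the sketch below runs it on a sample line. It assumes GNU sed; the sample data and the split_records name are illustrative, and the spaces after the commas are omitted so that records do not start with a blank.

```shell
# Three records of four fields each, all on one line (spaces after commas omitted).
line='Name,Age,Band,Address,Name,Age,Band,Address,Name,Age,Band,Address'

# s/,/\n/4  turn the 4th comma in the pattern space into a newline
# P         print the pattern space up to the first newline (one record)
# D         delete through that newline and rerun the script on what remains,
#           without reading new input; once no newline is left, D ends the
#           cycle the way d would
split_records() {
    printf '%s\n' "$1" | sed -n ':a; s/,/\n/4; t b; :b; {P; D}; b a;'
}

split_records "$line"
```

Each pass peels one four-field record off the front of the line, so the sample should come out as three lines.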

William Pursell
  • Thanks for your help in explaining. I remember reading somewhere that ^ means beginning of line in sed, but from what you have explained, it indicates negation in this context, correct? That helped me. And what is the \1 before the \n doing? I have another question: what do the things we give at the end, like \g or \1 or \3, mean? For example, does \3 mean it should substitute every 3rd occurrence, or all occurrences starting from the 3rd one? – Rajni Nov 30 '16 at 09:22