Split line with multiple delimiters in Unix

Question

I have the below lines in a file

id=1234,name=abcd,age=76
id=4323,name=asdasd,age=43

except that the real file has many more tag=value fields on each line. I want the final output to be like

id,name,age
1234,abcd,76
4323,asdasd,43

I want everything to the left of each = to appear once, comma-separated, as the first row, and everything to the right of each = to appear below it, one row per input line.

Is there a way to do this with awk or sed? Is a for loop required?

I am working on Solaris 10; the local sed is not GNU sed (so there is no -r option, nor -E).

Get rid of the obfuscating `...`s and just post a complete, testable example. — Ed Morton, Dec 26 '15 at 14:53
I am sorry Ed. Those were just meant to show that the number of fields are unknown . So the extraction is totally based on pattern matching without considering the number/name of fields. — Akshay, Dec 26 '15 at 14:58
I understand the intent but find some other way to express your requirements than cluttering up what should simply be concise, testable sample input/output with text that doesn't actually exist and we somehow have to deal with when testing our proposed solution for you. — Ed Morton, Dec 26 '15 at 15:03

Ed Morton · Answer 1 · 2015-12-26T15:31:23.717

2

$ cat tst.awk
BEGIN { FS="[,=]"; OFS="," }
NR==1 {
    for (i=1;i<NF;i+=2) {
        printf "%s%s", $i, (i<(NF-1) ? OFS : ORS)
    }
}
{
    for (i=2;i<=NF;i+=2) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
id,name,age
1234,abcd,76
4323,asdasd,43
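
To see why the loops step by two, it may help to look at the fields awk produces when FS is the bracket expression [,=]. A minimal sketch, using the first sample line from the question; the function name show_fields is made up for illustration:

```shell
# Hypothetical helper: number each field awk sees when splitting on "," or "=".
show_fields() {
    awk -F'[,=]' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }' "$@"
}

printf 'id=1234,name=abcd,age=76\n' | show_fields
# 1:id
# 2:1234
# 3:name
# 4:abcd
# 5:age
# 6:76
```

Odd-numbered fields are the tags and even-numbered fields are the values, which is what the two for loops in tst.awk walk over.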

Assuming they don't really exist in your input, I removed the ...s etc. that were cluttering up your example before running the above. If that stuff really does exist in your input, clarify how you want the text "(n number of fields)" to be identified and removed (string match? position on line? something else?).

EDIT: since you like the brevity of the cat|head|sed; cat|sed approach posted in another answer, here's the equivalent in awk:

$ awk 'NR==1{h=$0;gsub(/=[^,]+/,"",h);print h} {gsub(/[^,]+=/,"")} 1' file
id,name,age
1234,abcd,76
4323,asdasd,43
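
Because the one-liner reads its input once, it should also work when the data arrives on a pipe rather than from a file. A hedged sketch, assuming no value contains a comma or an = sign; kv_to_csv is a hypothetical wrapper name, not something from the answers:

```shell
# Hypothetical wrapper: reads the named files, or stdin when none are given.
kv_to_csv() {
    awk '
        # Line 1: copy it, strip every "=value" from the copy, print the header.
        NR == 1 { h = $0; gsub(/=[^,]+/, "", h); print h }
        # Every line (including line 1): strip every "tag=" and print the values.
        { gsub(/[^,]+=/, ""); print }
    ' "$@"
}

# Input arrives on a pipe here, so there is no file to read twice.
printf 'id=1234,name=abcd,age=76\nid=4323,name=asdasd,age=43\n' | kv_to_csv
# id,name,age
# 1234,abcd,76
# 4323,asdasd,43
```

On Solaris the default /usr/bin/awk is, as far as I know, the old awk without gsub(), so nawk or /usr/xpg4/bin/awk would be the ones to try in place of awk.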

edited Dec 26 '15 at 15:31

answered Dec 26 '15 at 14:58

Ed Morton


Wow. Actually, I had the pseudocode for the above in mind but could not put it into syntax. Amazing as it is, I'd go for the solution mentioned by Tomas and Cyrus since it just takes one line. Thanks a ton though. – Akshay Dec 26 '15 at 15:03
No, the posted cat+head+sed solution does NOT take just one line and it uses UUOC and it uses multiple tools+pipes when one would do and it takes 2 passes of the file, and it would not work if your input was coming from a stream instead of a file. Good luck! – Ed Morton Dec 26 '15 at 15:07
Correct. That's why I have used the one mentioned in the comment section. First I ran sed 's/=[^,]\{1,\}//g' on the file, which extracts the headers, and then sed 's/[^,]\{1,\}=//g', which extracts all the values. I think this is cool, right? :) – Akshay Dec 26 '15 at 15:12
Whether you use the sed commands from the answer you chose or the sed commands from the comments below it, my comment above applies. Obviously, use it if you like, and as long as you aren't worried about efficiency, reusability, enhanceability, etc., it should work just fine. – Ed Morton Dec 26 '15 at 15:24
Since you like the `cat|head|sed; cat|sed` approach posted in another answer, I edited my answer to show the equivalent approach in awk. – Ed Morton Dec 26 '15 at 15:32
Hmmm..I see the other side now . – Akshay Dec 26 '15 at 15:34

score 0 · Accepted Answer · answered Dec 26 '15 at 13:15

0

FILE=yourfile.txt

# first line (header)
cat "$FILE" | head -n 1 | sed -r "s/=[^,]+//g"

# other lines (data)
cat "$FILE" | sed -r "s/[^,]+=//g"

answered Dec 26 '15 at 13:15

Tomas M


Hi Tomas, that was quick. Thanks. But sed -r fails with "illegal option". Is there any alternative? – Akshay Dec 26 '15 at 13:35
Interesting, works for me here. But I'm on Linux, so it is possible that your sed is some different version. – Tomas M Dec 26 '15 at 13:36
That is true. I am at the office and this thing in front of me is Solaris. – Akshay Dec 26 '15 at 13:40
@TomasM: replace first sed command by `sed 's/=[^,]\{1,\}//g'` and second by `sed 's/[^,]\{1,\}=//g'` to work with GNU and solaris' sed. – Cyrus Dec 26 '15 at 13:58

repzero · Answer 3 · 2015-12-26T14:47:08.987

0

sed -r '1 s/^/id,name,age\n/;s/id=|name=|age=//g' my_file

edit: or use

sed '1 s/^/id,name,age\n/;s/id=\|name=\|age=//g'

output

id,name,age
1234,abcd,76 ...(n number of fields)
4323,asdasd,43...

edited Dec 26 '15 at 14:47

answered Dec 26 '15 at 13:50

repzero


Try "sed --help" and see which argument has the description "use extended regular expressions in the script.", then use that argument. – repzero Dec 26 '15 at 14:01
Only -e, -f and -n are supported. Is there an alternative? – Akshay Dec 26 '15 at 14:30
Take out the -r option and it should work fine, since the pattern is also BRE. – repzero Dec 26 '15 at 14:35
Actually the thing is that the fields in the line are not known. That is the catch. If you look at my question again, you'll notice. So if the columns are known then this is fine. – Akshay Dec 26 '15 at 14:37
My apologies, use the next solution in my edited answer. – repzero Dec 26 '15 at 14:37
Umm... actually, this is not quite the thing. Instead of hard-coding name, age and so on, I want 1) anything before the "=" as the first line, separated by commas (this becomes the header), and 2) anything after the "=" as the rest of the lines (this is the data). I want the result in CSV format. I don't want to hard-code the column names; instead I want them picked up by the command line via pattern matching. – Akshay Dec 26 '15 at 14:44
Yes, based on the output, the command line works on my side as expected. I will add the output from my side to my answer for reference. – repzero Dec 26 '15 at 14:46

peak · Answer 4 · 2015-12-27T06:45:41.333

0

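The following simply combines the best of the sed-based answers so far, showing you can have your cake and eat it too. Line 1 has to supply both the header and the first data row, so it is copied to the hold space (h), printed with the values stripped (p), and then restored (g) before the keys are stripped. If your sed does not support the -r option, chances are that -E will do the trick; all else failing, one can replace R+ by RR* where R is [^,].

sed -r '1{h;s/=[^,]+//g;p;g}; s/[^,]+=//g'

(That is, the portable incantation, with each command in its own -e so that older seds have nothing to object to, would be:

sed -e '1{' -e h -e 's/=[^,][^,]*//g' -e p -e g -e '}' -e 's/[^,][^,]*=//g'

)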
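
For what it's worth, here is a sketch of how a one-pass sed of this kind could be wrapped and tried on the two sample lines from the question. It uses only basic regular expressions, so neither -r nor -E is needed. The function name kv_to_csv_sed is made up for illustration, and the h/p/g commands are there so that line 1 contributes both the header and its own values:

```shell
# Hypothetical wrapper: one pass, basic regular expressions only.
kv_to_csv_sed() {
    # Line 1: save it (h), strip "=value" parts, print the header (p), restore (g).
    # All lines: strip "tag=" parts; sed then prints the result by default.
    sed -e '1{' -e h -e 's/=[^,][^,]*//g' -e p -e g -e '}' \
        -e 's/[^,][^,]*=//g' "$@"
}

printf 'id=1234,name=abcd,age=76\nid=4323,name=asdasd,age=43\n' | kv_to_csv_sed
# id,name,age
# 1234,abcd,76
# 4323,asdasd,43
```

As with the other pattern-based answers, this assumes that no value contains a comma or an = sign.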

edited Dec 27 '15 at 06:45

answered Dec 27 '15 at 04:10

peak


Could you please put all the alternatives down here? I am using Solaris, and neither -r nor -E is supported. – Akshay Dec 27 '15 at 06:24
