
Below is sample data from a file.

It has 4 TAB-separated columns, with the last column holding values separated by commas. The 3rd column shows the number of values in the 4th column.

6338838 ESR 3   173812,10547556,10518181
6338822 ESR 2   7219086,12761162

Expected output:

6338838 ESR 3   173812
6338838 ESR 3   10547556
6338838 ESR 3   10518181
6338822 ESR 2   7219086
6338822 ESR 2   12761162

I tried with AWK, but was not able to make it work.

rkj

6 Answers


EDIT: How about simply using gsub to get rid of commas here :)

awk -F" +" '{gsub(",",ORS $1 OFS $2 OFS $3 OFS,$4)} 1' Input_file | column -t

Change -F to -F"\t" in case your Input_file is TAB delimited.
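For example, assuming the Input_file really is TAB separated, that variant might look like this (untested sketch):

awk -F"\t" '{gsub(",",ORS $1 OFS $2 OFS $3 OFS,$4)} 1' Input_file | column -t

Setting -v OFS='\t' instead would keep the inserted separators as tabs without needing the column -t step.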



How about simply using awk's -F and printing the relevant field values:

awk -F" +|," '{for(i=4;i<=NF;i++){print $1,$2,$3,$i}}'  Input_file

Append | column -t to the above command in case you need the output aligned in columns.
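For example, the complete pipeline would then be:

awk -F" +|," '{for(i=4;i<=NF;i++){print $1,$2,$3,$i}}' Input_file | column -t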

As per Cyrus's and ghoti's comments, also adding the following now, in case your Input_file is TAB delimited.

awk -F '[\t,]' -v OFS='\t' '{for(i=4; i<=NF; i++) print $1,$2,$3,$i}' Input_file
RavinderSingh13
  • Nice. Though, beware that the input file is TAB delimited, not space. And instead of piping through `column -t`, you could set your OFS as I mentioned in my answer. – ghoti Jun 15 '18 at 03:29
  • e.g. `awk -F '[\t,]' -v OFS='\t' '{for(i=4; i<=NF; i++) print $1,$2,$3,$i}' file` – Cyrus Jun 15 '18 at 03:35
  • @ghoti, thanks Ghoti, you made my day. Cyrus, sure, added this to the code now too. – RavinderSingh13 Jun 15 '18 at 03:39
  • My goodness, that's a clever approach in your edit. Very nicely done! (And it looks as if it's tied with my second solution as the shortest awk code to produce the desired results.) – ghoti Jun 15 '18 at 04:10
  • @ghoti, thank you sir :) I will keep this comment saved with me to get encouraged every time too. – RavinderSingh13 Jun 15 '18 at 04:16
  • Upon reflection, you could shave off six more characters by removing unnecessary spaces: `'{gsub(",",ORS$1OFS$2OFS$3OFS,$4)} 1'`. This definitely gets you top spot for brevity, though it is less readable and might not work in all versions of awk. – ghoti Jun 15 '18 at 10:52
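For reference, a sketch of that condensed form applied to the full command (as the comment above notes, the unspaced concatenation may not work in every awk):

awk -F" +" '{gsub(",",ORS$1OFS$2OFS$3OFS,$4)} 1' Input_file | column -t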

This should work:

awk '{n = split($4,x,","); for (i = 1; i <= n; ++i) {printf "%s %s %s %s\n", $1, $2, $3, x[i]} }' yourfile
lubgr
  • I do wonder, sometimes, why the awk community prefers one-liners. Isn't https://gist.github.com/charles-dyfis-net/8162e47224e74f564e31859fd14e0901 -- the same code, but formatted such that one can look at the alignment to see where each block and command begins and ends -- far more readable? – Charles Duffy Jun 14 '18 at 19:25
  • Shell scripts allow literal newlines inside single-quoted strings, so still perfectly pasteable; just need a leading `awk '` line at the beginning and `'` at the end. – Charles Duffy Jun 14 '18 at 19:26
  • @CharlesDuffy When navigating through your local history though, aren't one-liners easier to reuse to e.g. pipe the output through another command? – lubgr Jun 14 '18 at 19:28
  • In my local history search, up-arrowing into a command with a multi-line literal string shows me... that original command with its multi-line string, whitespace and indentation still intact, and with navigation across line boundaries available for editing. – Charles Duffy Jun 14 '18 at 19:29
  • @CharlesDuffy I am somewhat convinced :) – lubgr Jun 14 '18 at 19:31
  • Heh. Another practice available -- if you want to pipe multi-line chunks of code around, it's probably a good time to encapsulate that code in a shell function anyhow. Throw a `foo() {` on the beginning and a `}` on the end, and then you can `foo | ...` or `... | foo` to your heart's content. – Charles Duffy Jun 14 '18 at 19:33
  • split returns you the length, just capture it there and use it in the for loop. – karakfa Jun 14 '18 at 19:42
  • `split()` returns the number of fields. – Cyrus Jun 15 '18 at 03:45
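As a rough sketch of the multi-line formatting and shell-function wrapping discussed in the comments above (the function name split_last_col is an illustrative choice, not from the thread):

split_last_col() {
  awk '
    {
      n = split($4, x, ",")                        # number of comma-separated values
      for (i = 1; i <= n; ++i) {
        printf "%s %s %s %s\n", $1, $2, $3, x[i]   # repeat first three fields per value
      }
    }
  ' "$@"
}

split_last_col yourfile   # or: some_command | split_last_col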

With GNU awk:

awk 'BEGIN{FS=OFS="\t"} {c1to3=$1 FS $2 FS $3; columns=split($4,array,","); for(i=1; i<=columns; i++) print c1to3,array[i]}' file

or shorter:

awk -v OFS='\t' '{columns=split($4,array,","); for(i=1; i<=columns; i++) print $1,$2,$3,array[i]}' file

or

awk 'BEGIN{OFS="\t"} {c=split($4,a,","); NF=3; for(i=1; i<=c; i++) print $0,a[i]}' file

Output:

6338838 ESR     3       173812
6338838 ESR     3       10547556
6338838 ESR     3       10518181
6338822 ESR     2       7219086
6338822 ESR     2       12761162
Cyrus

I love these "who can do it shorter" contests. :-)

If we cared to use the item count from $3, we could do this:

awk '{split($4,a,",");for(i=1;i<=$3;i++){$4=a[i];print}}' OFS='\t' input.txt

But the following produces similar results in fewer bytes of code. Output is in the reverse order of subfields in $4.

awk '{for(i=split($4,a,",");i;i--){$4=a[i];print}}' OFS='\t' input.txt

Not bothering to set FS because your sample input doesn't appear to include spaces within the fields.
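If the real input is TAB delimited, as the question states, a sketch of the same approach with FS set explicitly (and tab-separated output) might be:

awk -F'\t' '{for(i=split($4,a,",");i;i--){$4=a[i];print}}' OFS='\t' input.txt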

ghoti

In native bash:

while IFS=$'\t' read -r one two three four; do
    IFS=, read -r -a pieces <<<"$four"
    for piece in "${pieces[@]}"; do
      printf '%s\t%s\t%s\t%s\n' "$one" "$two" "$three" "$piece"
    done
done <yourfile
Charles Duffy

Here is another awk, without referencing unused fields.

$ awk '{n=split($NF,a,",");      # split the last field on commas into array a
        for(i=1;i<=n;i++)        # for each comma-separated value
          {sub($NF"$",a[i]);     # swap the current last field for that value ($0 is re-split)
           print}}' file.t
karakfa