3

I have a file with rows like following, where 3rd column has multiple numeric values which I need to sort:

file: h1.csv

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3348-1-27681 3347-1-28453
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3345-3-25691 3343-3-25314
Class S101-T2;3343-4-25310;3345-4-25691 3343-4-25314 3344-4-25314
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

So, expected output is:

Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

My idea was to capture 3rd column with awk and then sort it, and finally print output, but I have arrived only to capture the column. I have not succeeded in sorting it, nor printing disired output.

Here's the code I've got so far...

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $0 }'

I have tried (and some others giving error):

cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); print $3 | "sort -u" }'
cat h1.csv | awk -F';' '{ gsub(" ","\n",$3); sort -u; print $3 }'

So, is it possible to do so, how?, any help! Thanks...

Inian
  • 80,270
  • 14
  • 142
  • 161
Andrés Chandía
  • 999
  • 1
  • 16
  • 32
  • Why have you put some numbers in **bold**? Is there a point to that? – TLP Jan 10 '22 at 18:09
  • Yes, just to highlight the current order and the desired order – Andrés Chandía Jan 10 '22 at 18:17
  • Does this answer your question? https://stackoverflow.com/questions/6438896/sorting-data-based-on-second-column-of-a-file You can use `----field-separator=,` to use comma as the field seperator. – David Parks Jan 10 '22 at 18:31
  • 1
    Incorrect dupe. This problem is completely different. It is about sorting inside each record not across the file. – anubhava Jan 11 '22 at 07:03
  • You have answers that assume the strings you want sorted are always the same length and always made up of `4digits-1digit-5digits` parts because that''s what you show in your sample input/output. If that's not always the case then [edit] your question to fix that (and test the existing answers) with a case like `1000-2-1 1000-10-1 200-1-1` which would sort very differently if done numerically vs alphabetically. – Ed Morton Jan 11 '22 at 15:40

3 Answers3

3

One option could be to split the 3rd column on a space, and then using asort() for the values using gnu-awk.

Then concatenate the first 2 fields and the splitted and sorted fields again.

awk '
BEGIN{FS=OFS=";"}
{
  n=split($3, a, " ")
  asort(a)
  res = $1 OFS $2 OFS
  for (i = 1; i <= n; i++) {
    res = res " " a[i]
  }
  print res
}' file

Output

Class S101-T1;3343-1-25310; 3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310; 3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310; 3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310; 3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310; 3344-5-25446 3345-5-25691
The fourth bird
  • 154,723
  • 16
  • 55
  • 70
  • Yes, I've done it, and it also works, can your code be put in a single line? I've tried but errors come up.. – Andrés Chandía Jan 10 '22 at 18:39
  • 1
    @AndrésChandía This would be the code on a single line `awk 'BEGIN{FS=OFS=";"}{n=split($3,a," ");asort(a);res=$1 OFS $2 OFS;for(i=1;i<=n;i++){res=res" "a[i]}print res}' file` – The fourth bird Jan 10 '22 at 18:41
2

In GNU awk, with your shown samples, please try following awk code.

awk '
BEGIN{
  FS=OFS=";"
  PROCINFO["sorted_in"] = "@val_num_asc"
}
{
  nf=val=""
  delete value
  num=split($NF,arr," ")
  for(i=1;i<=num;i++){
    split(arr[i],arr2,"-")
    value[arr2[1]]=arr[i]
  }
  for(i in value){
    nf=(nf?nf " ":"")value[i]
  }
  $NF=nf
}
1
'  Input_file

Explanation: Adding detailed explanation for above.

awk '                                     ##Starting awk program from here.
BEGIN{                                    ##Starting BEGIN section from here.
  FS=OFS=";"                              ##Setting FS, OFS as ; here.
  PROCINFO["sorted_in"] = "@val_num_asc"  ##Setting PROCINFO using sorted_in to make sure array values are sorted by values in ascending order only.
}
{
  nf=val=""                               ##Nullifying variables here.
  delete value                            ##Deleting value array here.
  num=split($NF,arr," ")                  ##Splitting last field into arr with separator as space here.
  for(i=1;i<=num;i++){                    ##Traversing through all elements of array arr.
    split(arr[i],arr2,"-")                ##Splitting first value of arr into arr2 by delimiter of - to make sure to get only first value eg: 3344, 3345 etc.
    value[arr2[1]]=arr[i]                 ##Assigning value array value to arr value with index of arr2 value whose index of 1st.
  }
  for(i in value){                        ##Traversing through array value here.
    nf=(nf?nf " ":"")value[i]             ##Concatenating all values to nf here.
  }
  $NF=nf                                  ##Assigning last field value to nf here.
}
1                                         ##printing edited/non-edited line here.
'  Input_file                             ##Mentioning Input_file name here.
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
1

Using GNU awk for sorted_in:

$ cat tst.awk
BEGIN {
    FS = OFS = ";"
    PROCINFO["sorted_in"] = "@val_str_asc"
}
{
    split($3,a," ")
    sorted = ""
    for (i in a) {
        sorted = (sorted=="" ? "" : sorted " ") a[i]
    }
    $3 = sorted
    print
}

$ awk -f tst.awk file
Class S101-T1;3343-1-25310;3344-1-25446 3345-1-25691 3347-1-28453 3348-1-27681
Class S101-T2;3343-2-25310;3344-2-25446 3345-2-25691
Class S101-T1;3343-3-25310;3343-3-25314 3345-3-25691
Class S101-T2;3343-4-25310;3343-4-25314 3344-4-25314 3345-4-25691
Class S102-T1;3343-5-25310;3344-5-25446 3345-5-25691

Note that this assumes alphabetic sort so it'd sort 1000-1-1 before 200-1-1. That works as long as the strings you want sorted are always made up of the same length parts, i.e. 4digits-1digit-5digits.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185