1

I have seen lot of posts on sorting a file based on a column but didn't help me.

I want to sort a CSV file based on column 2 only.

for example : The data in my file looks like below

H1,H2,H3,H4
C11,R_G,S_F_G,22-OCT-2019
C12,R_G,S_F_G,22-OCT-2019
C13,R_E,S_F_E,22-OCT-2019
C13,R_E,S_F_E_RA,22-OCT-2019
C13,R_E,S_F_E_RB,22-OCT-2019
C14,R_E,S_F_E,22-OCT-2019
C14,R_E,S_F_E_RA,22-OCT-2019
C14,R_E,S_F_E_RB,22-OCT-2019

Expected sorting order of column 2 like below

H1,H2,H3,H4
C13,R_E,S_F_E,22-OCT-2019
C13,R_E,S_F_E_RA,22-OCT-2019
C13,R_E,S_F_E_RB,22-OCT-2019
C14,R_E,S_F_E,22-OCT-2019
C14,R_E,S_F_E_RA,22-OCT-2019
C14,R_E,S_F_E_RB,22-OCT-2019
C11,R_G,S_F_G,22-OCT-2019
C12,R_G,S_F_G,22-OCT-2019

I tried to sort with below command

awk 'NR==1; NR > 1 {print $0 | "sort -t, -k2"}' inputfile >> outputfile

The result is like below

H1,H2,H3,H4
C13,R_E,S_F_E,22-OCT-2019
C14,R_E,S_F_E,22-OCT-2019
C13,R_E,S_F_E_RA,22-OCT-2019
C14,R_E,S_F_E_RA,22-OCT-2019
C13,R_E,S_F_E_RB,22-OCT-2019
C14,R_E,S_F_E_RB,22-OCT-2019
C11,R_G,S_F_G,22-OCT-2019
C12,R_G,S_F_G,22-OCT-2019

If you observe the result, column 2 and column 3 is getting sorted, but I want only column 2 to be sorted and if the column 2 is same for 2 rows it should be in the order of appearance in the input file.

it will be great if anyone can help me to understand what I am doing wrong

Raj
  • 1,843
  • 2
  • 21
  • 47
  • 1
    Possible duplicate of [Sort CSV file by column priority using the "sort" command](https://stackoverflow.com/questions/9471101/sort-csv-file-by-column-priority-using-the-sort-command) – Ravi Saroch Nov 26 '19 at 05:10
  • 1
    The answer of glen Jackman is fantastic https://unix.stackexchange.com/q/345076/273492 – kvantour Nov 26 '19 at 07:00
  • ts not duplicate @RaviSaroch I am expecting to sort only based on column 2. If 2 rows in column 2 are same, it should be in the order of appearance in the input file. I already tried the solution which you referred. Thanks for your time. – Raj Nov 26 '19 at 19:07
  • @Raj do you have seen this answer https://stackoverflow.com/a/59041272/757714 ? Is it not ok? Why? Thank you – aborruso Nov 27 '19 at 17:37
  • @aborruso I will not be able to install because of admin restrictions. and even if i install in my dev machine after getting approval, installing it in my production server is a big job because of lot of approval process involved in it. – Raj Nov 28 '19 at 11:50

3 Answers3

0

Using Miller (https://github.com/johnkerl/miller) and running

mlr --csv sort -f H2 input.csv

you will have

H1,H2,H3,H4
C13,R_E,S_F_E,22-OCT-2019
C13,R_E,S_F_E_RA,22-OCT-2019
C13,R_E,S_F_E_RB,22-OCT-2019
C14,R_E,S_F_E,22-OCT-2019
C14,R_E,S_F_E_RA,22-OCT-2019
C14,R_E,S_F_E_RB,22-OCT-2019
C11,R_G,S_F_G,22-OCT-2019
C12,R_G,S_F_G,22-OCT-2019
aborruso
  • 4,938
  • 3
  • 23
  • 40
  • "mlr" is not recognized. it looks like i have to install/configure Miller project. If yes, I need to go through lot of approval process @aborruso. Can you please let me know id the same can be achieved with awk/some other regular command. Thanks for your time. – Raj Nov 27 '19 at 18:17
  • @Raj Miller is is awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. Install it is very easy http://johnkerl.org/miller/doc/build.html#Prebuilt_executables – aborruso Nov 27 '19 at 20:57
0

Why not simply use head, tail and sort, e.g.

head -n1 file; tail -n+2 file | sort -t, -k2

Where the use of head -n1 file simply outputs the header, then the remaining lines obtained with tail -n+2 file are piped to sort -t, -k2 for sorting with the result being the complete sorted file with heading.

To create a new file containing the sorted results, run the command line in a subshell and redirect the output to a new file, e.g.

(head -n1 file; tail -n+2 file | sort -t, -k2) > sortedfile

(you can then use the 2-step of rm file; mv sortedfile file to replace the original with the sorted file if you like)

Example Use/Output

$ head -n1 file; tail -n+2 file | sort -t, -k2
H1,H2,H3,H4
C13,R_E,S_F_E,22-OCT-2019
C14,R_E,S_F_E,22-OCT-2019
C13,R_E,S_F_E_RA,22-OCT-2019
C14,R_E,S_F_E_RA,22-OCT-2019
C13,R_E,S_F_E_RB,22-OCT-2019
C14,R_E,S_F_E_RB,22-OCT-2019
C11,R_G,S_F_G,22-OCT-2019
C12,R_G,S_F_G,22-OCT-2019
David C. Rankin
  • 81,885
  • 6
  • 58
  • 85
  • Thanks for your time @David Rankin. I am expecting to sort only based on column 2. If 2 rows in column 2 are same, it should be in the order of appearance in the input file. If you observe the results you posted, when column 2 is same, it is sorted based on column 1/3. but I want it to be same order as input file. Please observe Expected Results which I mentioned in the question. – Raj Nov 26 '19 at 19:15
  • I see what you are saying. That may take a manual sort since even with the `awk` `sort` or the `sort` with shell, there is very little control over the algorithm used. You would almost need a bubble or insertion sort. I'll have time in an hour or two to look at it further. – David C. Rankin Nov 26 '19 at 19:21
0

Try this:-

sort --field-separator=',' --key=2 filename

--field-separator=',' => field separator is ","

--key=2 or -k 2 : sorting a data on the basis of second column number

Ravi Saroch
  • 934
  • 2
  • 13
  • 28
  • Thanks for your time @RaviSaroch .I am expecting to sort only based on column 2. If 2 rows in column 2 are same, it should be in the order of appearance in the input file. I already tried the solution which you referred. If you observe the results you posted, when column 2 is same, it is sorted based on column 1/3. but I want it to be same order as input file. Please observe Expected Results which I mentioned in the question – Raj Nov 26 '19 at 19:17