
I want to sort a .txt file based on specific character positions in each line.

This is the file I was given:

12345678901234567890123456789012345
header     1stfoo   DDMMYYYY 2ndfoo
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   07082011 d08Hdq
sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq

As you can see, one column contains a date in DDMMYYYY format. If I sort it with sort -n -k 3,3 thisfile.txt > sortedfile.txt I get this result:

sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   07082011 d08Hdq
sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   13062002 d05Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   24051988 d12Hdq

But I want the result to look like this:

sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq
sltele     Hoodie   07082011 d08Hdq

That is, sortedfile.txt should be sorted chronologically by the DDMMYYYY date.

Can somebody help me?

Thanks in advance.

faizal

3 Answers


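You can use the sort command with multiple keys, giving each key a start and end character position. Without -b, the blanks before a field count as part of it, so here field 3 is three spaces followed by the date: DD is at 3.4-3.5, MM at 3.6-3.7, and YYYY starts at 3.8. Sorting by year, then month, then day: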

sort -n -k 3.8,3.12 -k 3.6,3.7 -k 3.4,3.5 < input_file

output:

sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq
sltele     Hoodie   07082011 d08Hdq

From the sort man page:

KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1 ... characters in a field are counted from the beginning of the preceding whitespace.
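As a variant (a sketch assuming GNU coreutils sort), a global -b makes sort skip the blanks before field 3, so character positions count from the first digit of the date and no longer depend on there being exactly three spaces. Because the date parts are fixed-width digits, a plain byte-wise comparison is enough and -n can be dropped. The file names are the question's; the script recreates the sample data lines (ruler and header omitted):

```shell
#!/bin/sh
# Recreate the question's data lines (ruler and header lines omitted).
cat > thisfile.txt <<'EOF'
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   07082011 d08Hdq
sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq
EOF

# -b: skip the blanks before field 3, so character 1 of field 3 is the
# first digit of the date: DD = 3.1-3.2, MM = 3.3-3.4, YYYY = 3.5-3.8.
# Keys are compared in order: year, then month, then day.
# LC_ALL=C keeps the comparison byte-wise and locale-independent.
LC_ALL=C sort -b -k 3.5,3.8 -k 3.3,3.4 -k 3.1,3.2 thisfile.txt > sortedfile.txt
cat sortedfile.txt
```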

perreal

The accepted answer doesn't actually answer the question of sorting on a specific range of absolute character positions, counting from the beginning of the line (which is position 1 as counted by sort).

It is important to remember that for sort, field numbers refer to portions of text separated by the field separator, which is a non-blank to blank transition unless changed with the -t/--field-separator=SEP option. The correct way to sort on a range of absolute character positions counted from the beginning of the line is to count characters starting from field number 1, like so:

sort -k 1.STARTPOS,1.ENDPOS

You can leave off the ,1.ENDPOS part if you want the sort key to extend to the end of the line.

Confusing field numbers with absolute character positions can lead to surprising (and often very frustrating) results.
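Applied to the question's file, this might look like the sketch below. The column numbers are read off the ruler line (DD at 21-22, MM at 23-24, YYYY at 25-28), and it assumes GNU coreutils sort, which lets a character position run past the end of field 1:

```shell
#!/bin/sh
# Recreate the question's data lines (ruler and header lines omitted).
cat > thisfile.txt <<'EOF'
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   07082011 d08Hdq
sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq
EOF

# Field 1 starts at column 1 of the line, so 1.C is an absolute column.
# Keys: year (columns 25-28), then month (23-24), then day (21-22).
LC_ALL=C sort -k 1.25,1.28 -k 1.23,1.24 -k 1.21,1.22 thisfile.txt > sortedfile.txt
cat sortedfile.txt
```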

ack

I know there's a better way to do this, but since I rarely have to sort files, this is what I've done in the past: use sed to turn DDMMYYYY into YYYYMMDD, sort on field 3, then use sed again to turn it back.

sed -e 's/\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{4\}\)/\3\2\1/g' thisfile.txt | \
   sort -n -k 3,3 | \
   sed -e 's/\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)/\3\2\1/g' > sortedfile.txt
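A related decorate-sort-undecorate sketch uses awk to prepend a YYYYMMDD copy of the date as a temporary key, sorts on that key, and then cuts it off, so the original lines are never rewritten. It assumes the date is always the third whitespace-separated field:

```shell
#!/bin/sh
# Recreate the question's data lines (ruler and header lines omitted).
cat > thisfile.txt <<'EOF'
sltele     Hoodie   24051988 d12Hdq
sltele     Hoodie   07051987 d30Hdq
sltele     Hoodie   07082011 d08Hdq
sltele     Hoodie   09081961 d04Hdq
sltele     Hoodie   20041962 d14Hdq
sltele     Hoodie   20032000 d01Hdq
sltele     Hoodie   13062002 d05Hdq
EOF

# Decorate: prepend YYYYMMDD (built from field 3, DDMMYYYY) and a space.
# $0 is printed unmodified, so the original spacing is kept.
# Sort: on the prepended key only.
# Undecorate: drop everything up to and including the first space.
awk '{ print substr($3, 5, 4) substr($3, 3, 2) substr($3, 1, 2), $0 }' thisfile.txt |
    LC_ALL=C sort -k 1,1 |
    cut -d ' ' -f 2- > sortedfile.txt
cat sortedfile.txt
```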
Jon Lin