2

I have a script that needs to `cat` a number of files in numerical order. Whilst it seems to work fine with a couple of hundred files, I am now seeing some "interesting" results when handling a larger file.

The file in question has been split into 1289 individual files, named ABC.001-1289 to ABC.1289-1289.

I'm using `ls -gGo ABC* | sort -hk9` to list the files in what I would deem to be a human-readable sort order. All goes swimmingly until I hit ABC.763-1289:

ABC.001-1289 .. ABC.763-1289
ABC.1000-1289 .. ABC.1040-1289 
ABC.764-1289 .. ABC.999-1289
ABC.1041-1289 .. ABC.1289-1289

My first guess is some sort of buffer overrun, but I've not seen anything like this before and I'm scratching my head over where to even start looking for a fix.

I've tried altering the `-k` value, and even removing it, with little improvement.

The more I look into this, the more I believe a KEYDEF is required, but I can't work out the correct format for it.
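One way to sanity-check a KEYDEF such as `-k9` is to count the whitespace-separated fields on a single line of the listing. The sketch below assumes `ls -gGo` prints mode, link count, size, month, day, time and name; the sample line and the helper `field` are made up for illustration, and the exact date layout depends on locale.

```python
# Hypothetical line in the shape `ls -gGo` is assumed to print:
# mode, link count, size, month, day, time, name. Values are invented.
sample = "-rw-r--r-- 1 1048576 Jul  2 07:19 ABC.764-1289"

# Split on runs of whitespace, which gives the same field count
# that sort uses when no -t separator is given.
fields = sample.split()

def field(n, parts):
    """Return the 1-based field n, or '' when the line has fewer fields."""
    return parts[n - 1] if n <= len(parts) else ""

print(len(fields))       # number of fields on the line
print(field(7, fields))  # the file name under the assumed layout
print(repr(field(9, fields)))  # what a key starting at field 9 would see
```

If the ninth field is empty on every line, every key compares equal, and the order would then come from the fallback comparison of the whole line rather than from the file number.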

Any thoughts?

bnoeafk
  • 489
  • 4
  • 16
  • I don't understand why you are using `-k9`, since that would make the key equal to `1289` for every item. – Adrian Ratnapala Jul 02 '14 at 07:19
  • Adrian, TBH I'm just trying to find the correct format for the `-k` value. I've not done an "advanced" sort like this before and can only assume that the spaces in the `ls` output are treated as the delimiters between columns. Whatever format I try, I always seem to hit the issue when the 764th file comes along. – bnoeafk Jul 02 '14 at 07:37
  • The trouble is that your fields are sometimes three characters and sometimes four. I would be tempted to do a `sort -n -k5,8` to isolate the variable term, but actually the "8" varies. Perhaps you should first rename the three-digit files. – Adrian Ratnapala Jul 02 '14 at 07:41
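The renaming idea in the last comment could be sketched as follows. This is a minimal sketch, assuming every name has the shape PREFIX.INDEX-TOTAL (for example ABC.764-1289); `padded_name` and `rename_padded` are hypothetical helper names, and it is worth trying on a copy of the files first.

```python
import os
import re

# Assumed name shape: PREFIX.INDEX-TOTAL, e.g. ABC.764-1289
NAME_RE = re.compile(r'^(?P<prefix>.+)\.(?P<index>\d+)-(?P<total>\d+)$')

def padded_name(name):
    """Return name with INDEX zero-padded to the width of TOTAL."""
    m = NAME_RE.match(name)
    if m is None:
        return name  # leave names that don't match the assumed shape untouched
    width = len(m.group('total'))
    return '{0}.{1}-{2}'.format(
        m.group('prefix'), m.group('index').zfill(width), m.group('total'))

def rename_padded(directory='.'):
    """Rename matching files in `directory` so plain string order is numeric."""
    for name in os.listdir(directory):
        new = padded_name(name)
        if new != name:
            os.rename(os.path.join(directory, name),
                      os.path.join(directory, new))

print(padded_name('ABC.764-1289'))
```

Once every index has the same width, a plain `ls ABC*` already lists the files in numerical order, so no sort key is needed.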

2 Answers

1

I wouldn't want to start debugging the `sort` utility (it is a separate program, not something built into the shell). So why not just do the sorting somewhere else? For example, I'd use Python:

#!/usr/bin/python2.7
import argparse, sys, re

parser = argparse.ArgumentParser( description='concatenate the input files by order',
                                  formatter_class=argparse.ArgumentDefaultsHelpFormatter )
parser.add_argument( 'input', nargs='+', help='the paths to the files to be concatenated' )
parser.add_argument( '-n','--nosort', action='store_true', help='use the given order instead of sorting' )
parser.add_argument( '-o','--output', default='', help='output file. Will output to stdout if empty' )
args = parser.parse_args()

def human_keys( astr ):
    """
    alist.sort(key=human_keys) sorts in human order
    From unutbu @ http://stackoverflow.com/questions/5254021
    """
    keys=[]
    for elt in re.split( r'(\d+)', astr ):  # raw string so \d is not treated as a string escape
        elt = elt.swapcase()
        try: 
            elt = int(elt)
        except ValueError: 
            pass
        keys.append( elt )
    return keys

if not args.nosort:
    args.input.sort( key = human_keys )

# write to the named file if one was given, otherwise to stdout
output = open( args.output, 'w' ) if args.output else sys.stdout

for path in args.input:
    with open( path, 'r' ) as in_file:
        for line in in_file:
            output.write(line)

if output != sys.stdout:
    output.close() # not really needed. But tidier. Can put it in an "atexit", but that's an overkill.
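
To see what the sort key does to names like the ones in the question, here is a small standalone sketch. It is a simplified variant of `human_keys` (it lower-cases text instead of swapping case, and `natural_key` is a name chosen for illustration), not the function above verbatim.

```python
import re

def natural_key(name):
    """Split into text and digit runs; digit runs compare as integers."""
    parts = re.split(r'(\d+)', name)
    # With a capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ['ABC.1000-1289', 'ABC.764-1289', 'ABC.999-1289', 'ABC.001-1289']

print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric-aware order
```

In plain string order ABC.1000-1289 sorts before ABC.764-1289 because '1' is less than '7'; with the key, 764 and 999 are compared with 1000 as integers.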
Amnon Harel
  • 103
  • 1
  • 3
1

A little hacky, but try this:

ls -gGo ABC* | cut -d "." -f 2 | sort -h

or, on plain `ls` output (the byte offset only lines up when each line is just the file name):

ls ABC* | cut -b 5- | sort -h
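
Both pipelines print only the numeric part of each name, so the full file names are lost before they can be passed to `cat`. If keeping the names matters, the same idea (sort on the number after the dot) could be sketched in Python as below. It assumes names ending in `.INDEX-TOTAL`; `part_index` and `concatenate` are hypothetical helper names, and the output path must not match the pattern.

```python
import glob
import re
import shutil

def part_index(path):
    """Return the integer between the last '.' and the final '-TOTAL'."""
    m = re.search(r'\.(\d+)-\d+$', path)
    return int(m.group(1)) if m else -1  # non-matching names sort first

def concatenate(pattern, out_path):
    """Append every file matching `pattern` to `out_path` in numeric order."""
    parts = sorted(glob.glob(pattern), key=part_index)
    with open(out_path, 'wb') as out:
        for part in parts:
            # Binary mode so the pieces are copied byte for byte.
            with open(part, 'rb') as src:
                shutil.copyfileobj(src, out)
    return parts
```

For example, `concatenate('ABC.*', 'joined.bin')` would write the pieces to `joined.bin` in index order and return the list of paths it used.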
Mathias
  • 1,470
  • 10
  • 20