
I have a text file with approximately 25000 rows and 10 columns of data, including a column of dates (in yyyymmdd format) and a column of data associated with those dates. It is in the following format:

19500101     20.7
19500102    19.9
19500103     -77.1
19500104     -1.2

I am trying to rearrange it so that all the January 1st rows are together, all the January 2nd rows are together, and so on for the rest of the days. That is:

19500101     20.7
19510101     230.1
19520101    -91.8
19530101    20.0

How can I rearrange the text file into this order using Python?

CyclonicLife

4 Answers


You can read your data into a list, with each row being a string in the list. Then sort the list with a key function that only looks at the mmdd part of the date.

Here's some code that illustrates the idea using a hard-coded list, but it should be easy for you to adapt it to read the lines from your file.

data = '''
19500101     20.7
19500102    19.9
19500103     -77.1
19500104     -1.2
19510101     230.1
19520101    -91.8
19530101    20.0
'''.splitlines()[1:]

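def keyfunc(line):
    # Sort key: the mmdd part of the first (yyyymmdd) column
    return line.split(None, 1)[0][4:]

# list.sort is stable, so rows with the same mmdd keep their original order
data.sort(key=keyfunc)

for row in data:
    print(row)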

output

19500101     20.7
19510101     230.1
19520101    -91.8
19530101    20.0
19500102    19.9
19500103     -77.1
19500104     -1.2
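Adapting this to read from an actual file might look like the sketch below. The helper name `sort_file_by_day` and the file paths are hypothetical; it assumes the first whitespace-separated column is the yyyymmdd date and writes the sorted rows to a separate output file rather than overwriting the input.

```python
def keyfunc(line):
    # Sort key: the mmdd part of the first (yyyymmdd) column
    return line.split(None, 1)[0][4:]


def sort_file_by_day(in_path, out_path):
    """Hypothetical helper: read in_path, sort rows by mmdd, write to out_path."""
    with open(in_path) as f:
        # Drop blank lines and strip newlines, so a last row without a
        # trailing newline can't get glued onto another row after sorting
        rows = [line.rstrip("\n") for line in f if line.strip()]
    rows.sort(key=keyfunc)
    with open(out_path, "w") as f:
        f.write("\n".join(rows) + "\n")
    return rows
```

For example, `sort_file_by_day("data.txt", "data_by_day.txt")`. Writing to a new file keeps the original data intact if something goes wrong; 25000 rows fit comfortably in memory, so reading everything at once is fine.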

Here's a fancier key function:

def keyfunc(line):
    date = line.split(None,1)[0]
    return date[4:], date[:4]

If two items have the same mmdd they are then compared on the yyyy, so that all items with the same mmdd are grouped together but within the group they'll also be sorted by year.
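The difference between the two key functions only shows up when the input is not already in year order. A small comparison, using rows from the question shuffled out of year order (the names `mmdd_key` and `mmdd_yyyy_key` are just for this sketch):

```python
data = [
    "19520101    -91.8",
    "19500102    19.9",
    "19500101     20.7",
    "19510101     230.1",
]

def mmdd_key(line):
    # Sort on mmdd only; rows with equal mmdd keep their input order
    return line.split(None, 1)[0][4:]

def mmdd_yyyy_key(line):
    # Sort on mmdd, then break ties on yyyy
    date = line.split(None, 1)[0]
    return date[4:], date[:4]

print([row.split()[0] for row in sorted(data, key=mmdd_key)])
# ['19520101', '19500101', '19510101', '19500102']
print([row.split()[0] for row in sorted(data, key=mmdd_yyyy_key)])
# ['19500101', '19510101', '19520101', '19500102']
```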

The line.split(None,1)[0] gets the date portion of the line. You could just use line.split()[0] to do the same thing, but that's less efficient, since it has to split the whole line into individual columns, and we only need the first column for our key.
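To make the difference concrete, here is what the two calls return on a hypothetical four-column row (the extra columns are invented to stand in for the other columns of the real file):

```python
line = "19500101     20.7    3.1    0.0"

# maxsplit=1: only the first whitespace run is split; the rest stays intact
print(line.split(None, 1))   # ['19500101', '20.7    3.1    0.0']
# No maxsplit: every column becomes its own string
print(line.split())          # ['19500101', '20.7', '3.1', '0.0']
# Either way, element [0] is the date
print(line.split(None, 1)[0] == line.split()[0])  # True
```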

PM 2Ring

One way to do this is to set up a dictionary keyed on the month-and-day (mmdd) part of the date, appending each row (or its data value) to a list stored under that key. Then, to produce your output, loop through the keys, get the list for each key, and output its elements in the format that you want.

If you print each element of the list with its key (which you already know) as the first item and the element as the second item, you will have it. Alternatively, you can iterate over the dictionary in sorted key order, as shown in the question How can I sort a dictionary by key?.

For example:

for key in sorted(mydict):
    print("%s: %s" % (key, mydict[key]))

Since mydict[key] is a list, you can print its elements one by one:

for key in sorted(mydict):
    for elem in mydict[key]:
        print("%s: %s" % (key, elem))
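A minimal sketch of the whole dictionary approach, assuming the mmdd part of the date is the dictionary key (keying on the full yyyymmdd date would just reproduce chronological order). It uses `collections.defaultdict` so each key starts with an empty list; the hard-coded lines stand in for rows read from the file.

```python
from collections import defaultdict

lines = [
    "19500101     20.7",
    "19500102    19.9",
    "19510101     230.1",
    "19520101    -91.8",
]

# Group rows by the mmdd part of the date; each list keeps rows in read order
groups = defaultdict(list)
for line in lines:
    date = line.split(None, 1)[0]
    groups[date[4:]].append(line)

# Visit the groups in calendar order (sorted mmdd keys) and emit their rows
result = []
for mmdd in sorted(groups):
    for row in groups[mmdd]:
        result.append(row)
        print(row)
```

Because the rows within each group stay in file order, a file that is already chronological comes out with each calendar day's years in ascending order.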
sabbahillel

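One way of achieving it would be to convert your dates to tuples of (month, day, year) and then sort by them. Something like this should do it:

def date_as_tuple(date):
    # yyyymmdd -> (month, day, year), so rows are grouped by calendar day first
    return (int(date[4:6]), int(date[6:8]), int(date[0:4]))

# Read non-blank lines without their newlines, so the last line can't merge with another
with open("file") as f:
    lines = [line.rstrip("\n") for line in f if line.strip()]
lines.sort(key=lambda line: date_as_tuple(line.split()[0]))
print("\n".join(lines))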
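The order of the fields in the tuple decides how the groups are arranged. A quick comparison on a few sample dates (19500201 is added only to show a second month; the function names are hypothetical):

```python
def day_month_year(date):
    # yyyymmdd -> (day, month, year)
    return int(date[6:8]), int(date[4:6]), int(date[0:4])

def month_day_year(date):
    # yyyymmdd -> (month, day, year)
    return int(date[4:6]), int(date[6:8]), int(date[0:4])

dates = ["19500101", "19500102", "19500201", "19510101"]

# Day first: Feb 1 lands between the Jan 1 group and Jan 2
print(sorted(dates, key=day_month_year))
# ['19500101', '19510101', '19500201', '19500102']

# Month first: every Jan 1, then every Jan 2, ..., then Feb 1
print(sorted(dates, key=month_day_year))
# ['19500101', '19510101', '19500102', '19500201']
```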
Jan Pomikálek

Loop through the text file, building a list of dictionaries, and then proceed as below:

    import datetime

    data = [{'date':'2015-01-10','Info':'b'},  #Default data layout
            {'date':'2015-01-01','Info':'a'},  
            {'date':'2016-01-01','Info':'d'}, 
            {'date':'2015-10-01','Info':'c'}]

    # Then sort the data chronologically (year, then month, then day)
    data.sort(key=lambda x: datetime.datetime.strptime(x['date'], '%Y-%m-%d'))

    # Now just loop through the list, writing each entry back to the file
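One pitfall worth spelling out: in `strptime` format strings `%m` is the month and `%M` is the minute. Since every month number is below 60, a format like `'%Y-%M-%d'` parses without error but silently puts the month into the minutes field:

```python
import datetime

# %m is the month directive; %M is minutes
right = datetime.datetime.strptime("2015-10-01", "%Y-%m-%d")
wrong = datetime.datetime.strptime("2015-10-01", "%Y-%M-%d")

print(right)  # 2015-10-01 00:00:00
print(wrong)  # 2015-01-01 00:10:00  (the "10" was read as minutes)
```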
TheLazyScripter
  • That will sort by year, then month, then day. But the OP wants to sort first by month, then by day, (and then possibly by year). – PM 2Ring Feb 03 '16 at 14:14
  • Sure, that'd work. Still, I think my way is a little simpler, and probably faster than calling `.strptime`. :) – PM 2Ring Feb 03 '16 at 14:32
  • Your answer is definitely a simplified version and I would upvote it if I could! I will, however, leave this answer here on the off chance that it helps someone in the future! – TheLazyScripter Feb 03 '16 at 14:36
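Following the comment thread, a sketch of how the `strptime` approach could be re-keyed to month, then day, then year, which gives the grouping the question asks for. It reuses the sample data from the answer; `month_day_year` is a name chosen for this sketch.

```python
import datetime

data = [{'date': '2015-01-10', 'Info': 'b'},
        {'date': '2015-01-01', 'Info': 'a'},
        {'date': '2016-01-01', 'Info': 'd'},
        {'date': '2015-10-01', 'Info': 'c'}]

def month_day_year(entry):
    # Parse once, then order by month, day, year instead of chronologically
    d = datetime.datetime.strptime(entry['date'], '%Y-%m-%d')
    return d.month, d.day, d.year

data.sort(key=month_day_year)
print([entry['Info'] for entry in data])  # ['a', 'd', 'b', 'c']
```

As PM 2Ring notes, slicing the date string directly avoids the parsing cost of `strptime`; the parsed version mainly helps when dates need validating.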