How to sort a CSV file by date?

Question

I have been reading several posts (1, 2, 3) about this but I can't make it work yet. I have a (simplified) CSV file like this:

NOMBRE,APELLIDO,ID,NACIMIENTO,FECHAINGRESO,MAILPERSONAL,DEPARTAMENTO
name1,lastname1,123,2000-01-01,2021-03-13,mymail1@example-com,IT
name2,lastname2,456,1999-01-01,2020-01-21,mymail2@example-com,IT

I want to sort it according to the header FECHAINGRESO, oldest date first, but don't how to do this. I have tried this with python 3.8.5 in Ubuntu 20:

import csv
import os
from datetime import datetime
# With this I read the cvs and print it to check if everything is ok
with open('Empleados.csv', newline='') as csvfile:
   spamreader = csv.reader(csvfile, delimiter=',')
   for row in spamreader:
       print(', '.join(row))
# The next is the code from several attempts where I failed to sort the cvs
with open('Empleados.csv', newline='') as csvfile:
    # I wrote 4 because I belive the position 4 in the headers' row is the one with FECHAINGRESO
    csvfile = sorted(csvfile, key = lambda row: datetime.strptime(row[4], "%d-%m-%Y"))
    print(csvfile)
    s = sorted(csvfile, key=lambda x:datetime.strptime(x[4],"%d-%m-%Y"), reverse=True)
    print(s)
    l = sorted(csvfile, key=lambda x: x[4], reverse=True)
    print(l)
    sortedlist = sorted(csvfile, key=operator.itemgetter(4), reverse=False)
    print(sortedlist)
    sortedlist = sorted(csvfile, key=lambda row: row[4], reverse=True)
    print(sortedlist)

Basically none of them worked because it reads the row like a string and usually returns this error:

  File "/home/Pruebas VSC/prueba_postgresql.py", line 31, in <module>
    csvfile = sorted(csvfile, key = lambda row: datetime.strptime(row[4], "%d-%m-%Y"))
  File "/home/Pruebas VSC/prueba_postgresql.py", line 31, in <lambda>
    csvfile = sorted(csvfile, key = lambda row: datetime.strptime(row[4], "%d-%m-%Y"))
  File "/usr/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data 'E' does not match format '%d-%m-%Y'

Some of them don't fail, but they don't sort by the date in that column the CSV.

Everything in the last with-open are codes from other questions I found in google, but I don't understand everything. Hope someone can help me understand how to sort this list. I want to save the CSV file later, but I believe that should be easy with write after it is sorted.

Are you opposed to using another library, like `pandas`? If you get it into `pandas` format a lot of the sorting operations work very painlessly. — William, Jun 23 '21 at 19:45
`csvfile` isn’t a list, it’s a file, so you can’t sort it. — DisappointedByUnaccountableMod, Jun 23 '21 at 19:46
@barny file objects in Python are iterables that yield their content of the file line by line, so `with open('test.txt') as f: sorted(f)` is perfectly fine — DeepSpace, Jun 23 '21 at 19:51
Yeah it’s an iterable of lines, not the result of reading as a csv — DisappointedByUnaccountableMod, Jun 23 '21 at 20:04
@William I was just reading about pandas (https://stackoverflow.com/questions/54911225/python-sorting-csv-by-date-and-time) but still can make it work. Let me read the answer you wrote — Alex Turner, Jun 23 '21 at 20:28

William · Accepted Answer · 2021-06-23T20:01:40.980

2

As @barny pointed out, you need to have a list. Also as @DeepSpace pointed out you can perform the sorted operation on the iterable.

A couple other issues. The sorting only works without the header row (which doesn't contain a date). So I removed it to test the following. Also I had to tweak your date format string to be "%Y-%m-%d" to match your data.

So altogether:

with open('Empleados.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    spamreader = sorted(spamreader, key=lambda x:datetime.strptime(x[4],"%Y-%m-%d"), reverse=True)

Edit: If you want to handle the header cleanly as well (without trying it yourself for learning exercise), use the following.

with open('Empleados.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    header = next(spamreader)
    spamreader = sorted(spamreader, key=lambda x:datetime.strptime(x[4],"%Y-%m-%d"), reverse=True)
    spamreader.insert(0, header)

edited Jun 23 '21 at 20:01

answered Jun 23 '21 at 19:55

William

381
1
8

Did you try it? it does not work since it tries to convert the string `'FECHAINGRESO'` to a date – DeepSpace Jun 23 '21 at 19:58
Yes I tried it. Didn't you see in my post where I said I had to remove the header row? – William Jun 23 '21 at 19:58
,,, which does not help the OP since their file is a csv file with a header... this should be handled by code, not by modifying the input file in a way that makes it useless – DeepSpace Jun 23 '21 at 19:59
Sure it helps. They can figure out how to fill in the gaps. Are we supposed to do everything for them? It's a learning website. – William Jun 23 '21 at 20:00
Both codes worked fine. I prefer the second one because you don't have to change the data/csv. Thanks – Alex Turner Jun 23 '21 at 20:43

score 0 · Answer 2 · answered Jun 23 '21 at 20:09

0

If you're open to external libraries, the best way to do this is to use the power of pandas. This is what pandas was made for.

import pandas as pd
df = pd.read_csv('Empleados.csv')
df = df.sort_values('FECHAINGRESO', ascending=False)
df

# to save the sorted csv
df.to_csv('Empleados_sorted.csv')

answered Jun 23 '21 at 20:09

William

381
1
8

Just tried this and worked, let me check the other answer – Alex Turner Jun 23 '21 at 20:36
I change ascending=True for it to be shown as I want it, but it works just fine. Thanks – Alex Turner Jun 23 '21 at 20:42
Perfect! Yea this is my preferred method, but if you want to avoid having to import pandas then the other answer will work also. – William Jun 23 '21 at 21:43
I don't care using pandas, but because it's "simpler" the other way I selected that as the answer, but this is also correct. – Alex Turner Jun 25 '21 at 11:45

How to sort a CSV file by date?

2 Answers2