sort csv by column

Question

I want to sort a CSV table by date. It started out as a simple task:

import sys
import csv

reader = csv.reader(open("files.csv"), delimiter=";")

for id, path, title, date, author, platform, type, port in reader:
    print(date)

I used Python's csv module to read in a file with this structure:

id;file;description;date;author;platform;type;port

The date is in ISO 8601 format (e.g. 2003-04-22), so I can sort it quite easily without parsing.
I want to sort the table by date, newest entries first.
How do I get this reader into a sortable data structure? With some effort I could build a list of dates (datelist += date, split and sort), but then I would have to re-identify the complete entry in the CSV table. It's not just sorting a list of things.
The csv module doesn't seem to have a built-in sorting function.

The optimal solution would be to have a CSV client that handles the file like a database. I didn't find anything like that.

I hope somebody knows some nice sorting magic here ;)

If you simply want a tool to sort CSV files, see my FOSS project csvfix at http://code.google.com/p/csvfix/ – Jan 20 '10 at 09:54

score 81 · Accepted Answer · edited Feb 10 '23 at 13:00

Since the 'date' column has index 3, you can sort on it with operator.itemgetter:

import operator
sortedlist = sorted(reader, key=operator.itemgetter(3), reverse=True)

Or use a lambda:

sortedlist = sorted(reader, key=lambda row: row[3], reverse=True)
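
For a self-contained picture, here is a minimal sketch of both variants run on made-up rows in the question's column layout. The sample data, the helper name sort_by_date, and the assumption that the first line of the file is a header row are illustrative and not part of the original answer.

```python
import csv
import io
import operator

# Hypothetical sample data using the question's column layout.
SAMPLE = (
    "id;file;description;date;author;platform;type;port\n"
    "1;a.txt;first;2003-04-22;alice;linux;local;0\n"
    "2;b.txt;second;2009-11-03;bob;windows;remote;80\n"
    "3;c.txt;third;2005-01-15;carol;linux;remote;22\n"
)

def sort_by_date(text, date_index=3):
    """Return (header, rows) with rows sorted newest first."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)  # assumption: the first line is a header row
    # ISO 8601 dates sort chronologically as plain strings, so no parsing is needed.
    rows = sorted(reader, key=operator.itemgetter(date_index), reverse=True)
    return header, rows

header, rows = sort_by_date(SAMPLE)

# The lambda form produces the same ordering as itemgetter.
same = sorted(rows, key=lambda row: row[3], reverse=True)

for row in rows:
    print(row[3], row[0])
```

If the file has no header line, drop the next(reader) call; otherwise the header would be sorted along with the data rows.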

edited Feb 10 '23 at 13:00

Tms91

answered Jan 20 '10 at 09:51

Ignacio Vazquez-Abrams

Does this re-write the file, or just save the sorted list in the variable? – Jeff Apr 16 '14 at 18:11
@Jeff: It does not touch the original file. If you want to write out the results then you must do so as a separate operation. – Ignacio Vazquez-Abrams Apr 16 '14 at 20:50
@IgnacioVazquez-Abrams What is the difference between these two methods, what are they doing? Which one should one choose? – abaumg Jul 28 '17 at 10:20
@abaumg: Functionally they are identical. There may be a small speed difference between them, but that probably won't matter unless there are millions of records in the file. – Ignacio Vazquez-Abrams Jul 28 '17 at 16:13
This is a very good, generic approach which also works if you load the data into a list of rows, each of which then holds a list of columns. Great, thank you! – gies0r Aug 07 '19 at 11:03
For anyone looking to sort CSV data held in a pandas data frame: `csvData.sort_values(["date"], axis=0, ascending=[False], inplace=True)` – Marek Bernád Aug 25 '22 at 20:05
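
As noted in the comments, sorting leaves the original file untouched, so writing the result is a separate step. A rough sketch of that step follows; the rows are made up, and an in-memory buffer stands in for a real output file so the snippet runs anywhere.

```python
import csv
import io
import operator

# Hypothetical rows, already split into columns (the date is at index 3).
header = ["id", "file", "description", "date"]
rows = [
    ["1", "a.txt", "first", "2003-04-22"],
    ["2", "b.txt", "second", "2009-11-03"],
]

sortedlist = sorted(rows, key=operator.itemgetter(3), reverse=True)

# Stand-in for open("sorted.csv", "w", newline=""); the file name is hypothetical.
out = io.StringIO()
writer = csv.writer(out, delimiter=";", lineterminator="\n")
writer.writerow(header)        # header line first
writer.writerows(sortedlist)   # then the sorted data rows

result = out.getvalue()
print(result)
```

With a real file, opening it with newline="" lets the csv module control the line endings itself.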

score 19 · Answer 2 · edited Apr 06 '18 at 04:23

To sort by multiple columns (by column_1 first, then by column_2):

import csv

# Read the rows as dicts keyed by header name, sorted by column_1, then column_2
with open('unsorted.csv', newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    sortedlist = sorted(spamreader, key=lambda row: (row['column_1'], row['column_2']), reverse=False)

# Write the sorted rows out again, preceded by a header line
with open('sorted.csv', 'w', newline='') as f:
    fieldnames = ['column_1', 'column_2', 'column_3']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)
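
One limitation of a tuple key is that reverse applies to the whole tuple, so every column is sorted in the same direction. Because Python's sort is stable, mixed directions can be handled with one pass per column, least significant column first. The following is only a sketch with hypothetical rows and column names:

```python
# Hypothetical rows, shaped like what csv.DictReader yields.
rows = [
    {"column_1": "2009-11-03", "column_2": "b"},
    {"column_1": "2003-04-22", "column_2": "c"},
    {"column_1": "2009-11-03", "column_2": "a"},
]

# Secondary key first: column_2 ascending.
rows_sorted = sorted(rows, key=lambda row: row["column_2"])
# Then the primary key: column_1 descending. Rows that tie on column_1
# keep the column_2 order from the first pass, because the sort is stable.
rows_sorted = sorted(rows_sorted, key=lambda row: row["column_1"], reverse=True)

for row in rows_sorted:
    print(row["column_1"], row["column_2"])
```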

edited Apr 06 '18 at 04:23

Foreever

answered May 16 '17 at 03:54

Tiina

Note that the CSV header row is taken into account here. – Foreever Apr 06 '18 at 04:25

score 12 · Answer 3 · edited May 23 '17 at 12:26

The reader is an iterator that yields one row at a time, much like a generator. On a file with some fake data:

>>> import sys, csv
>>> data = csv.reader(open('data.csv'),delimiter=';')
>>> data
<_csv.reader object at 0x1004a11a0>
>>> next(data)
['a', ' b', ' c']
>>> next(data)
['x', ' y', ' z']
>>> next(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Using operator.itemgetter as Ignacio suggests:

>>> data = csv.reader(open('data.csv'),delimiter=';')
>>> import operator
>>> sortedlist = sorted(data, key=operator.itemgetter(2), reverse=True)
>>> sortedlist
[['x', ' y', ' z'], ['a', ' b', ' c']]
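
A practical consequence of this one-pass behaviour is that a reader can only be sorted once: a second sorted() call on the same reader sees no rows. A small sketch, using an in-memory buffer as a hypothetical stand-in for data.csv:

```python
import csv
import io

# Hypothetical two-line file, standing in for data.csv.
source = io.StringIO("a;b;c\nx;y;z\n")
reader = csv.reader(source, delimiter=";")

first_pass = sorted(reader, key=lambda row: row[2], reverse=True)
second_pass = sorted(reader, key=lambda row: row[2], reverse=True)  # reader is already exhausted

print(first_pass)
print(second_pass)

# Keep the rows in a list if they are needed more than once.
rows = list(csv.reader(io.StringIO("a;b;c\nx;y;z\n"), delimiter=";"))
by_first_column = sorted(rows, key=lambda row: row[0])
by_last_column = sorted(rows, key=lambda row: row[2], reverse=True)
```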

score 2 · Answer 4 · answered Aug 03 '21 at 21:46

To sort a CSV file by column, I would use something like this:

import pandas
csvData = pandas.read_csv('myfile.csv')
csvData.sort_values(["date"], axis=0, ascending=[False], inplace=True)
print(csvData)

answered Aug 03 '21 at 21:46

Gajendra D Ambi

score 0 · Answer 5 · answered Oct 19 '22 at 09:12

You can do it with pandas, and it's easy:

import pandas as pd
df = pd.read_csv("File.csv")
sorted_df = df.sort_values(by=["price","title",...], ascending=False)
sorted_df.to_csv('homes_sorted.csv', index=False)

By default, the .sort_values method returns a new DataFrame rather than sorting in place, so make sure to assign the result to a variable.

Tms91 · Answer 6 · 2023-02-10T13:38:45.993

Combining the answers given by Ignacio Vazquez-Abrams and by Tiina:

import csv
import operator

fieldnames = ['id', 'path', 'title', 'date', 'author', 'platform', 'type', 'port']

# this means: order by 'id', then 'path', ..., then 'port'
items = ('id', 'path', 'title', 'date', 'author', 'platform', 'type', 'port')

with open('unsorted.csv', newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=";")
    # itemgetter(*items) builds the sort key from the listed columns
    sortedlist = sorted(spamreader, key=operator.itemgetter(*items), reverse=True)

with open('sorted.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in sortedlist:
        writer.writerow(row)

With this, you can:

1. Order the rows by multiple columns.

2. Change the columns you order the rows by without having to edit a lambda expression such as

sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2']), reverse=False)

In particular, you don't have to add and remove column names inside the lambda expression if you later want to order other CSV files by a different set of columns.

e.g.

items = ('path', 'title')

items = ('id', 'path', 'title', 'date')

items = ('author', 'date', 'title')

instead of

sortedlist = sorted(spamreader, key=lambda row:(row['column_2'],row['column_3']), reverse=False)

sortedlist = sorted(spamreader, key=lambda row:(row['column_1'],row['column_2'],row['column_3'],row['column_4']), reverse=False)

sortedlist = sorted(spamreader, key=lambda row:(row['column_5'],row['column_4'],row['column_3']), reverse=False)
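
To see the items tuple in action, here is a small sketch run on made-up data. The helper name sort_rows and the sample rows are hypothetical; only csv.DictReader and operator.itemgetter come from the answers above.

```python
import csv
import io
import operator

# Hypothetical data with a subset of the question's columns.
SAMPLE = (
    "id;title;date;author\n"
    "1;beta;2003-04-22;alice\n"
    "2;alpha;2009-11-03;bob\n"
    "3;alpha;2005-01-15;alice\n"
)

def sort_rows(text, items, reverse=False):
    """Sort DictReader rows by the column names listed in items."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    # itemgetter('a', 'b') returns the tuple (row['a'], row['b']);
    # with a single name it returns just row['a'], which also works as a key.
    return sorted(reader, key=operator.itemgetter(*items), reverse=reverse)

by_author_date = sort_rows(SAMPLE, ("author", "date"))
by_title = sort_rows(SAMPLE, ("title",))

print([row["id"] for row in by_author_date])
print([row["id"] for row in by_title])
```

The column names in items must match the header line of the file being read, otherwise itemgetter raises a KeyError.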
