1

I am trying to sort csv file in python.

Here is my CSV file

cat SampleData.csv

OrderDate,Region,Rep,Item,Units,Unit Cost,Total
1/6/14,East,Jones,Pencil,95, 1.99 , 189.05
1/23/14,Central,Kivell,Binder,50, 19.99 , 999.50
2/9/14,"Central","Jardine","Pencil",36, 4.99 , 179.64
2/26/14,Central,Gill,Pen,27, 19.99 , 539.73
3/15/14,West,Sorvino,Pencil,56, 2.99 , 167.44
4/1/14,East,Jones,Binder,60, 4.99 , 299.40
4/18/14,Central,Andrews,Pencil,75, 1.99 , 149.25
5/5/14,Central,Jardine,Pencil,90, 4.99 , 449.10
5/22/14,West,Thompson,Pencil,32, 1.99 , 63.68
6/8/14,East,Jones,Binder,60, 8.99 , 539.40
12/4/15,Central,Jardine,Binder,94, 19.99 ," 1,879.06 "
12/21/15,Central,Andrews,Binder,28, 4.99 , 139.72

Here is my code

import csv
import operator

f = open('SampleData.csv')

csv1 = csv.reader(f, delimiter=',')

sort = sorted(csv1, key=operator.itemgetter(6))

for eachline2 in sort:
        print eachline2

f.close()

here is my result:

['12/4/15', 'Central', 'Jardine', 'Binder', '94', ' 19.99 ', ' 1,879.06 ']
['12/21/15', 'Central', 'Andrews', 'Binder', '28', ' 4.99 ', ' 139.72 ']
['4/18/14', 'Central', 'Andrews', 'Pencil', '75', ' 1.99 ', ' 149.25 ']
['3/15/14', 'West', 'Sorvino', 'Pencil', '56', ' 2.99 ', ' 167.44 ']
['2/9/14', 'Central', 'Jardine', 'Pencil', '36', ' 4.99 ', ' 179.64 ']
['1/6/14', 'East', 'Jones', 'Pencil', '95', ' 1.99 ', ' 189.05 ']
['4/1/14', 'East', 'Jones', 'Binder', '60', ' 4.99 ', ' 299.40 ']
['5/5/14', 'Central', 'Jardine', 'Pencil', '90', ' 4.99 ', ' 449.10 ']
['6/8/14', 'East', 'Jones', 'Binder', '60', ' 8.99 ', ' 539.40 ']
['2/26/14', 'Central', 'Gill', 'Pen', '27', ' 19.99 ', ' 539.73 ']
['5/22/14', 'West', 'Thompson', 'Pencil', '32', ' 1.99 ', ' 63.68 ']
['1/23/14', 'Central', 'Kivell', 'Binder', '50', ' 19.99 ', ' 999.50 ']
['OrderDate', 'Region', 'Rep', 'Item', 'Units', 'Unit Cost', 'Total']

I am not sure what wrong I am doing here.

I have two issues here,

  1. Sorting is not happening as you see.
  2. I am getting header in the last line. I want it in the first line.

Any help greatly appreciated.

user3330284
  • 373
  • 2
  • 6
  • 14
  • The data is sorted just fine. Take a look at the output of `sorted(['Total', ' 1,879.06 ', ' 999.50'])` . If you are going to be doing a lot of work with csv files you may want to check out http://pandas.pydata.org/ – YXD Apr 12 '15 at 18:19

1 Answers1

2

Actually, sorting is completely correct because you are comparing strings (which happens on a character-by-character basis), not numbers. A string ' 1,879.06 ' comes before ' 139.72 ' because it is lexicographically smaller.

If you want to sort the rows based on the number value of the last column, either convert the last column to floats or change the function you pass as key. Try the following:

sort = sorted(csv1, key=lambda t: float(t[6]))

However, this will raise a ValueError because column header can't be converted to a number.

To make the header appear at the beginning, you can try to convert the reader object csv1 to a list and the only sort a slice of the list, or change the key function to make sure that strings that don't start with a digit get converted to 0, which should place them before other rows (if all other rows have values greater than 0).

Potential solution might be something like this:

import csv
import operator

def key_fn(row):
    # Your strings contain both commas and dots, but commas can be removed
    return float(row[6].replace(',', ''))

f = open('SampleData.csv')

csv1 = csv.reader(f, delimiter=',')
header = next(csv1) # because csv reader supports the iterator protocol
rows = sorted(csv1, key=key_fn)

print header

for row in rows:
    print row

f.close()

For potentially better handling of commas in numbers, look at this question.

Community
  • 1
  • 1
hgazibara
  • 1,832
  • 1
  • 19
  • 22