0

I have a csv file which I want to sort by taking each row at a time. While sorting the row, I want to ignore the whitespace (or empty cell). Also, I want to ignore the first row and first column while sorting. This is how my code looks like:

import csv, sys, operator
fname = "Source.csv"
new_fname = "Dest.csv"

data = csv.reader(open(fname,"rb"),delimiter=',') 

num = 1
sortedlist = []
ind=0
for row in data:
    if num==1:
        sortedlist.append(row)
        with open(new_fname,"wb") as f:
            filewriter = csv.writer(f,delimiter=",")
            filewriter.writerow(sortedlist[ind])
            ind+=1
   elif num > 1:            
       sortedlist.append(sorted(row))
       with open(new_fname,"ab") as f:
           filewriter = csv.writer(f,delimiter=",")
           filewriter.writerow(sortedlist[ind])
       ind+=1
       num+=1

I was able to ignore the first row. But, I am not sure how to ignore the whitespace and the first column while sorting. Any suggestions are welcome.

blues
  • 363
  • 1
  • 5
  • 13
  • Could you explain what you mean by "ignore whitespace when sorting"? If you have a row: A, 12, 4, 5, , , 7, 9, , 3 then how should it look after sorting? – user3468054 Sep 25 '15 at 11:41
  • By white space, I mean empty cell (if you open the csv file in ms-excel). For that, it should look like 3,4,5,7,912,A and ignore all , , , , – blues Sep 25 '15 at 11:43
  • @Dennis you might be doing something wrong if you ignore column values like that in CSV. – Rusty Sep 25 '15 at 12:00
  • 1
    Bear in mind that if your data does have numbers in it, you will read them out as strings, and sorted will perform a lexicographical sort (i.e. '132' < '22') – user3468054 Sep 25 '15 at 12:11
  • @user3468054 my data have numbers only in the first column and rest are strings – blues Sep 25 '15 at 12:49

1 Answers1

1

I simplified your code significantly and here's what I got (although I didn't understand the part about empty columns, they are values as well... Did you mean that you wanted to keep empty columns in the same place instead of putting them at start?)

import csv

if __name__ == '__main__':
    reader = csv.reader(open("Source.csv","r"),delimiter=',') 
    out_file = open("Dest.csv","w")
    writer = csv.writer(out_file,delimiter=",")
    writer.writerow(reader.next())

    for row in reader:
        writer.writerow([row[0]] + sorted(row[1:]))

    out_file.close()

Always write executable code in if __name__ == '__main__':, this is done so that your code is not executed if your script was not run directly, but rather imported by another script.

We record the out_file variable to be able out_file.close() it cleanly later, code will work without it, but it's a clean way to write files.

Do not use "wb", "rb", "ab" for text files, the "b" part stands for "binary" and should be used for structured files.

reader.next() gets the first line of the csv file (or crashes if file is empty)

for row in reader: already runs starting from second line (because we ran reader.next() earlier), so we don't need any line number conditionals anymore.

row[0] gets the first element of the list, row[1:] gets all elements of the list, except the first one. For example, row[3:] would ignore first 3 elements and return the rest of the list. In this case, we only sort the row without its first element by doing sorted(row[1:])

EDIT: If you really want to remove empty columns from your csv, replace sorted(row[1:]) with sorted(filter(lambda x: x.strip()!='', row[1:])). This will remove empty columns from the list before sorting, but keep in mind that empty values in csv are still values.

EDIT2: As correctly pointed out by @user3468054 values will be sorted as strings, if you want them to be sorted as numbers, add a named parameter key=int to the sorted function, or key=float if your values are float.

Rusty
  • 904
  • 4
  • 10
  • writer.writerow([row[0]] + sorted(row[1:])) - Beware of empty lines – Rolf of Saxony Sep 25 '15 at 12:27
  • @RolfofSaxony yeah, empty lines will crash the script, I guess it could be written as `writer.writerow(row[:1] + sorted(row[1:]))` instead. – Rusty Sep 25 '15 at 12:33
  • if len(row): writer.writerow([row[0]] + sorted(row[1:])) – Rolf of Saxony Sep 25 '15 at 12:37
  • @Rusty Thanks for the detailed answer. Actually, I was just playing with the sorting part that’s why I didn’t include if __name__ == '__main__' in my program. I want to remove the empty columns because if i don’t, sort function treats it as a string and sorts it as well which i do not want. For example: If i want to sort c,d,z,o, , , k then the result will be , , c,d,k,o,z But I just want result to be c,d,k,o,z Also, in my csv files only the first columns are numbers which i am ignoring and the rest are strings. – blues Sep 25 '15 at 12:43
  • @Dennis in that case, my first edit suits your needs. (the `filter` function) – Rusty Sep 25 '15 at 12:58
  • @Rusty Thanks for the help. – blues Sep 25 '15 at 13:06
  • @Rusty writing writer.writerow(row[:1] + sorted(row[1:])) writes first element as ['1'] while writing writer.writerow(row[0] + sorted(row[1:])) writes first element as just 1 – blues Sep 28 '15 at 08:00
  • @Dennis you missed that `row[0]` was actually in list brackets as follows `[row[0]]`, so, either case will work, but `[row[0]]` has a crash potential. – Rusty Sep 28 '15 at 11:14