My code:

import csv
import operator


first_csv_file = open('/Users/jawadmrahman/Downloads/account-cleanup-3 array/example.csv', 'r+')
csv_sort = csv.reader(first_csv_file, delimiter=',')
sort = sorted(csv_sort, key=operator.itemgetter(0))
sorted_csv_file = open('new_sorted2.csv', 'w+', newline='')
write = csv.writer(sorted_csv_file)
for eachline in sort:
    print (eachline)
    write.writerows(eachline)

I have an example CSV file: [image]

I want to sort by the first column and get the results in this fashion:

1,9
2,17
3,4
7,10

With the code posted above, this is how I am getting it now: [image]

How do I fix this?

  • Is `,` supposed to represent a decimal point in this context? – Ben Grossmann Jan 07 '22 at 18:33
  • `pandas` package is the most comprehensive and well supported package for manipulating tabular data such as CSVs. Read, sort, and save should be about 3 lines of code in Pandas. See https://stackoverflow.com/questions/37787698/how-to-sort-pandas-dataframe-from-one-column and https://stackoverflow.com/questions/14365542/import-csv-file-as-a-pandas-dataframe – David Parks Jan 07 '22 at 18:35
  • `eachline` is itself a list and thus `write.writerows(eachline)` is producing two rows for every `eachline`. Try `write.writerow(eachline)`. While you are at it, I encourage you to look at what the `with` keyword used with `open()` does for you. It will clean up your code substantially. – JonSG Jan 07 '22 at 19:07
  • Please do not include images of data. Please edit your question and include your input CSV and desired output CSV _as text_. – Zach Young Jan 07 '22 at 19:12
  • @BenGrossmann no. – winterlyrock Jan 11 '22 at 16:06
  • @DavidParks, can't use Pandas, this code will go in Lambda. Pandas is way too big for Lambda, disables the debugging for some reason. – winterlyrock Jan 11 '22 at 16:09
  • @JonSG, thank you! – winterlyrock Jan 11 '22 at 16:09

1 Answer

As JonSG pointed out in the comments to your original post, you're calling writerows() (plural) on a single row, eachline.

Change that last line to write.writerow(eachline) and you'll be good.
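For reference, here is a minimal sketch of the whole script with that one-word fix applied, also using the `with` blocks JonSG suggested. The function name `sort_csv_by_first_column` and its two path parameters are made up for illustration; they are not from the original code.

```python
import csv
import operator


def sort_csv_by_first_column(src_path, dst_path):
    """Read src_path, sort its rows by the first column, write them to dst_path."""
    # `with` closes each file automatically, even if an error occurs.
    with open(src_path, newline='') as src:
        # Rows are lists of strings, so this sorts as text, like the original:
        # '10' would sort before '2'.
        rows = sorted(csv.reader(src), key=operator.itemgetter(0))

    with open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in rows:
            writer.writerow(row)  # singular: one list of cells becomes one line

    return rows
```

Calling it as `sort_csv_by_first_column('example.csv', 'new_sorted2.csv')` would mirror the original script, assuming `example.csv` is in the working directory.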

Looking at the problem in depth

writerows() expects a list of rows, where each row is itself a list of values (a "list of lists"). The outer list contains the rows; each inner list contains the cells (columns) of one row:

sort = [
  ['1', '9'],
  ['2', '17'],
  ['3', '4'],
  ['7', '10'],
]

write.writerows(sort)

will produce the sorted CSV with two columns and four rows that you expect (and your print statement shows).
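You can check this without touching any files. The sketch below writes into an in-memory `io.StringIO` buffer instead of a real file; the buffer and the variable names are only for illustration.

```python
import csv
import io

rows = [
    ['1', '9'],
    ['2', '17'],
    ['3', '4'],
    ['7', '10'],
]

buffer = io.StringIO()              # stands in for the output file
csv.writer(buffer).writerows(rows)  # plural: the whole list of rows at once

lines = buffer.getvalue().splitlines()
print(lines)  # ['1,9', '2,17', '3,4', '7,10']
```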

When you call writerows() with a single row:

for eachline in sort:
    write.writerows(eachline)

you get some really weird output:

  • it interprets eachline as the outer list containing a number of rows, which means...

  • it interprets each item in eachline as a row having individual columns...

  • and each item in eachline is itself a Python sequence (a string), so writerows() iterates over each character in the string, treating each character as its own column:

    ['1','9'] is seen as two single-column rows, ['1'] and ['9']:

    1
    9
    

    ['2', '17'] is seen as the single-column row ['2'] and the double-column row ['1', '7']:

    2
    1,7
    
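The same in-memory approach reproduces the broken output. This is a small sketch using only the first two rows, again with an `io.StringIO` buffer standing in for the file:

```python
import csv
import io

rows = [['1', '9'], ['2', '17']]

buffer = io.StringIO()
writer = csv.writer(buffer)
for row in rows:
    # Bug on purpose: writerows() treats each string in `row` as a row,
    # and each character of that string as a cell.
    writer.writerows(row)

lines = buffer.getvalue().splitlines()
print(lines)  # ['1', '9', '2', '1,7']
```

Swapping `writerows(row)` for `writerow(row)` in the loop gives `['1,9', '2,17']` instead.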
Zach Young