This snippet reads a CSV file into a list and converts the list to a numpy.array:

import csv
from numpy import array

with open('infile.csv', 'r') as infile:
    reader = csv.reader(infile)
    reader_list = list(reader)   # list of rows, each a list of strings
    reader_array = array(reader_list)

The shape of this 2-D array is:

print reader_array.shape
(2938, 6)

When more data is added to the CSV file (say another 2000 rows by 6 columns), the array becomes 1-D, probably because the rows have uneven lengths. But if I open this CSV file in Excel, press Ctrl+S, accept the incompatible-format warning to save it as CSV, and run the code again, it works!

print reader_array.shape
(2938, 12)
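
One way to test the "uneven shape" suspicion is to count how many fields each row has before building the array. The following is a minimal sketch in Python 3 syntax; the helper name row_length_counts and the three-row sample are hypothetical stand-ins, not part of the original code:

```python
import csv
import io
from collections import Counter

def row_length_counts(lines):
    """Return a Counter mapping field count -> number of rows with that count."""
    return Counter(len(row) for row in csv.reader(lines))

# Hypothetical sample standing in for infile.csv: two full rows and one short row.
sample = io.StringIO("1,2,3\n4,5,6\n7,8\n")
counts = row_length_counts(sample)
print(counts)
```

For the real file, the open file object can be passed in place of the sample. If the counter holds more than one distinct length, the rows are ragged, and depending on the NumPy version array() either falls back to a 1-D array of lists or raises an error.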

I understand that opening and re-saving the CSV file changes its format, because the file becomes smaller than the original, but I cannot figure out how the two versions differ. The code that creates this CSV file looks like this:

with open('outfile2.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    .
    .
    .
    data = loadtxt(fname_...)
    .
    .
    .
    list_.append(sublist_)

    # the writer must be used while outfile is still open
    for row in izip_longest(*averages_, fillvalue=['']):
        writer.writerow(list(chain.from_iterable(row)))

The rows written to the CSV file look like this:

['1689.000000', '0.000954', '0.007900', '0.017542', '0.057176', 94.164925128317591, '1689.000000', '0.001107', '0.007444', '0.018361', '0.059156', 94.151092414521969]
['1690.000000', '0.001025', '0.007925', '0.018905', '0.060608', 94.165950129377109, '1690.000000', '0.001316', '0.007463', '0.017517', '0.058879', 94.152408118013895]
['1691.000000', '0.001124', '0.008067', '0.017934', '0.058068', 94.167074126395363, '1691.000000', '0.001226', '0.007473', '0.016914', '0.057320', 94.153634253740464]

Can someone please explain what happens here? Can I change the format of the CSV while it is being written so that it works? Even csv.writer(outfile, dialect='excel') didn't help.

PyLearner

1 Answer

I am not sure, but maybe you can try reading the file in binary mode:

with open('infile.csv', 'rb') as infile:

I know that 'r' and 'rb' handle line breaks differently, and saving the file in Excel might change how the line breaks are represented.
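
To see whether line endings alone can change the parsed rows, a small experiment like the one below may help. It is only a sketch in Python 3 syntax, where newline='' plays the role that 'rb' plays for the csv module in Python 2; the helper read_rows and the sample strings are hypothetical:

```python
import csv
import io

def read_rows(text):
    """Parse CSV text the way a file opened with newline='' would be parsed."""
    return list(csv.reader(io.StringIO(text, newline="")))

# The same two rows, once with Unix and once with Windows line endings.
unix_rows = read_rows("1,2,3\n4,5,6\n")
windows_rows = read_rows("1,2,3\r\n4,5,6\r\n")

print(unix_rows == windows_rows)
```

If both variants parse to the same rows, a difference in line endings by itself would not explain a change in the array's shape.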

Also, maybe you could share a sample of your input file to illustrate.
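
Another thing that may be worth checking is the fillvalue passed to izip_longest. The sketch below assumes that averages_ holds one list of six-field rows per input file and that the files can differ in length; the sample data, N_COLS, and flatten_rows are hypothetical. It uses Python 3's zip_longest, which corresponds to izip_longest in Python 2:

```python
from itertools import chain, zip_longest  # izip_longest in Python 2

N_COLS = 6  # assumed number of columns contributed by each input file

# Hypothetical stand-in for averages_: two files, the second one row shorter.
averages = [
    [["a1"] * N_COLS, ["a2"] * N_COLS],
    [["b1"] * N_COLS],
]

def flatten_rows(groups, fillvalue):
    """Zip the per-file row lists side by side and flatten each combined row."""
    return [list(chain.from_iterable(row))
            for row in zip_longest(*groups, fillvalue=fillvalue)]

ragged = flatten_rows(averages, [""])           # one empty cell for a missing row
padded = flatten_rows(averages, [""] * N_COLS)  # six empty cells for a missing row

print([len(r) for r in ragged])   # [12, 7]
print([len(r) for r in padded])   # [12, 12]
```

Under these assumptions, fillvalue=[''] yields shorter rows wherever one file has run out of data, while padding with a full row of empty strings keeps every CSV row at the same width.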

Ray
  • Thanks, but it didn't make any difference :( – PyLearner Dec 05 '13 at 10:34
  • @PyLearner Do you mind putting your code somewhere like [pastebin](http://pastebin.com) so that I can try something? – Ray Dec 05 '13 at 10:45
  • I had not heard of it before! :) Here's the link: [http://pastebin.com/YvFp0RKL]. I think you should paste it into your address bar. – PyLearner Dec 05 '13 at 11:24
  • By the way, please see the input files in my other question here: http://stackoverflow.com/questions/20339934/python-how-to-write-data-in-new-columns-for-every-file-read-by-numpy – PyLearner Dec 05 '13 at 11:33
  • @PyLearner I am not sure whether I understand what you mean. Is this straightforward kind of output what you want? `for row in averages_: for r in row: writer.writerow(r)` – Ray Dec 05 '13 at 11:53
  • Well, the list averages_ contains several lists of lists, and the method you suggested was the one I tried first, but it doesn't give the result I want. Please see http://stackoverflow.com/questions/20368047/write-several-list-of-lists-in-a-csv-columns-in-python – PyLearner Dec 05 '13 at 12:23
  • @PyLearner So you want every row of the output CSV to hold one row from each list in `averages_`? And right now a row in the file looks like this: `['1689.000000', '0.000954', '0.007900', '0.017542', '0.057176', 94.164925128317591, '1689.000000', '0.001107', '0.007444', '0.018361', '0.059156', 94.151092414521969]`? As far as I know, `csv.writer` expects every row passed to `writerow` to be a sequence such as a `list`. – Ray Dec 05 '13 at 12:28
  • In the file, each number is written to a separate cell. With the same code, when there's only one input text file, or when the text files have an equal number of rows, it works fine. That's why I suspect the output CSV file could be written in an alternative format. – PyLearner Dec 05 '13 at 12:43