
So I am using pandas to read in Excel files and CSV files. These files contain both strings and numbers, not just numbers. The problem is that all my strings get converted into NaN, which I do not want at all. I do not know what the types of the columns will be ahead of time (it is actually my job to handle the system that figures this out), so I can't tell pandas what they will be (that must come later). I just want to read in each cell as a string for now.

Here is my code:

import io
import pandas

# 'data' already holds the file contents as a unicode string
if csv: # check whether to read an Excel file or a CSV
  frame = pandas.read_csv(io.StringIO(data))
else:
  frame = pandas.read_excel(io.StringIO(data))
tbl = []
print frame.dtypes
for (i, col) in enumerate(frame):
  tmp = [col]
  for (j, value) in enumerate(frame[col]):
    tmp.append(unicode(value))
  tbl.append(tmp)

I just need to be able to produce a column-wise 2D list, and I can do everything from there. I also need to be able to handle Unicode (the data is already in Unicode).

How do I construct 'tbl' so that cells that should be strings do not come out as 'NaN'?

Jake
    Is the problem occurring with CSV files or Excel files? Add a sample file to the question so we can reproduce the problem. – Warren Weckesser Jul 11 '14 at 18:09
  • Did you read the documentation for [parsers.read_csv](http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html)? Did you try to use it and experiment with its arguments? – furas Jul 11 '14 at 18:40
  • Yes, I did. That is how I found the function. I did experiment with it; that is how I found this issue. – Jake Jul 11 '14 at 19:06
  • To clarify, I can't use dtype because I do not know what the header names will be until I read in the file. – Jake Jul 11 '14 at 19:36

1 Answer


In cases where you can't know the dtypes or column names of a CSV ahead of time, using a CSV sniffer can be helpful.

import csv
[...]
# 'f' is the open file object from the elided code above; sniff the
# dialect from the first chunk of the file, then rewind it
dialect = csv.Sniffer().sniff(f.read(1024))
f.seek(0)

frame = pandas.read_csv(io.StringIO(data), dialect=dialect)
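
For reference, here is a minimal, self-contained version of the same idea that can be run directly; the sample text and variable names are invented for illustration:

import csv
import io

import pandas

# made-up sample data with a non-default delimiter
data = u"name;score\nalice;1\nbob;2\n"

# sniff the dialect (delimiter, quoting) from the first chunk of the text;
# Sniffer.sniff only inspects the sample string, so a unicode sample should be fine
dialect = csv.Sniffer().sniff(data[:1024])

frame = pandas.read_csv(io.StringIO(data), dialect=dialect)
print frame.dtypes
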
szxk
  • I have to be able to use Unicode, so I can't use Python's csv module (I am using Python 2.7), but close! I could certainly make use of a Unicode version of that. – Jake Jul 11 '14 at 20:04
  • Haven't tried this, but looks promising: http://stackoverflow.com/a/10275281/2907617 – szxk Jul 11 '14 at 20:15