I am using pandas to read in Excel and CSV files. These files contain both strings and numbers, not just numbers. The problem is that all my strings get converted into NaN, which I do not want. I do not know the column types ahead of time (figuring them out is actually the job of the system I am working on), so I can't tell pandas what they will be; that must come later. For now I just want to read in each cell as a string.
Here is my code:
import io
import pandas

if csv:  # check whether to read in an Excel file or a CSV
    frame = pandas.read_csv(io.StringIO(data))
else:
    frame = pandas.read_excel(io.StringIO(data))
tbl = []
print frame.dtypes
# build one list per column: the column name first, then every cell as unicode
for (i, col) in enumerate(frame):
    tmp = [col]
    for (j, value) in enumerate(frame[col]):
        tmp.append(unicode(value))
    tbl.append(tmp)
I just need to be able to produce a column-wise 2D list, and I can do everything from there. I also need to be able to handle Unicode (the data is already Unicode).
How do I construct 'tbl' so that cells that should be strings do not come out as 'NaN'?
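To make the goal concrete, here is a minimal sketch of one approach that might give the result described above. It assumes Python 3 and a reasonably recent pandas in which both `read_csv` and `read_excel` accept `dtype` and `keep_default_na`. The helper name `read_as_strings` and its `is_csv` flag are hypothetical, and the Excel branch assumes `data` arrives as bytes rather than text.

```python
import io

import pandas


def read_as_strings(data, is_csv=True):
    """Hypothetical helper: return a column-wise 2D list of strings."""
    if is_csv:
        # dtype=str turns off type inference; keep_default_na=False keeps
        # empty cells and text such as "NA" as strings instead of NaN.
        frame = pandas.read_csv(
            io.StringIO(data), dtype=str, keep_default_na=False
        )
    else:
        # Assumption: Excel content is binary, so it is wrapped in BytesIO.
        frame = pandas.read_excel(
            io.BytesIO(data), dtype=str, keep_default_na=False
        )
    # One inner list per column: the header first, then the cell values.
    return [[col] + frame[col].tolist() for col in frame]


tbl = read_as_strings("name,count\nalice,1\nNA,\n")
print(tbl)
```

Whether this addresses the NaN values in the original data depends on what is actually in the files, so it should be read as a starting point rather than a confirmed fix.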