3

having an issue with handling oddly formatted excel data and writing to CSV in a string format. In my sample data, the excel table I am importing has a column ('Item_Number') and the odd data in the cell looks like: ="0001", ="00201", 2002AA, 1003B.

When I try to output to csv, the results look like: 1, 201, 2002AA, 1003B. When I try to output to excel, the results are correct: 0001, 00201, 2002AA, 1003B.

All of the dtypes are objects. Am I missing a parameter in my .to_csv() command?

df = pd.read_excel(filename,sheetname='Sheet1', converters= {'Item_Number':str})
df.to_csv('Test_csv.csv')
df.to_excel('Test_excel.xlsx')

Tried different iterations of replacing the "=" and " " " but no response.

df.Item_Number.str.replace('=','')

Currently using the excel output but curious if there is a way to preserve string formatting in CSV. Thanks :)

Arthur D. Howland
  • 4,363
  • 3
  • 21
  • 31
  • 2
    For me it works perfect. How do you test your file? By `df = pd.read_csv('Test_csv.csv')` or by text editor? – jezrael Dec 16 '16 at 14:55
  • I was opening it in excel as a csv. Will test again. – Arthur D. Howland Dec 16 '16 at 15:45
  • Was opening it in excel as a csv. Tried: df = pd.Series([="0001", ="00201", 2002AA, 1003B]).astype(str) but that format won't run. It looks like the read.excel() function can interpret ="0001" from an excel spreadsheet and export correctly, but not to csv. To be clear, ="0001" is in the cell. – Arthur D. Howland Dec 16 '16 at 16:09
  • Further research - This may be an excel issue, not a Python Pandas issue: http://stackoverflow.com/questions/137359/excel-csv-number-cell-format – Arthur D. Howland Dec 16 '16 at 16:31

1 Answers1

3

Opening an excel spreadsheet with Python 3 Pandas that has data that looks like ="0001" will go to the dataframe correctly. CSV will turn it back to "1". Keeping the same format to CSV is apparently a known issue (from my comment above). To keep the formatting I have to add =" " back into the data like this:

df['Item_Number'] = '="' + df['Item_Number'] + '"'

Not sure if there is a cleaner version to that will have an Excel opened CSV file show 0001 without the quotes and equals sign.

Arthur D. Howland
  • 4,363
  • 3
  • 21
  • 31