-2
import pandas as pd
check = pd.read_csv('1.csv')
nocheck = check['CUSIP'].str[:-1]
nocheck = nocheck.to_frame()
nocheck['CUSIP'] = nocheck['CUSIP'].astype(str)
nocheck.to_csv('NoCheck.csv')

This works but while writing the csv, a value for an identifier like 0003418 (type = str) converts to 3418 (type = general) when the csv file is opened in Excel. How do I avoid this?

cheech
  • 23
  • 4
  • When reading or when writing? I don't think this would happen when writing if the columns was str to begin with? – cs95 Dec 29 '17 at 16:36
  • 1
    The value is becoming a number *where*? In Pandas? In _insertCSVreadingprogramhere_? What are you using to make the discovery that the data is changing? If you're using Excel to open the CSV then it's Excel's fault, not Python's. – sco1 Dec 29 '17 at 16:37
  • Building on @excaza's comment- if you're using Excel, it may autoformat your numerical columns. You can either change the column format to text OR you can force quoting on write using `nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)` – pault Dec 29 '17 at 16:43
  • Possible duplicate of [Best way to format integer as string with leading zeros?](https://stackoverflow.com/questions/733454/best-way-to-format-integer-as-string-with-leading-zeros) – trejas Dec 29 '17 at 16:45
  • Yes the change is evident only in excel, but how could I avoid this? – cheech Dec 29 '17 at 18:51
  • @pault i get an error: --------------------------------------------------------------------------- NameError Traceback (most recent call last) in () ----> 1 nocheck.to_csv('NoCheck.csv', quoting = csv.QUOTE_ALL) NameError: name 'csv' is not defined – cheech Dec 29 '17 at 18:55
  • you need to `import csv` first – pault Dec 29 '17 at 18:56

1 Answers1

-1

I couldn't find a dupe for this question, so I'll post my comment as a solution.

This is an Excel issue, not a python error. Excel autoformats numeric columns to remove leading 0's. You can "fix" this by forcing pandas to quote when writing:

import csv
# insert pandas code from question here
# use csv.QUOTE_ALL when writing CSV.
nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)

Note that this will actually put quotes around each value in your CSV. It will render the way you want in Excel, but you may run into issues if you try to read the file some other way.

Another solution is to write the CSV without quoting, and change the cell format in Excel to "General" instead of "Numeric".

pault
  • 41,343
  • 15
  • 107
  • 149