1

I am using pandas to read and write excel files with python (xlsx files, using openpyxl). Part of my data is text that looks like numbers. For instance, in the original excel file, I have a cell

123456789012345

and it imports fine as text in python. However, if I write the excel file again, the output cell has format 'General' and shows as

1.23e14

So my question is: Is there any way to set the number format in pandas? I would like to set it to '@', i.e., to openpyxl.styles.NumberFormat.FORMAT_TEXT.

Unfortunately, the documentation of pandas.ExcelWriter is not so advanced (i.e., it does not exist).

Thank you for your help.

Gurfuffle
  • 784
  • 12
  • 32
Felix
  • 659
  • 2
  • 10
  • 24

1 Answers1

4

I think in the worst case you can manipulate cells directly. I don't have Excel installed but maybe you can check if this works.

In [67]: df=pd.DataFrame ([123456789012345])

In [68]: writer = pd.ExcelWriter ('e.xlsx', engine='openpyxl')

In [69]: df.to_excel (writer, 'sheet1')

In [70]: ws = writer.sheets['sheet1']

In [71]: ws['A1'].style.number_format.format_code='@'

In [72]: writer.save ()

In [73]: pd.read_excel ('e.xlsx')
Out[73]: 
                 0
0  123456789012345
Happy001
  • 6,103
  • 2
  • 23
  • 16
  • Unfortunately, this does not fully solve the problem. The format is text, but it still shows as 1.23e14. The reason is that, in the unzipped xlsx, strings are stored in xl/sharedStrings.xml, whereas numbers are stored in xl/worksheets/sheet1.xml. And with this solution, the text number is stored in xl/worksheets/sheet1.xml. – Felix Jul 01 '14 at 07:34
  • I just figured out that your solution works, provided one uses a string in the df construction: `df=pd.DataFrame (['123456789012345'])`. However, the read function ignores the data type. I will post a separate question on that. – Felix Jul 01 '14 at 10:49