0

I have a df in Python with an ID column - those IDs can be a mix of numbers and letters, or solely numbers. Eg:

ID
00028D9D1
00027B98F
000275457

When I save this df out, using pandas to_csv I see in the resulting csv file when I share with others (or open up myself), I see the IDs that contain letters are maintained as is / treated as text, but the IDs that are solely numbers are treated as integers, and automatically formatted that way. For example, I would see this in my csv file after saving:

ID
00028D9D1
00027B98F
275457

Is there any way to disable this automatic treatment of integers, leading to different formatting? The dtype of this column does say it is an object so I assumed it would save in the same format for all values.

Wolf
  • 9,679
  • 7
  • 62
  • 108
Liam Hash
  • 109
  • 1
  • 1
  • 4
  • This was answered here: https://stackoverflow.com/questions/23836277/add-leading-zeros-to-strings-in-pandas-dataframe – eatmeimadanish May 18 '22 at 15:25
  • 1
    Does this answer your question? [How to save a CSV from dataframe, to keep zeros left in column with numbers?](https://stackoverflow.com/questions/48903008/how-to-save-a-csv-from-dataframe-to-keep-zeros-left-in-column-with-numbers) – Ynjxsjmh May 18 '22 at 15:26
  • 1
    Are you opening this in Excel? Note that Excel is pretty terrible with CSVs. What do you see in a different text editor such as Notepad(++)? – BigBen May 18 '22 at 15:30
  • I'd expect `object` to work as a placeholder here. – Wolf May 18 '22 at 15:31

1 Answers1

0

According to RFC 4180, CSV files do not contain any type information, so it is solely the responsibility of the application to correctly interpret the contents of the file. From what I read in your question,

I have a df in Python with an ID column - those IDs can be a mix of numbers and letters, or solely numbers.

and as far as I interpret your specification, it you'll have something like this:

input.csv

ID
00028D9D1
00027B98F
000275457

script

import pandas as pd

df = pd.read_csv('input.csv')
print(df)
print(df['ID'].dtype)
df.to_csv('output.csv', index=False)

console output

          ID
0  00028D9D1
1  00027B98F
2  000275457
object

output.csv

ID
00028D9D1
00027B98F
000275457

In other words, use the right tool to "open up" the CSV file you create.

As I observe on Windows, spreadsheet applications like Excel or Open/Libre office register themselves with the .csv file extension, so just opening a CSV will lead to a very generic interpretation of data: cells that can be converted into a number without errors are treated as integer cells, regardless of their column.

One application that lets you view the actual contents of a text file is Windows Notepad, for example, but as a programmer you probably know better alternatives.

Wolf
  • 9,679
  • 7
  • 62
  • 108