I'm trying to import a text file into Python as a dataframe. My text file essentially consists of 2 columns, both of which are numbers.

The problem is: I want one of the columns to be imported as a string, since many of the 'numbers' start with a zero (e.g. 0123) and I will need this column to merge the df with another one later on.

My code looks like this:

mydata = pd.read_csv("text_file.txt", sep = "\t", dtype = {"header_col2": str})

However, I still lose the zeros in the output, so a 4-digit number is turned into a 3-digit number.

I'm assuming there is something wrong with my import code but I could not find any solution yet.

I'm new to python/pandas, so any help/suggestions would be much appreciated!

JE_Muc
Flo P
  • Convert the data into the format that you want prior to putting it in the dataframe. I would read in the data, manipulate it, and THEN enter it into the dataframe, vs. immediately reading it into the dataframe. – Simeon Ikudabo May 29 '18 at 09:08
  • [Related question](https://stackoverflow.com/questions/13250046/pandas-csv-import-keep-leading-zeros-in-a-column?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) – akilat90 May 29 '18 at 09:12
  • Thank you, that solved the problem! – Flo P May 29 '18 at 09:16

1 Answer

It is hard to see why your original code is not working. The same call keeps the leading zeros on mock data:

from io import StringIO
import pandas as pd

# this mimics your data
mock_txt = StringIO("""header_col2\theader_col3
0123\t5
0333\t10
""")

# same read_csv call as in the question
df = pd.read_csv(mock_txt, sep="\t", dtype={"header_col2": str})

# are they really strings?
assert isinstance(df.header_col2[0], str)
assert isinstance(df.header_col2[1], str)
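
One possible explanation, offered only as an assumption because the actual file is not shown: the key in the `dtype` dict does not exactly match the column name in the file, for example because the header has a trailing space. pandas does not complain about a `dtype` key that matches no column, so the column falls back to integer inference and the zeros disappear. A minimal sketch with a made-up header:

```python
from io import StringIO
import pandas as pd

# Hypothetical file whose first header carries a trailing space: "header_col2 "
raw = "header_col2 \theader_col3\n0123\t5\n0333\t10\n"

# The key "header_col2" matches no column here, so the dtype has no effect
df_bad = pd.read_csv(StringIO(raw), sep="\t", dtype={"header_col2": str})
print([repr(c) for c in df_bad.columns])  # repr() exposes stray whitespace
print(df_bad.dtypes)

# Using the exact column name keeps the values as strings
df_ok = pd.read_csv(StringIO(raw), sep="\t", dtype={"header_col2 ": str})
print(df_ok.iloc[:, 0].tolist())
```

Checking `df.columns` and `df.dtypes` right after the import is a quick way to tell whether the `dtype` mapping was actually applied.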

P.S. As always on SO, it really helps to include a sample of the data and a minimal working example with code in the original post.

Evgeny
  • Hi, thanks for the feedback! akilat90's comment above, redirecting to a related question, solved my problem. – Flo P May 29 '18 at 11:08
  • Glad it is solved, but which part of @akilat90's link made the difference? Was it using `dtype` as opposed to `converters`? – Evgeny May 29 '18 at 11:11
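
On the `dtype` versus `converters` question: as far as can be told from mock data, both keep the leading zeros, and so does `dtype=str` applied to every column. The sketch below compares the three on the same made-up input as in the answer; it is not the asker's real file, so it cannot say which variant fixed the original problem.

```python
from io import StringIO
import pandas as pd

# Same mock data as in the answer (not the real file)
raw = "header_col2\theader_col3\n0123\t5\n0333\t10\n"

# Per-column dtype, per-column converter, and str for every column
by_dtype = pd.read_csv(StringIO(raw), sep="\t", dtype={"header_col2": str})
by_converter = pd.read_csv(StringIO(raw), sep="\t", converters={"header_col2": str})
all_str = pd.read_csv(StringIO(raw), sep="\t", dtype=str)

for name, frame in [("dtype", by_dtype), ("converters", by_converter), ("all str", all_str)]:
    print(name, frame["header_col2"].tolist())
```

Note that `dtype=str` without a dict also turns `header_col3` into strings, which may or may not be wanted.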