Read in a dataframe and convert some columns

Question

I would like to read in a dataframe using read_csv. For example:

data = pd.read_csv("foo.txt", sep=' ', header=None, dtype={0:np.uint32, 1:np.uint32, 2:str})

Except foo.txt has the awkward property that the first two columns are in hex, e.g.:

ff462 44e44 house

You can convert a hex value to an int with int("ff462", 16). How can I read in the data making sure that the first two columns are converted to dtype uint32?

You can apply a `converters` parameter, which takes a dict, see a good demo on http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/ — Jan, Aug 04 '16 at 20:58
@Jan That looks good. Can you convert hex to uint32 directly? — Simd, Aug 04 '16 at 21:08
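A minimal sketch of the `converters` suggestion from the comment above, assuming a small in-memory stand-in for foo.txt. The helper name `hex_to_int` and the second sample row are made up for illustration. As far as I know a converter returns plain Python ints, so the uint32 cast is done as a separate step after reading:

```python
import io

import numpy as np
import pandas as pd

# Stand-in for foo.txt so the sketch is self-contained. The first row is the
# one from the question; the second row is invented for illustration.
sample = io.StringIO("ff462 44e44 house\n1a2b 0 barn\n")


def hex_to_int(value):
    """Hypothetical helper: parse a hex string such as 'ff462' into an int."""
    return int(value, 16)


# converters are applied to the raw strings of columns 0 and 1 during parsing.
data = pd.read_csv(
    sample,
    sep=" ",
    header=None,
    converters={0: hex_to_int, 1: hex_to_int},
)

# The converted columns come back with a default integer dtype, so cast
# them explicitly to uint32.
data = data.astype({0: np.uint32, 1: np.uint32})
```

With the real file, `sample` would be replaced by "foo.txt". Values above 0xffffffff do not fit in uint32, so it may be worth checking the range before casting.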

score 2 · Answer 1 · edited May 23 '17 at 10:28


Apparently this works:

# Assumes column 1 was read in as strings (e.g. with dtype=str)
data[1] = data[1].apply(lambda x: int(x, base=16))
data[1] = data[1].astype(np.uint32)

answered Aug 04 '16 at 21:02 · datahero

Will that make it uint32? – Simd Aug 04 '16 at 21:06

Alex · Accepted Answer · 2016-08-04T21:08:50.000

1

You can read the data in as strings and then convert the hex columns:

data = pd.read_csv("foo.txt", sep=' ', header=None, dtype=str)
# Parse each hex string in the first two columns, then cast to uint32
data[[0, 1]] = data[[0, 1]].apply(lambda col: col.map(lambda x: int(x, base=16))).astype(np.uint32)

answered Aug 04 '16 at 21:02 · edited Aug 04 '16 at 21:08 · Alex

Will that make it uint32? – Simd Aug 04 '16 at 21:06

Updated with a type cast – Alex Aug 04 '16 at 21:08
Thank you. As the file is very large I would ideally prefer to convert it as it is read but I will try your solution too. – Simd Aug 04 '16 at 21:09
