
I would like to read in a dataframe using read_csv. For example:

data = pd.read_csv("foo.txt", sep=' ', header=None, dtype={0: np.uint32, 1: np.uint32, 2: str})

However, foo.txt has the awkward property that the first two columns are in hex, e.g.:

ff462 44e44 house

You can convert a hex value to an int with int("ff462", 16). How can I read in the data making sure that the first two columns are converted to dtype uint32?

Simd
  • You can use the `converters` parameter, which takes a dict; see a good demo at http://www.gregreda.com/2013/10/26/working-with-pandas-dataframes/ – Jan Aug 04 '16 at 20:58
  • @Jan That looks good. Can you convert hex to uint32 directly? – Simd Aug 04 '16 at 21:08
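A minimal sketch of the `converters` approach from the comment above, assuming the space-separated layout shown in the question. `io.StringIO` stands in for `foo.txt`, and the helper name `hex_to_int` is hypothetical. The converters hand back plain Python ints, so the narrowing to `uint32` is done in a separate `astype` step rather than in one pass.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for foo.txt, using the sample line from the question
raw = io.StringIO("ff462 44e44 house\n")


def hex_to_int(s):
    """Hypothetical helper: parse a hex string without a 0x prefix, e.g. 'ff462'."""
    return int(s, 16)


# Converters run on each cell while parsing; dtype is not passed for columns 0 and 1
data = pd.read_csv(raw, sep=" ", header=None,
                   converters={0: hex_to_int, 1: hex_to_int})

# The converted columns are inferred as int64, so cast them to uint32 afterwards
data = data.astype({0: np.uint32, 1: np.uint32})
```

Passing both `converters` and `dtype` for the same column is avoided here because pandas uses the converter and only warns about the ignored dtype.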

2 Answers


Apparently this works (cf. here):

import numpy as np
# header=None gives integer labels 0, 1, 2; base=16 parses hex that has no "0x" prefix
data[1] = data[1].apply(lambda x: int(x, base=16)).astype(np.uint32)
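A note on `base=0`: Python's `int()` then infers the base from the literal's prefix, so it only reads hex when the string starts with `0x`. Values like `ff462` from the question have no prefix and need `base=16` instead. A quick check in plain Python (the variable names are illustrative only):

```python
# base=0 reads the prefix: "0x" means hex, no prefix means decimal
with_prefix = int("0xff462", base=0)
without_prefix = int("ff462", base=16)
assert with_prefix == without_prefix

try:
    # No prefix, and "ff462" is not a valid decimal literal
    int("ff462", base=0)
except ValueError as exc:
    print("base=0 rejected it:", exc)
```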
datahero

You can read in the data as a string and then convert it...

import numpy as np
import pandas as pd
data = pd.read_csv("foo.txt", sep=' ', header=None, dtype=str)  # keep the hex digits as text
data[[0, 1]] = data[[0, 1]].apply(lambda col: col.map(lambda x: int(x, base=16))).astype(np.uint32)
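A self-contained variant of the read-as-string approach, sketched under the assumption that the file looks like the sample line in the question (`io.StringIO` replaces `foo.txt`). The range check is an addition, not part of the answer above: `uint32` holds at most 8 hex digits, and the sketch refuses to narrow values that would not fit rather than relying on how the cast treats them. The names `raw`, `parsed` and `limit` are illustrative.

```python
import io

import numpy as np
import pandas as pd

# Sample line from the question standing in for foo.txt
raw = io.StringIO("ff462 44e44 house\n")

# Read every column as text so the hex digits are not misparsed
data = pd.read_csv(raw, sep=" ", header=None, dtype=str)

# Parse each cell of columns 0 and 1 as a base-16 integer (int64 at this point)
parsed = data[[0, 1]].apply(lambda col: col.map(lambda x: int(x, 16)))

# Refuse to narrow values that exceed the uint32 maximum (0xFFFFFFFF)
limit = np.iinfo(np.uint32).max
if (parsed.to_numpy() > limit).any():
    raise ValueError("hex value exceeds uint32 range")

# Replace the string columns with the uint32 versions
data[[0, 1]] = parsed.astype(np.uint32)
```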
Alex