10

I have a CSV file in which a few columns are numeric and a few are strings. When I check myDF.dtypes, all the string columns show up as object.

  1. Someone asked a related question here earlier about why this is done. Is it possible to recast the dtype from object to string?

  2. Also, in general, is there an easy way to recast the dtype from int64 and float64 to int32 and float32 to reduce the size of the data (in memory / on disk)?

uday

2 Answers

3

pandas represents all strings as variable-length values, which is what the object dtype holds. You can call series.astype('S32') if you want, but the result will be recast if you then store it in a DataFrame or do much with it. This is for simplicity.

Certain serialization formats, such as HDFStore, do store strings as fixed-length strings on disk, though.

You can call series.astype('int32') if you would like, and the data will be stored as the new type.
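To illustrate the numeric half of the question, here is a minimal sketch of downcasting int64 and float64 columns and comparing memory use. The frame, its column names, and the `narrow` mapping are made up for the example; they are not from the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the CSV data in the question.
df = pd.DataFrame({
    "count": np.arange(1000, dtype="int64"),
    "price": np.linspace(0.0, 1.0, 1000),            # float64 by default
    "label": pd.Series(["x"] * 1000, dtype=object),  # string column, left alone
})

before = df.memory_usage(index=False)

# Map each wide numeric dtype to its narrower counterpart.
narrow = {"int64": "int32", "float64": "float32"}
casts = {col: narrow[str(dt)] for col, dt in df.dtypes.items() if str(dt) in narrow}

# astype accepts a {column: dtype} mapping and returns a new DataFrame.
small = df.astype(casts)
after = small.memory_usage(index=False)

print(small.dtypes)
print("bytes per numeric column before:", before["count"], before["price"])
print("bytes per numeric column after: ", after["count"], after["price"])
```

Each numeric column takes half the bytes after the cast. Keep in mind that int32 cannot hold values outside roughly ±2.1 billion and float32 carries only about 7 significant decimal digits, so check the value ranges before downcasting.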

Jeff
  • Just to be clear, you recommend changing *each* series in a data frame before saving it via HDFStore, and vice versa when loading? – uday Feb 18 '14 at 01:04
  • You can create a pandas object of a specific dtype if you want, or use astype later. Not sure what your goal is; why do you care what string dtype it actually is? Why is object a problem? – Jeff Feb 18 '14 at 01:33
  • @Jeff I have written a function, and it's giving null as output because it doesn't see str format, it sees object format. What should I do? –  Feb 17 '16 at 04:28
0
import pandas as pd
df = pd.DataFrame({'name': pd.Series(['a', 'b'], dtype=object)})  # stand-in for your DataFrame
print('dtype in object form:')
print(repr(df.dtypes[df.columns[0]]))    # output: dtype('O')
print('\ndtype in string form:')
print(str(df.dtypes[df.columns[0]]))    # output: object
Anshul Bisht