
I have a simple pandas dataframe with a column:

import pandas as pd

col = ['A']  # a flat list of column names
data = [[1.0],[2.3],[3.4]]
df = pd.DataFrame.from_records(data, columns=col)

This creates a dataframe with one column of type np.float64, which is what I want.

Later in the process, I want to add another column of type string.

df['SOMETEXT'] = "SOME TEXT FOR ANALYSIS"

The dtype of this column comes through as object, but I need it to be a string type. So I do the following:

df['SOMETEXT'] = df['SOMETEXT'].astype(str)

If I look at the dtype again, I get the same dtype for that column: object.

A process further down my workflow is dtype-sensitive, so I need the column to be a string.

Any ideas?

array = df.to_records(index=False) # convert to numpy array

The array still carries the object dtype for that column, but the column should be a string.

chrisaycock
code base 5000
  • Possible duplicate of [How to convert column with dtype as object to string in Pandas Dataframe](http://stackoverflow.com/questions/33957720/how-to-convert-column-with-dtype-as-object-to-string-in-pandas-dataframe) – languitar Mar 14 '17 at 12:38
  • @languitar the solution provided did not work. – code base 5000 Mar 14 '17 at 14:28

1 Answer


In pandas, string columns are traditionally stored with the object dtype, which is why astype(str) leaves the reported dtype unchanged. It confused me too when I first started.

Once in NumPy, you can cast the string:

In [24]: array['SOMETEXT'].astype(str)
Out[24]: 
array(['SOME TEXT FOR ANALYSIS', 'SOME TEXT FOR ANALYSIS',
       'SOME TEXT FOR ANALYSIS'], 
      dtype='<U22')
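
If the downstream step needs the whole record array (not just one field) to carry a string dtype, one possible approach is to build a new structured dtype and copy the fields across. This is only a sketch: the names `records`, `new_descr` and `converted` are made up for illustration, and it assumes every object field holds plain strings.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame({"A": [1.0, 2.3, 3.4]})
df["SOMETEXT"] = "SOME TEXT FOR ANALYSIS"

records = df.to_records(index=False)

# Describe a new structured dtype: object fields become fixed-width
# unicode sized to the longest value, other fields keep their dtype.
new_descr = []
for name in records.dtype.names:
    field_dtype = records.dtype[name]
    if field_dtype.kind == "O":
        width = max(1, max(len(str(v)) for v in records[name]))
        new_descr.append((name, f"U{width}"))
    else:
        new_descr.append((name, field_dtype))

# Allocate the target array and copy field by field
converted = np.empty(records.shape, dtype=new_descr)
for name in records.dtype.names:
    converted[name] = records[name]

print(converted.dtype)
```

Sizing the unicode field from the data avoids truncating longer strings, at the cost of one extra pass over each object column.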
chrisaycock