Datatypes not changing in Pandas DataFrame

Question

I am trying to convert specific columns in my DataFrame to dtype: float. I tried this:

grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]].apply(pd.to_numeric, errors='ignore')

But when I print this afterwards:

print(grid.dtypes)

I am still seeing this:

COLUMN_NM         object
DISTINCT_COUNT    object
NULL_COUNT        object
MAX_COL_VALUE     object
MIN_COL_VALUE     object
MAX_COL_LENGTH    object
MIN_COL_LENGTH    object
TABLE_CNT         object
TABLE_NM          object
DATA_SOURCE       object
dtype: object

Any ideas?

this operation does not modify the dataframe in place. you have to assign the output back to the original. `grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]] = grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]].apply(pd.to_numeric, errors='ignore')` — pault, Jun 08 '18 at 17:21
@pault Even though this is simple, you should post this as an answer so future users can tell that there is a posted solution. — Barker, Jun 08 '18 at 17:23
@pault ah, thank you. I should have known better. That worked perfectly. If you turn this into an answer I can "accept it" if you want — JD2775, Jun 08 '18 at 17:23
@Barker I am searching for a dupe candidate. If I can't find it, I will post an answer. Update: Found it [here](https://stackoverflow.com/a/49986916/5858851). Though the question isn't an exact dupe, the accepted answer solves the problem. — pault, Jun 08 '18 at 17:25
Possible duplicate of [Modify multiple DataFrames by iterating over a list of them](https://stackoverflow.com/questions/49986865/modify-multiple-dataframes-by-iterating-over-a-list-of-them) — pault, Jun 08 '18 at 17:27

score 2 · Accepted Answer · answered Jun 08 '18 at 17:39

Using apply() does not modify the DataFrame in place. You have to assign the output of the operation back to the original DataFrame.

@coldspeed's answer to the linked question explains what's going on:

> All these slicing/indexing operations create views/copies of the original dataframe and you then reassign df to these views/copies, meaning the originals are not touched at all.

In your case, you need to do:

columns = ['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT']
grid[columns] = grid[columns].apply(pd.to_numeric, errors='ignore')
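
To make the difference concrete, here is a minimal sketch with made-up sample data (the values and the reduced column set are assumptions, not the asker's real frame). It calls apply() once without keeping the result and once with the result assigned back, then compares the dtypes. It uses the default error handling of pd.to_numeric, since every value in the sample is parseable.

```python
import pandas as pd

# Hypothetical sample data; dtype=object mimics the all-object frame in the question.
grid = pd.DataFrame(
    {
        "COLUMN_NM": ["id", "name"],
        "DISTINCT_COUNT": ["12", "7"],
        "NULL_COUNT": ["0", "3"],
    },
    dtype=object,
)
columns = ["DISTINCT_COUNT", "NULL_COUNT"]

# apply() returns a new DataFrame; discarding the result leaves grid unchanged.
grid[columns].apply(pd.to_numeric)
unchanged = grid.dtypes.copy()

# Assigning the result back replaces the columns with the converted ones.
grid[columns] = grid[columns].apply(pd.to_numeric)
converted = grid.dtypes

print(unchanged)
print(converted)
```

The first print should still show object for every column, while the second should show numeric dtypes for the two converted columns and object for COLUMN_NM.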

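Or you could convert one column at a time. Note that pd.to_numeric() only accepts one-dimensional input (a list, tuple, 1-d array, or Series), so it has to be called per column rather than on the whole DataFrame:

for col in columns:
    grid[col] = pd.to_numeric(grid[col], errors='ignore')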
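
One caveat worth a sketch: pd.to_numeric is documented as taking a scalar, list, tuple, 1-d array, or Series, and with errors='ignore' a column containing any unparseable value comes back unconverted, so it can still show up as object. As far as I know, recent pandas releases have also deprecated errors='ignore'. The example below is an assumption-laden illustration (the sample values are invented) that converts column by column with errors='coerce', which replaces unparseable entries with NaN.

```python
import pandas as pd

# Hypothetical data: one value ("n/a") cannot be parsed as a number.
grid = pd.DataFrame(
    {"MAX_COL_LENGTH": ["10", "n/a"], "MIN_COL_LENGTH": ["1", "2"]},
    dtype=object,
)
columns = ["MAX_COL_LENGTH", "MIN_COL_LENGTH"]

for col in columns:
    # errors="coerce" turns unparseable entries into NaN instead of raising.
    grid[col] = pd.to_numeric(grid[col], errors="coerce")

print(grid.dtypes)
```

Whether coercing to NaN is acceptable depends on the data; if silent NaNs would hide real problems, the default errors='raise' is the safer choice.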
