I am trying to convert specific columns in my DataFrame to dtype: float. I tried this:

grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]].apply(pd.to_numeric, errors='ignore')

But when I print this afterwards:

print(grid.dtypes)

I am still seeing this:

COLUMN_NM         object
DISTINCT_COUNT    object
NULL_COUNT        object
MAX_COL_VALUE     object
MIN_COL_VALUE     object
MAX_COL_LENGTH    object
MIN_COL_LENGTH    object
TABLE_CNT         object
TABLE_NM          object
DATA_SOURCE       object
dtype: object

Any ideas?

JD2775
  • this operation does not modify the dataframe in place. you have to assign the output back to the original. `grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]] = grid[['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT' ]].apply(pd.to_numeric, errors='ignore')` – pault Jun 08 '18 at 17:21
  • @pault Even though this is simple, you should post this as an answer so future users can tell that there is a posted solution. – Barker Jun 08 '18 at 17:23
  • @pault ah, thank you. I should have known better. That worked perfectly. If you turn this into an answer I can "accept it" if you want – JD2775 Jun 08 '18 at 17:23
  • @Barker I am searching for a dupe candidate. If I can't find it, I will post an answer. Update: Found it [here](https://stackoverflow.com/a/49986916/5858851). Though the question isn't an exact dupe, the accepted answer solves the problem. – pault Jun 08 '18 at 17:25
  • Possible duplicate of [Modify multiple DataFrames by iterating over a list of them](https://stackoverflow.com/questions/49986865/modify-multiple-dataframes-by-iterating-over-a-list-of-them) – pault Jun 08 '18 at 17:27

1 Answer

Using apply() does not modify the DataFrame in place. You have to assign the output of the operation back to the original DataFrame.

@coldspeed's answer explains what's going on:

> All these slicing/indexing operations create views/copies of the original dataframe and you then reassign df to these views/copies, meaning the originals are not touched at all.

In your case, you need to do:

columns = ['DISTINCT_COUNT','MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'NULL_COUNT']
grid[columns] = grid[columns].apply(pd.to_numeric, errors='ignore')
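To see the difference, here is a minimal sketch with made-up data (the values and the shortened column list are assumptions, not the asker's real table). It shows that the result of apply() is a separate object, and that grid only changes once that result is assigned back:

```python
import pandas as pd

# Hypothetical sample data: column names mirror the question, values are invented.
grid = pd.DataFrame(
    {
        "COLUMN_NM": ["a", "b", "c"],
        "DISTINCT_COUNT": ["10", "20", "30"],
        "NULL_COUNT": ["0", "1", "2"],
    },
    dtype=object,
)

columns = ["DISTINCT_COUNT", "NULL_COUNT"]

# apply() returns a new DataFrame; grid itself is left untouched here.
converted = grid[columns].apply(pd.to_numeric)
dtypes_before = grid.dtypes.copy()

# Assigning the result back is what actually changes the columns in grid.
grid[columns] = converted
dtypes_after = grid.dtypes

print(dtypes_before)
print(dtypes_after)
```

Comparing the two printed dtype listings should show the selected columns as object before the assignment and as a numeric dtype after it.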

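Or, since pd.to_numeric() accepts only a scalar, list, tuple, 1-d array, or Series (not a whole DataFrame), you could convert one column at a time:

for col in columns:
    grid[col] = pd.to_numeric(grid[col], errors='ignore')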
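
A note on errors='ignore': if a column contains a value that cannot be parsed, the column is returned unchanged, so its dtype stays object. The option is also deprecated as of pandas 2.2. If the goal is specifically a float column, one possible alternative is errors='coerce', which turns unparseable values into NaN. A minimal sketch with made-up data (the 'n/a' value is an assumption for illustration, not something from the question):

```python
import pandas as pd

# Hypothetical data: "n/a" cannot be parsed as a number.
grid = pd.DataFrame(
    {
        "DISTINCT_COUNT": ["10", "20", "n/a"],
        "NULL_COUNT": ["0", "1", "2"],
    },
    dtype=object,
)

columns = ["DISTINCT_COUNT", "NULL_COUNT"]

for col in columns:
    # errors="coerce" replaces unparseable values with NaN;
    # astype(float) makes the float dtype explicit even for all-integer columns.
    grid[col] = pd.to_numeric(grid[col], errors="coerce").astype(float)

print(grid.dtypes)
```

Whether silently turning bad values into NaN is acceptable depends on the data, so it may be worth checking grid[columns].isna().sum() afterwards.
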
pault