
I have a pandas dataframe which has ~40 columns. I need to change the type of one of the columns to float (or numeric) but leave all the other columns unchanged.

All of the examples I've reviewed on this site either convert the whole dataframe to a new type or return a single new column in isolation, neither of which is what I want.

Currently I'm doing this:

df[col] = df[col].astype(float)

but this now yields a SettingWithCopyWarning from pandas.

How do I change the type of a single column in place, or copy the dataframe to a new dataframe while changing the type of one column in the process?

Altycoder
  • 4
    What are the 2-3 lines of code above `df[col] = df[col].astype(float)`? The warning is confusing, because `df[col] = df[col].astype(float)` works perfectly. – jezrael Mar 23 '18 at 13:43
  • 1
    Agreed with @jezrael, I cannot reproduce that warning. – Dan Steingart Mar 23 '18 at 13:46
  • That's on the first line of a function into which the df is passed. The function still works (it moves on to other things), so I could ignore it, but thought it best not to. I will try to replicate it independently. – Altycoder Mar 23 '18 at 14:07
  • Hmm, a test on another dataframe elsewhere doesn't raise any warning. Pandas is definitely complaining about that line though, as it's returning that line number. Here's the full warning: – Altycoder Mar 23 '18 at 14:11
  • ../outliers\outliers.py:29: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy df[col] = df[col].astype(float) – Altycoder Mar 23 '18 at 14:11
  • You would need to show more around this line: how is `df` defined, how is it handled by the function? – IanS Mar 23 '18 at 15:36
  • @sarkyscouser - Check a similar problem, [this](https://stackoverflow.com/q/49069344/2901002); there, `live['agepreg_rounded'] = live['agepreg'].apply(lambda x: round(x,0))` is correct. – jezrael Mar 23 '18 at 15:46
  • @IanS I don't think so as the warning is returned from the original line posted as you can see in my last comment above, what follows doesn't matter – Altycoder Mar 26 '18 at 06:52
  • @jezrael thanks I will give that a try – Altycoder Mar 26 '18 at 06:53
  • @sarkyscouser - There is a reason why I want to see your code :) – jezrael Mar 26 '18 at 06:53
  • @jezreal it just goes on to calculate some outliers that's all, nothing too fancy ;-) – Altycoder Mar 27 '18 at 08:36

1 Answer


The docstring for pd.DataFrame.astype() includes the following:

Signature: df.astype(dtype, copy=True, errors='raise', **kwargs)
Docstring:
Cast a pandas object to a specified dtype ``dtype``.

Parameters
----------
dtype : data type, or dict of column name -> data type
    Use a numpy.dtype or Python type to cast entire pandas object to
    the same type. Alternatively, use {col: dtype, ...}, where col is a
    column label and dtype is a numpy.dtype or Python type to cast one
    or more of the DataFrame's columns to column-specific types.

So you just need to pass a dict to this method containing the column(s) and the corresponding new dtype you want. The example below uses generic data to change the dtype of 2 out of 3 columns; columns not named in the dict are left unchanged.

import numpy as np
import pandas as pd


df = pd.DataFrame(2*np.random.randn(5,3), columns=['a', 'b', 'c'])

# df is shown below
           a       b         c
0   0.104505 2.20864 -0.835571
1  -0.136716 1.94572 -0.640713
2   0.558393 1.47761   3.46805
3    1.57529 1.63724  -2.32679
4 -0.0480981 1.70924   1.79345

df.astype(dict(a=int, c=bool))

# returns the following
   a       b     c
0  0 2.20864  True
1  0 1.94572  True
2  0 1.47761  True
3  1 1.63724  True
4  0 1.70924  True
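
For the single-column case in the question, the dict only needs one entry. The following is a minimal sketch on a small made-up frame (the column names and values are assumptions, standing in for the ~40-column dataframe); `astype` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical stand-in for the wide dataframe from the question.
df = pd.DataFrame({
    "a": ["1.5", "2.0", "3.25"],  # numbers stored as strings
    "b": [1, 2, 3],
    "c": ["x", "y", "z"],
})

# Only the column named in the dict is cast; "b" and "c" keep their dtypes.
converted = df.astype({"a": float})

print(converted.dtypes)
```

Because the result is a new object, assigning it back with `df = df.astype({"a": float})` is one way to get the "copy with one column changed" behaviour asked for.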
jeschwar
  • that would require me to define all the columns which vary in number and can be up to 40 for some of my dataframes – Altycoder Mar 26 '18 at 06:51
  • So when you type `df.columns` does it return a `RangeIndex`? If so you can convert it from integer representation to string representation with `df.columns = df.columns.astype(str)` and then the above will work like `df.astype({'38': float})` if you want to change column 38 to `float` – jeschwar Mar 28 '18 at 17:57
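
The thread never shows how `df` was built, so the cause of the warning stays unconfirmed. One common cause is that the frame passed into the function is itself a slice of another frame. The sketch below assumes that scenario (the names `raw`, `subset`, and `to_float` are hypothetical) and avoids column assignment altogether by returning a new frame.

```python
import pandas as pd


def to_float(frame, col):
    """Return a new DataFrame with `col` cast to float; `frame` is not modified."""
    return frame.astype({col: float})


raw = pd.DataFrame({"x": ["1", "2", "3", "4"], "y": [10, 20, 30, 40]})

# A filtered slice like this is the kind of object that can trigger
# SettingWithCopyWarning when a column is later assigned on it.
subset = raw[raw["y"] > 10]

# No assignment into `subset`, so there is nothing for pandas to warn about.
result = to_float(subset, "x")

# Alternative using pd.to_numeric; errors="coerce" turns unparseable values into NaN.
numeric = subset.assign(x=pd.to_numeric(subset["x"], errors="coerce"))

print(result.dtypes)
```

If assigning in place is preferred, taking an explicit copy first, for example `subset = raw[raw["y"] > 10].copy()`, is the usual way to make `subset[col] = subset[col].astype(float)` unambiguous.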