self.df['Regular Price'] = self.df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)

The code above assigns None to every value of the Regular Price field whenever it encounters a non-numeric value in the dataframe. I want to assign None only to the cells that actually hold non-numeric values.

thanks

Raheel
    I don't know if I understood your question, but are you looking for the `applymap` method? – Ivan Nov 20 '17 at 11:37
  • Maybe you are looking for https://stackoverflow.com/questions/47001413/how-to-replace-any-strings-with-nan-in-pandas-dataframe-using-a-boolean-mask/47001627#47001627 – Bharath M Shetty Nov 20 '17 at 11:45
  • @Ivan does `applymap` work on the complete dataframe, or can it be applied to one specific column? – Raheel Nov 20 '17 at 12:52

1 Answer


First, it is impossible to have NaN alongside integers, because NaN is a float by design.

Your solution works if the column contains mixed types - numbers together with strings:

df = pd.DataFrame({
    'Regular Price': ['a',1,2.3,'a',7],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)

print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0
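One likely reason the original code blanks the whole column (a guess, since the question does not show the data source): if the column was read from a CSV or similar, every value is a `str`, including the numeric-looking ones, so the `isinstance` check fails for every cell. A minimal sketch of that symptom:

```python
import pandas as pd

# Hypothetical column as it might arrive from a CSV: all values are
# strings, even the numeric-looking ones.
s = pd.Series(['1', '2.3', 'a'])

# isinstance(x, (int, float)) is False for the str '1' as well,
# so every cell maps to None, not only the truly non-numeric 'a'.
cleaned = s.apply(lambda x: int(round(x)) if isinstance(x, (int, float)) else None)
```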

But if all the values are strings, you need to_numeric with errors='coerce' to convert the non-numeric ones to NaN:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

EDIT:

I also need to remove floating points and use int only

It is possible by converting NaNs to None and casting the rest to int:

df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()

df['Regular Price'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))

print (df)
   B Regular Price
0  0          None
1  1             1
2  2             2
3  3          None
4  4             7


print (df['Regular Price'].apply(type))
0    <class 'NoneType'>
1         <class 'int'>
2         <class 'int'>
3    <class 'NoneType'>
4         <class 'int'>
Name: Regular Price, dtype: object
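(An aside not in the original answer: pandas 0.24+ adds a nullable integer dtype, 'Int64', which holds integers and missing values in one column without falling back to object. This postdates the answer, so treat the required pandas version as an assumption.)

```python
import pandas as pd

# Assumes pandas >= 0.24 (nullable integer dtype); the answer above
# predates it. 'Int64' (capital I) keeps ints and pd.NA together.
s = pd.to_numeric(pd.Series(['a', '1', '2.3', 'a', '7']), errors='coerce')
s = s.round().astype('Int64')  # dtype is 'Int64', missing cells are pd.NA
```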

But this hurts performance, so it is best avoided. There are also other problems - some functions fail on an object column - so it is best to stay with floats when working with NaNs:

Testing some functions like diff on a 50k-row DataFrame:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

df['Regular Price1'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))

In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'

In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop

In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop

In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop

EDIT:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

First, it is possible to remove the NaN rows by column Regular Price and then convert to int.

df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price']  = df1['Regular Price'].astype(int)
print (df1)
   B  Regular Price
1  1              1
2  2              2
4  4              7

Process it as you need, but do not change the index.

#e.g. some process 
df1['Regular Price']  = df1['Regular Price'] * 100

Last, combine_first - it adds the NaN rows back to the Regular Price column.

df2 = df1.combine_first(df)
print (df2)
     B  Regular Price
0  0.0            NaN
1  1.0          100.0
2  2.0          200.0
3  3.0            NaN
4  4.0          700.0
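The drop-process-restore round trip can be wrapped in a small helper (a sketch; the name `process_int_column` is mine, not part of the answer, and note that `combine_first` may upcast other columns to float, as in the output above):

```python
import numpy as np
import pandas as pd

def process_int_column(df, col, func):
    """Drop the NaN rows of `col`, run `func` on the column as int,
    then restore the dropped rows via combine_first."""
    df1 = df.dropna(subset=[col]).copy()
    df1[col] = func(df1[col].astype(int))
    return df1.combine_first(df)

df = pd.DataFrame({'Regular Price': [np.nan, 1.0, 2.0, np.nan, 7.0],
                   'B': range(5)})
out = process_int_column(df, 'Regular Price', lambda s: s * 100)
```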
jezrael
  • is it necessary to use `Regular Price 1`, or can I assign back to `Regular Price`? – Raheel Nov 20 '17 at 11:51
  • Yes, sure, I change it. – jezrael Nov 20 '17 at 11:53
  • Yes, but it is problem with `NaN`s, because it is `float` :( – jezrael Nov 20 '17 at 11:54
  • Sorry if this is dumb, but my code works for the starting rows; it gives me int values. At some point it hits a value which is not `int` or `float`. It should apply `None` only to that cell, right? Why does it apply `None` to the whole `Regular Price` column? – Raheel Nov 20 '17 at 11:57
  • 1
    @Raheel Khan Thats the difference between `apply` and `applymap`. Take a look her https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas – Bharath M Shetty Nov 20 '17 at 11:58
  • @jezrael I understand your point about using `float`. The problem is that the task itself is to make the column contain numbers with no decimal point. So if I use float, it means I can actually remove the whole function :( – Raheel Nov 20 '17 at 12:19
  • 1
    OK, so does it work nicely if you convert to `object`? It is a bit of a hack, so test carefully. Thanks. – jezrael Nov 20 '17 at 12:20
  • OOPS, I tested `type`, and after converting to `object` I still get `float`s. So the solution was changed, sorry :( – jezrael Nov 20 '17 at 12:30
  • @jezrael so it means I can't use the `to_numeric` solution? – Raheel Nov 20 '17 at 12:34
  • @Bharath I am trying to find any example where `applymap` is used on a specific column. Everywhere it says to apply it to the whole dataframe – Raheel Nov 20 '17 at 12:35
  • 1
    No, it is the first step. But there is the problem of `integer` with `NaN`s - it is not natively supported, so only hacky solutions exist. Idea - is it possible to use some integer like `-1` or `-1000` instead of `None`? – jezrael Nov 20 '17 at 12:36
  • 1
    The best is to test it. Change `None` to `""` in `np.where`. – jezrael Nov 20 '17 at 12:44
  • `""` does the same as `None`. It set all `Regular Price` to `""` – Raheel Nov 20 '17 at 12:47
  • 1
    Hmmm, I have an idea. Is it possible to filter the `NaN`s out first, apply a solution working with `int`s only, and add the `NaN`s back last? – jezrael Nov 20 '17 at 12:56
  • @jezrael Yes, that is okay too. Even dropping `NaN` is awesome – Raheel Nov 20 '17 at 13:02
  • Yes, but then you lose the other rows where there were `NaN`s, like rows `0,5`. And that could be a problem... Or not? – jezrael Nov 20 '17 at 13:20
  • I added a solution for it; just be careful that your processing does not add `NaN`s, because they will be overwritten by `combine_first` – jezrael Nov 20 '17 at 14:38
  • @jezrael Thanks alot :) – Raheel Nov 20 '17 at 16:22