self.df['Regular Price'] = self.df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)

The code above assigns None to every value of the Regular Price field whenever it encounters a non-numeric value in the dataframe. I want to assign None only to the cells that actually hold non-numeric values.

thanks

Raheel
    I don't know if I understood your question, but are you looking for the `applymap` method? – Ivan Nov 20 '17 at 11:37
  • Maybe you are looking for https://stackoverflow.com/questions/47001413/how-to-replace-any-strings-with-nan-in-pandas-dataframe-using-a-boolean-mask/47001627#47001627 – Bharath M Shetty Nov 20 '17 at 11:45
  • @Ivan does `applymap` work on the complete dataframe, or can it be applied to one specific column? – Raheel Nov 20 '17 at 12:52

1 Answer


First, it is impossible to have NaN alongside integers, because NaN is a float by design.

Your solution works if the column contains mixed types - numbers together with strings:

df = pd.DataFrame({
    'Regular Price': ['a',1,2.3,'a',7],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = df['Regular Price'].apply(
    lambda x: int(round(x)) if isinstance(x, (int, float)) else None
)

print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0
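One likely reason the original code blanks the whole column (a guess, since the question does not show the data source): if the column was read from a CSV or similar, every value is a `str`, including the numeric-looking ones, so the `isinstance` check fails for every cell. A minimal sketch of that symptom:

```python
import pandas as pd

# Hypothetical column as it might arrive from a CSV: all values are
# strings, even the numeric-looking ones.
s = pd.Series(['1', '2.3', 'a'])

# isinstance(x, (int, float)) is False for the str '1' as well,
# so every cell maps to None, not only the truly non-numeric 'a'.
cleaned = s.apply(lambda x: int(round(x)) if isinstance(x, (int, float)) else None)
```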

But if all the values are strings, you need to_numeric with errors='coerce' to convert the non-numeric ones to NaN:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

EDIT:

I also need to remove floating points and use int only

It is possible by converting NaNs to None and casting the rest to int:

df['Regular Price'] = pd.to_numeric(df['Regular Price'],
                                    errors='coerce').round()

df['Regular Price'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))

print (df)
   B Regular Price
0  0          None
1  1             1
2  2             2
3  3          None
4  4             7


print (df['Regular Price'].apply(type))
0    <class 'NoneType'>
1         <class 'int'>
2         <class 'int'>
3    <class 'NoneType'>
4         <class 'int'>
Name: Regular Price, dtype: object
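(An aside not in the original answer: pandas 0.24+ adds a nullable integer dtype, 'Int64', which holds integers and missing values in one column without falling back to object. This postdates the answer, so treat the required pandas version as an assumption.)

```python
import pandas as pd

# Assumes pandas >= 0.24 (nullable integer dtype); the answer above
# predates it. 'Int64' (capital I) keeps ints and pd.NA together.
s = pd.to_numeric(pd.Series(['a', '1', '2.3', 'a', '7']), errors='coerce')
s = s.round().astype('Int64')  # dtype is 'Int64', missing cells are pd.NA
```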

But this hurts performance, so it is best avoided. There are also other problems - some functions fail on an object column - so it is best to stay with floats when working with NaNs:

Testing some functions like diff on a 50k-row DataFrame:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()

df['Regular Price1'] = np.where(df['Regular Price'].isnull(), 
                               None,
                               df['Regular Price'].fillna(0).astype(int))

In [252]: %timeit df['Regular Price2'] = df['Regular Price1'].diff()

TypeError: unsupported operand type(s) for -: 'int' and 'NoneType'

In [274]: %timeit df['Regular Price3'] = df['Regular Price'].diff()
1000 loops, best of 3: 301 µs per loop

In [272]: %timeit df['Regular Price2'] = df['Regular Price1'] * 1000
100 loops, best of 3: 4.48 ms per loop

In [273]: %timeit df['Regular Price3'] = df['Regular Price'] * 1000
1000 loops, best of 3: 469 µs per loop

EDIT:

df = pd.DataFrame({
    'Regular Price': ['a','1','2.3','a','7'],
    'B': list(range(5))
})
print (df)
   B Regular Price
0  0             a
1  1             1
2  2           2.3
3  3             a
4  4             7

df['Regular Price'] = pd.to_numeric(df['Regular Price'], errors='coerce').round()
print (df)
   B  Regular Price
0  0            NaN
1  1            1.0
2  2            2.0
3  3            NaN
4  4            7.0

First, it is possible to remove the NaN rows by column Regular Price and then convert to int.

df1 = df.dropna(subset=['Regular Price']).copy()
df1['Regular Price']  = df1['Regular Price'].astype(int)
print (df1)
   B  Regular Price
1  1              1
2  2              2
4  4              7

Process it as you need, but do not change the index.

#e.g. some process 
df1['Regular Price']  = df1['Regular Price'] * 100

Last, combine_first - it adds the NaN rows back to the Regular Price column.

df2 = df1.combine_first(df)
print (df2)
     B  Regular Price
0  0.0            NaN
1  1.0          100.0
2  2.0          200.0
3  3.0            NaN
4  4.0          700.0
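The drop-process-restore round trip can be wrapped in a small helper (a sketch; the name `process_int_column` is mine, not part of the answer, and note that `combine_first` may upcast other columns to float, as in the output above):

```python
import numpy as np
import pandas as pd

def process_int_column(df, col, func):
    """Drop the NaN rows of `col`, run `func` on the column as int,
    then restore the dropped rows via combine_first."""
    df1 = df.dropna(subset=[col]).copy()
    df1[col] = func(df1[col].astype(int))
    return df1.combine_first(df)

df = pd.DataFrame({'Regular Price': [np.nan, 1.0, 2.0, np.nan, 7.0],
                   'B': range(5)})
out = process_int_column(df, 'Regular Price', lambda s: s * 100)
```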
jezrael
  • is it necessary to use `Regular Price 1`, or can I assign back to `Regular Price`? – Raheel Nov 20 '17 at 11:51
  • Yes, sure, I change it. – jezrael Nov 20 '17 at 11:53
  • Yes, but it is problem with `NaN`s, because it is `float` :( – jezrael Nov 20 '17 at 11:54
  • Sorry if this is dumb, but my code works for the starting rows; it gives me int values. At some point it hits a value which is not `int` or `float`. It should apply `None` only to that cell, right? Why does it apply `None` to the whole `Regular Price` column? – Raheel Nov 20 '17 at 11:57
  • 1
    @Raheel Khan Thats the difference between `apply` and `applymap`. Take a look her https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas – Bharath M Shetty Nov 20 '17 at 11:58
  • @jezrael I understand your point about using `float`. The problem is that the task itself is to make the column contain numbers with no decimal point. So if I use float, it means I can actually remove the whole function :( – Raheel Nov 20 '17 at 12:19
  • 1
    OK, so does it work nicely if you convert to `object`? It is a bit of a hack, so test carefully. Thanks. – jezrael Nov 20 '17 at 12:20
  • OOPS, I tested `type`, and after converting to `object` I still get `float`s. So the solution was changed, sorry :( – jezrael Nov 20 '17 at 12:30
  • @jezrael so it means I can't use the `to_numeric` solution? – Raheel Nov 20 '17 at 12:34
  • @Bharath I am trying to find any example where `applymap` is used on a specific column. Everywhere it says to apply it to the whole dataframe – Raheel Nov 20 '17 at 12:35
  • 1
    No, it is the first step. But there is the problem of `integer` with `NaN`s - it is not natively supported, so only hacky solutions exist. Idea - is it possible to use some integer like `-1` or `-1000` instead of `None`? – jezrael Nov 20 '17 at 12:36
  • 1
    The best is to test it. Change `None` to `""` in `np.where`. – jezrael Nov 20 '17 at 12:44
  • `""` does the same as `None`. It set all `Regular Price` to `""` – Raheel Nov 20 '17 at 12:47
  • 1
    Hmmm, I have an idea. Is it possible to filter the `NaN`s out first, apply a solution working with `int`s only, and add the `NaN`s back last? – jezrael Nov 20 '17 at 12:56
  • @jezrael Yes, that is okay too. Even dropping `NaN` is awesome – Raheel Nov 20 '17 at 13:02
  • Yes, but then you lose the other rows where there were `NaN`s, like rows `0,5`. And that could be a problem... Or not? – jezrael Nov 20 '17 at 13:20
  • I added a solution for it; just be careful that your processing does not add `NaN`s, because they will be overwritten by `combine_first` – jezrael Nov 20 '17 at 14:38
  • @jezrael Thanks alot :) – Raheel Nov 20 '17 at 16:22