Replace a value in the dataframe if it falls in between specific values

Question

I have a dataframe which can be constructed as:

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8], 
                   'B': [4, 7, 1, 5, 6, 8, 3, 9], 
                   'C': [1, 5, 3, 1, 6, 8, 9, 0], 
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})

The df looks like:

    A   B   C   D
0   1   4   1   6
1   4   7   5   3
2   6   1   3   7
3   3   5   1   8
4   2   6   6   9
5   3   8   8   4
6   6   3   9   2
7   8   9   0   1

And there are 2 other variables which are to be used in substitution of values in the df:

mx = pd.core.series.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.core.series.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

PROBLEM: I want to replace all the values from the dataframe greater than the corresponding value in dm but less than that in mx with the values from dm. In other words, let's say for "D", I want to replace all the values between 4 and 7 with 4.

So the expected output would look something like:

    A   B   C   D
0   1   4   1   4
1   4   7   5   3
2   5   1   3   4
3   3   5   1   8
4   2   6   6   9
5   3   8   6   4
6   5   3   9   2
7   8   9   0   1

I have tried using df.apply and df.update but I'm unable to make the condition. It always throws a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there an efficient way to achieve this? Any help would be appreciated.

jezrael · Accepted Answer · 2021-07-15T13:25:48.413

2

Use DataFrame.mask with compare DataFrame by Series by DataFrame.le and DataFrame.ge, chained mask by & for bitwise AND and replace by Series with parameter axis=1:

df = df.mask(df.ge(dm) & df.le(mx), dm, axis=1)
print (df)
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1

edited Jul 15 '21 at 13:25

answered Jul 15 '21 at 13:15

jezrael

822,522
95
1,334
1,252

This is definitely nicer than my way, just notice that according to the expected output you should use `ge` and `le` instead of `gt` and `lt` – Ron Serruya Jul 15 '21 at 13:24

score 0 · Answer 2 · answered Jul 15 '21 at 13:23

I can't tell you if its the best way, and its probably isn't, but this works

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
   ...:                    'B': [4, 7, 1, 5, 6, 8, 3, 9],
   ...:                    'C': [1, 5, 3, 1, 6, 8, 9, 0],
   ...:                    'D': [6, 3, 7, 8, 9, 4, 2, 1]})

In [3]: mx = pd.core.series.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
   ...: dm = pd.core.series.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

In [4]: df
Out[4]:
   A  B  C  D
0  1  4  1  6
1  4  7  5  3
2  6  1  3  7
3  3  5  1  8
4  2  6  6  9
5  3  8  8  4
6  6  3  9  2
7  8  9  0  1

In [5]: for col in df.columns:
   ...:     df[col] = df[col].apply(lambda x: x if not dm[col]<=x<=mx[col] else dm[col])
   ...:

In [6]: df
Out[6]:
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1

https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code/54432584#54432584 — jezrael, Jul 15 '21 at 13:24

Replace a value in the dataframe if it falls in between specific values

2 Answers2