0

I have a dataframe which can be constructed as:

df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8], 
                   'B': [4, 7, 1, 5, 6, 8, 3, 9], 
                   'C': [1, 5, 3, 1, 6, 8, 9, 0], 
                   'D': [6, 3, 7, 8, 9, 4, 2, 1]})

The df looks like:

    A   B   C   D
0   1   4   1   6
1   4   7   5   3
2   6   1   3   7
3   3   5   1   8
4   2   6   6   9
5   3   8   8   4
6   6   3   9   2
7   8   9   0   1

And there are 2 other variables which are to be used in substitution of values in the df:

mx = pd.core.series.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
dm = pd.core.series.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

PROBLEM: I want to replace all the values from the dataframe greater than the corresponding value in dm but less than that in mx with the values from dm. In other words, let's say for "D", I want to replace all the values between 4 and 7 with 4.

So the expected output would look something like:

    A   B   C   D
0   1   4   1   4
1   4   7   5   3
2   5   1   3   4
3   3   5   1   8
4   2   6   6   9
5   3   8   6   4
6   5   3   9   2
7   8   9   0   1

I have tried using df.apply and df.update but I'm unable to make the condition. It always throws a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there an efficient way to achieve this? Any help would be appreciated.

2 Answers2

2

Use DataFrame.mask with compare DataFrame by Series by DataFrame.le and DataFrame.ge, chained mask by & for bitwise AND and replace by Series with parameter axis=1:

df = df.mask(df.ge(dm) & df.le(mx), dm, axis=1)
print (df)
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is definitely nicer than my way, just notice that according to the expected output you should use `ge` and `le` instead of `gt` and `lt` – Ron Serruya Jul 15 '21 at 13:24
0

I can't tell you if its the best way, and its probably isn't, but this works

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'A': [1, 4, 6, 3, 2, 3, 6, 8],
   ...:                    'B': [4, 7, 1, 5, 6, 8, 3, 9],
   ...:                    'C': [1, 5, 3, 1, 6, 8, 9, 0],
   ...:                    'D': [6, 3, 7, 8, 9, 4, 2, 1]})

In [3]: mx = pd.core.series.Series([7, 8, 8, 7], index=["A", "B", "C", "D"])
   ...: dm = pd.core.series.Series([5, 8, 6, 4], index=["A", "B", "C", "D"])

In [4]: df
Out[4]:
   A  B  C  D
0  1  4  1  6
1  4  7  5  3
2  6  1  3  7
3  3  5  1  8
4  2  6  6  9
5  3  8  8  4
6  6  3  9  2
7  8  9  0  1

In [5]: for col in df.columns:
   ...:     df[col] = df[col].apply(lambda x: x if not dm[col]<=x<=mx[col] else dm[col])
   ...:

In [6]: df
Out[6]:
   A  B  C  D
0  1  4  1  4
1  4  7  5  3
2  5  1  3  4
3  3  5  1  8
4  2  6  6  9
5  3  8  6  4
6  5  3  9  2
7  8  9  0  1
Ron Serruya
  • 3,988
  • 1
  • 16
  • 26
  • https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code/54432584#54432584 – jezrael Jul 15 '21 at 13:24