1

python newbie here. I have written the code that solves the issue. However, there should be a much better way of doing it.

I have two Series that come from the same table but due to some earlier process I get as separate sets. (They could be joined into a single dataframe again since the entries belong to the same record)

Ser1                   Ser2                                  
| id |               | section |
| ---|               |-------- |
| 1  |               |    A    |
| 2  |               |    B    |
| 2  |               |    C    |
| 3  |               |    D    |


df2
| id | section |
| ---|---------|
| 1  |    A    |
| 2  |    B    |
| 2  |    Z    |
| 2  |    Y    |
| 4  |    X    |
      

First, I would like to find those entries in Ser1, which match the same id in df2. Then, check if the values in the ser2 can NOT be found in the section column of df2 My expected results:

| id | section |  result |
| ---|-------- |---------|
| 1  |    A    |  False  |    # Both id(1) and section(A) are also in df2
| 2  |    B    |  False  |    # Both id(2) and section(B) are also in df2
| 2  |    C    |  True   |    # id(2) is in df2 but section(C) is not
| 3  |    D    |  False  |    # id(3) is not in df2, in that case the result should also be False


My code:

    for k, v in Ser2.items():
        rslt_df = df2[df2['id'] == Ser[k]]
        if rslt_df.empty:
           print(False)
        if(v not in rslt_df['section'].tolist()):
           print(True)
        else:
           print(False)

I know the code is not very good. But after reading about merging and comprehension lists I am getting confused what the best way would be to improve it.

Sunderam Dubey
  • 1
  • 11
  • 20
  • 40
Nebiros
  • 13
  • 4

1 Answers1

0

You can concat the series and compute the "result" with boolean arithmetic (XOR):

out = (
 pd.concat([ser1, ser2], axis=1)
   .assign(result=ser1.isin(df2['id'])!=ser2.isin(df2['section']))
)

Output:

   id section  result
0   1       A   False
1   2       B   False
2   2       C    True
3   3       D   False

Intermediates:

m1 = ser1.isin(df2['id'])
m2 = ser2.isin(df2['section'])

      m1     m2  m1!=m2
0   True   True   False
1   True   True   False
2   True  False    True
3  False  False   False
mozway
  • 194,879
  • 13
  • 39
  • 75
  • Thanks! It looks much more elegant. However, in this solution the conditions `result=ser1.isin(df2['id'])!=ser2.isin(df2['section'])` are evaluated independent from each other. So the id is searched anywhere on the list and the section as well. Is it possible to evaluate first ser1.isin(df2['id']) and if it is true then evaluate the section only on the rows that were selected in the first condition? If id is not found, then the result would be False and the second condition does not need to be checked. – Nebiros Feb 07 '23 at 23:39
  • While you could do this, it will likely be less efficient and less explicit. Here you take advantage of a vectorized operation where everything is computed fast. You can read [this](https://stackoverflow.com/questions/58104726/why-is-pandas-isin-much-faster-than-in) and to a lesser extent [this](https://stackoverflow.com/questions/50779617/pandas-pd-series-isin-performance-with-set-versus-array) (warning, old pandas!) for some context on `isin` speed and implementation. – mozway Feb 08 '23 at 03:32