I am using pandas to find all rows whose Y value appears in column X at a later time, i.e. the Y elements that precede the corresponding X elements in time.

import pandas as pd

data = {'time':[1,2,3,4,5,6,7,8], 'X':['x','w','r','a','k','y','u','xa'],'Y':['r','xa','a','x','w','u','k','y']}

df = pd.DataFrame.from_dict(data)

   time   X   Y
0     1   x   r
1     2   w  xa
2     3   r   a
3     4   a   x
4     5   k   w
5     6   y   u
6     7   u   k
7     8  xa   y

What I would like to achieve is:

   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u

Any ideas?

Ch3steR
DanYan

2 Answers

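You can build two dictionaries that map each value to the time at which it appears: one for column X and one for column Y. Then use pd.Series.map to build a boolean mask that is True where a value appears in X later than it appears in Y, and filter with boolean indexing.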

idx = dict(zip(df['X'],df['time']))
idx2 = dict(zip(df['Y'],df['time']))
mask = df['Y'].map(lambda k: idx[k]>idx2[k])
df[mask]
   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u

df.apply over axis 1 is not recommended; it should be your last resort. See https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code for why.

Here's a timeit analysis that supports this.

In [74]: %%timeit
    ...: df[df.apply(lambda row: row['Y'] in df.loc[row.time:,'X'].values, axis=1)]
    ...:
    ...:
2.26 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [80]: %%timeit
    ...: idx = dict(zip(df['X'],df['time']))
    ...: idx2 = dict(zip(df['Y'],df['time']))
    ...: mask = df['Y'].map(lambda k: idx[k]>idx2[k])
    ...: x = df[mask]
    ...:
    ...:
498 µs ± 30.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

About 4.5x faster (2.26 ms vs. 498 µs) on this small frame.
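The same idea can also be written without a Python-level lambda by passing a Series to map. This is only a sketch, assuming the values in X are unique (map needs a unique index to look up against); the names x_time and y_seen_in_x_at are made up for illustration. Unlike the dictionary lookup, a Y value that never occurs in X becomes NaN instead of raising a KeyError, and the comparison then treats it as False.

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

# Series indexed by X value, holding the time at which that value appears in X.
# Assumption: X values are unique, so the lookup below is unambiguous.
x_time = df.set_index('X')['time']

# For each Y, look up when the same value shows up in X (NaN if it never does).
y_seen_in_x_at = df['Y'].map(x_time)

# Keep rows where Y occurs before its matching X; comparisons with NaN are False.
mask = y_seen_in_x_at > df['time']
result = df[mask]
print(result)
```

On the example frame this selects the rows with index 0, 1, 2 and 5, the same as the dictionary version.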

Ch3steR

Try this:

result = df[df.apply(lambda row: row['Y'] in df.loc[row.time:,'X'].values, axis=1)]

print(result)

   time  X   Y
0     1  x   r
1     2  w  xa
2     3  r   a
5     6  y   u
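Note that df.loc[row.time:, 'X'] slices by index label, so the one-liner relies on the default 0..n-1 index lining up with time (time = index + 1 here). A sketch of the same row-wise check that compares the time column directly, assuming nothing about the index; the helper name y_appears_later_in_x is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'time': [1, 2, 3, 4, 5, 6, 7, 8],
    'X': ['x', 'w', 'r', 'a', 'k', 'y', 'u', 'xa'],
    'Y': ['r', 'xa', 'a', 'x', 'w', 'u', 'k', 'y'],
})

def y_appears_later_in_x(row):
    # X values from rows with a strictly later time than the current row.
    later_x = df.loc[df['time'] > row['time'], 'X']
    # True if this row's Y shows up among those later X values.
    return row['Y'] in later_x.values

result = df[df.apply(y_appears_later_in_x, axis=1)]
print(result)
```

This is still a row-wise apply, so it will be slow on large frames.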
luigigi
  • Not saying the answer is bad, but `df.apply` over axis 1 should be used only as a last resort, as it's very inefficient; [check out why](https://stackoverflow.com/questions/54432583/when-should-i-ever-want-to-use-pandas-apply-in-my-code) – Ch3steR Jun 11 '20 at 10:08