
I have a dataframe with two columns, source and target. I would like to detect inverse rows: for a pair of values (source, target), if a pair (target, source) also exists in the dataframe, assign True to a new column.

My attempt:

cols = ['source', 'target']
_cols = ['target', 'source']
sub_edges = edges[cols]
sub_edges['oneway'] = sub_edges.apply(lambda x: True if x[x.isin(x[_cols])] else False, axis=1)
CDJB
aba2s
  • Please include a small subset of your sub_edges DataFrame as a __copyable__ piece of code that can be used for testing as well as your expected or desired output. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888). – Henry Ecker May 08 '21 at 14:04

2 Answers

1

You can apply a lambda function using similar logic to that in your example. We check if there are any rows in the dataframe with a reversed source/target pair.

Incidentally, the column name 'oneway' suggests to me the opposite of the logic described in your question (True here means a reversed pair exists). To invert the result, just remove the not in the lambda function.

Code

import pandas as pd
import random

edges = {"source": random.sample(range(20), 20),
         "target": random.sample(range(20), 20)}

df = pd.DataFrame(edges)

df["oneway"] = df.apply(
    lambda x: not df[
        (df["source"] == x["target"]) & (df["target"] == x["source"]) & (df.index != x.name)
    ].empty,
    axis=1,
)

Output

    source  target  oneway
0        9      11   False
1       16       1    True
2        1      16    True
3       11      14   False
4        4      13   False
5       18      15   False
6       14      17   False
7       13      12   False
8       19      19   False
9       12       3   False
10      10       6   False
11      15       5   False
12       3      18   False
13      17       0   False
14       6       7   False
15       5      10   False
16       7       2   False
17       8       9   False
18       0       4   False
19       2       8   False
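
The row-wise apply above rescans the whole frame for every row, which can get slow on large edge lists. As a rough sketch of the same check done in a single pass, one could collect the existing pairs in a plain Python set and look up each reversed pair in it. The sample values and the name has_reverse are made up for illustration, and self-loops are always treated as False here, which is an assumption based on the behaviour requested in the comments.

```python
import pandas as pd

# Hypothetical sample edge list; the values are made up for illustration.
df = pd.DataFrame({"source": [1, 2, 3, 4, 5],
                   "target": [2, 1, 4, 5, 5]})

# Set of all (source, target) pairs present in the frame.
pairs = set(zip(df["source"], df["target"]))

# A row is flagged when its reversed pair exists and it is not a self-loop.
has_reverse = [
    (t, s) in pairs and s != t
    for s, t in zip(df["source"], df["target"])
]
df["oneway"] = has_reverse

print(df)
```

Only rows 0 and 1, the (1, 2) and (2, 1) pair, are flagged; the self-loop (5, 5) stays False.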
CDJB
  • Many thanks. That's what I was looking for. – aba2s May 08 '21 at 14:49
  • I noticed an error: row number 8 should be False, not True. I am getting more True values than expected. How can I correct this? – aba2s May 08 '21 at 15:34
  • @aba2s row 8 is True because there is an edge from node 19 to itself. Is that not expected behaviour? – CDJB May 08 '21 at 15:36
  • That is not what I expected. Given a tuple (u, v), I only want True if a tuple (v, u) also exists, and False otherwise. Row number 8 has no other inverse row. Legend: u = origin and v = target. – aba2s May 08 '21 at 15:41
  • @aba2s But if (19, 19) exists then surely (19, 19) also exists :P. Either way, I've altered the function to return False for these cases. – CDJB May 08 '21 at 15:49
0

Another, slightly faster method, using pandas pivot and melt.

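# pivot needs a value column to fill the matrix, so mark every edge with 1.
# Note: pivot raises if the same (source, target) pair appears more than once.
d = df.assign(value=1).pivot(
    index='source',
    columns='target',
    values='value'
).fillna(0)

# Make the matrix square over every node seen in either column.
index = d.index.union(d.columns)
d = d.reindex(index=index, columns=index, fill_value=0)

# d + d.T equals 2 wherever both (u, v) and (v, u) exist.
e = (
    (d + d.T)
    .rename_axis('source', axis=0)
    .rename_axis('target', axis=1)
).reset_index().melt('source')

e[e.value == 2].sort_values(by=['source', 'target'])

This will show you the first-order loops in your edge list, i.e. every pair of nodes with an edge in both directions. Note that a self-loop such as (19, 19) also sums to 2 and is therefore listed.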
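
The matrix idea behind this answer can be sketched on a small hypothetical edge list. This sketch swaps in pd.crosstab to build the adjacency matrix instead of pivot (crosstab counts duplicate edges rather than raising) and reads the result back with NumPy instead of melt. The sample values and the names adj, nodes and both are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical edge list; the values are made up for illustration.
df = pd.DataFrame({"source": [1, 2, 3, 5],
                   "target": [2, 1, 4, 5]})

# Adjacency counts: adj.loc[u, v] is the number of edges u -> v.
adj = pd.crosstab(df["source"], df["target"])

# Make the matrix square over every node seen in either column.
nodes = adj.index.union(adj.columns)
adj = adj.reindex(index=nodes, columns=nodes, fill_value=0)

# An edge u -> v is reciprocated when both adj[u, v] and adj[v, u] are nonzero.
both = (adj > 0) & (adj.T > 0)

# Read the True cells back as (source, target) pairs.
rows, cols = np.nonzero(both.to_numpy())
pairs = list(zip(nodes[rows], nodes[cols]))
print(pairs)
```

Here the pairs (1, 2) and (2, 1) are reported, along with the self-loop (5, 5), since a self-loop is its own reverse on the diagonal. If self-loops should be ignored, filtering out pairs with equal endpoints afterwards is one option.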

Divyanshu Srivastava