
I have a dataset with two columns

    id  to
0   1   0x954b890704693af242613edef1b603825afcd708
1   1   0x954b890704693af242613edef1b603825afcd708
2   1   0x607f4c5bb672230e8672085532f7e901544a7375
3   1   0x9b9647431632af44be02ddd22477ed94d14aacaa
4   2   0x9b9647431632af44be02ddd22477ed94d14aacaa

and I would like to print each value in column 'to' that appears under more than one value of column 'id'. In the example above, the only value that should be printed is 0x9b9647431632af44be02ddd22477ed94d14aacaa

I have done this with a nested for loop; I wonder if there is a better way of doing this:

for _, outer in df.iterrows():
  for _, inner in df.iterrows():
    # same 'to' value, but a different 'id'
    if inner['to'] == outer['to'] and inner['id'] != outer['id']:
      print(outer['to'])
  • If your objective is to take only the `'to'` which have > 1 `'id'`, then simply group by 'to' and use the `'nunique'` function on `'id'`. – BloomShell Aug 27 '22 at 10:08
  • Does this answer your question? [Pandas 'count(distinct)' equivalent](https://stackoverflow.com/questions/15411158/pandas-countdistinct-equivalent) – Rabinzel Aug 27 '22 at 10:10

1 Answer


You can use df.groupby on the 'to' column, apply nunique to 'id', and keep only the entries greater than 1:

import pandas as pd

d = {'id': {0: 1, 1: 1, 2: 1, 3: 1, 4: 2},
 'to': {0: '0x954b890704693af242613edef1b603825afcd708',
  1: '0x954b890704693af242613edef1b603825afcd708',
  2: '0x607f4c5bb672230e8672085532f7e901544a7375',
  3: '0x9b9647431632af44be02ddd22477ed94d14aacaa',
  4: '0x9b9647431632af44be02ddd22477ed94d14aacaa'}}

df = pd.DataFrame(d)

nunique = df.groupby('to')['id'].nunique()
print(nunique)

to
0x607f4c5bb672230e8672085532f7e901544a7375    1
0x954b890704693af242613edef1b603825afcd708    1
0x9b9647431632af44be02ddd22477ed94d14aacaa    2

res = nunique[nunique>1]

print(res.index.tolist())

['0x9b9647431632af44be02ddd22477ed94d14aacaa']
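If the matching rows are needed as well as the values, the same idea can be sketched with a boolean mask built from groupby(...).transform('nunique'). This is only a sketch: the helper name shared_values and its parameters are made up for illustration, and it assumes the DataFrame has the 'to' and 'id' columns from the question.

```python
import pandas as pd

def shared_values(df, value_col="to", group_col="id"):
    """Hypothetical helper: return (mask, values), where mask flags rows
    whose value_col appears under more than one distinct group_col."""
    # Broadcast the per-value count of distinct groups back onto every row.
    n_groups = df.groupby(value_col)[group_col].transform("nunique")
    mask = n_groups > 1
    # unique() keeps first-appearance order, so each value is listed once.
    return mask, df.loc[mask, value_col].unique().tolist()

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2],
    "to": [
        "0x954b890704693af242613edef1b603825afcd708",
        "0x954b890704693af242613edef1b603825afcd708",
        "0x607f4c5bb672230e8672085532f7e901544a7375",
        "0x9b9647431632af44be02ddd22477ed94d14aacaa",
        "0x9b9647431632af44be02ddd22477ed94d14aacaa",
    ],
})

mask, values = shared_values(df)
print(values)    # values of 'to' seen under more than one 'id'
print(df[mask])  # the rows that carry those values
```

Unlike the nested loop, this lists each value once rather than once per matching pair of rows.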
ouroboros1