0

I have pandas a dataframe like this

    Event_Id       Investigation_Type Accident_Number   Event_Date      
0   20180922X71035  ACCIDENT          DCA18CA289        09/10/2018  
1   20180507X00658  ACCIDENT          DCA18CA169        05/07/2018  
4   20171212X50255  ACCIDENT          DCA18CA043B       12/03/2017

and I try to iterate through it like this...

n1col = 0
n2col = 1

for i in df.index:
    Node1=df.Event_Id
    for j in df.index:
        Node2=df.Event_Id
        if (Node1 != Node2):
            new_df.loc[j,n1col] = Node1
            new_df.loc[j,n2col] = Node2

I don't know if my approach is right (as I see it's not) I want some help so I can get a result like the one below...

I am new to that kind of stuff so I need your help.

    Node_1          Node_2 
0   20180922X71035  20180507X00658
1   20180922X71035  20171212X50255
2   20180507X00658  20180922X71035
3   20180507X00658  20171212X50255
4   20171212X50255  20180922X71035
6   20171212X50255  20180507X00658

Thanks in Advance.

EstevaoLuis
  • 2,422
  • 7
  • 33
  • 40
runnerhigh
  • 13
  • 4

2 Answers2

2

I know you've already accepted an answer. But in case you are not looking for a combination and instead want the cartesian product and then filter it down such that the 2 columns are not equal...

>>> df
         Event_Id Accident_Number  Event_Date Investigation_Type
0  20180922X71035      DCA18CA289  09/10/2018           ACCIDENT
1  20180507X00658      DCA18CA169  05/07/2018           ACCIDENT
2  20171212X50255     DCA18CA043B  12/03/2017           ACCIDENT

Get the Cartesian Product Answer from this other StackOverflow post

>>> df['key'] = 0
>>> df
         Event_Id Accident_Number  Event_Date Investigation_Type  key
0  20180922X71035      DCA18CA289  09/10/2018           ACCIDENT    0
1  20180507X00658      DCA18CA169  05/07/2018           ACCIDENT    0
2  20171212X50255     DCA18CA043B  12/03/2017           ACCIDENT    0
>>> df2 = df.merge(df, on='key').filter(items=['Event_Id_x', 'Event_Id_y'])
>>> df2
       Event_Id_x      Event_Id_y
0  20180922X71035  20180922X71035
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
4  20180507X00658  20180507X00658
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658
8  20171212X50255  20171212X50255

Use .loc/boolean indexing to filter down your DataFrame

>>> df2.loc[df2['Event_Id_x'] != df2['Event_Id_y']]
       Event_Id_x      Event_Id_y
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658

Similar to Josh's answer using itertools. But this time using product instead of combination:

>>> df = df.set_index('Event_Id')
>>> df3 = pd.DataFrame(list(product(df.index.tolist(), df.index.tolist())), columns=['Node1', 'Node2'])
>>> df3.loc[df3['Node1'] != df3['Node2']]
            Node1           Node2
1  20180922X71035  20180507X00658
2  20180922X71035  20171212X50255
3  20180507X00658  20180922X71035
5  20180507X00658  20171212X50255
6  20171212X50255  20180922X71035
7  20171212X50255  20180507X00658
Orenshi
  • 1,773
  • 11
  • 12
1

You can do it in one line with

from itertools import combinations

pd.DataFrame(list(combinations(df.index.tolist(), 2)), columns=['Node1', 'Node2'])
Josh Friedlander
  • 10,870
  • 5
  • 35
  • 75
  • Thanks a lot, you saved my day!! – runnerhigh Jan 08 '19 at 14:29
  • You're very welcome. If you're new to Pandas you'll discover that you can do an enormous amount without using loops. – Josh Friedlander Jan 08 '19 at 14:37
  • In your original post, aren't you asking for a cartesian product and excluding the records where they're equal? I didn't think you were looking for combination – Orenshi Jan 08 '19 at 14:40
  • @Orenshi Is there any difference if it is a cartesian product or a combination? My goal is to create a network with these ids and visualize it(if this helps). – runnerhigh Jan 08 '19 at 16:25