I have a data set like the following (input file):
id addr
301 1
301 2
301 3
301 4
302 6
302 7
302 8
302 9
302 1
303 14
303 15
303 2
304 16
304 17
304 1
I need Python code to print out all the possible pair combinations of addr values that share a common id. There are millions of id and corresponding addr records in the main test file, so the code should be able to read the columns from a text file. The output will be as follows (only showing for 301 and 302, the rest will continue the pattern):
1 2
1 3
1 4
2 3
2 4
3 4
6 7
6 8
6 9
7 8
7 9
8 9
1 6
1 7
1 8
1 9
2 6
2 7
2 8
2 9
3 6
3 7
3 8
3 9
4 6
4 7
4 8
4 9
1 15
2 15
3 15
......
1 16
2 16
......
15 16
So far I have done the following, but I have no idea how to code the pair combination part. I am new to Python, so I would appreciate it if someone could help me with the code and add a little explanation.
# coding: utf-8
# sample tested in python 3.6
import sys

if len(sys.argv) < 2:
    sys.stderr.write("Usage: {0} filename\n".format(sys.argv[0]))
    sys.exit(1)
fn = sys.argv[1]
sys.stderr.write("reading " + fn + "...\n")

# addr values collected for the current id
s = []
txid_prev = None
with open(fn, "r") as fin:
    for line in fin:
        f = line.split()  # split on any whitespace (tabs or spaces)
        if len(f) < 2:
            continue  # skip blank or malformed lines
        txid, addr = f[0], f[1]
        if txid == txid_prev:
            s.append(addr)
        else:
            # connect all pairs in s
            # print all pairs as edges
            s = [addr]  # start a new group for the new id
            txid_prev = txid
if s:
    # connect and print all edges
    pass
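One possible way to fill in the "connect all pairs" step is sketched below, using itertools.groupby and itertools.combinations from the standard library. It assumes that rows with the same id are adjacent in the file, as in the sample input, and that the two columns are separated by whitespace. The helper names read_records and pairs_per_id are made up for illustration. It only produces pairs within a single id; the sample output also lists pairs such as 2 6, which span two ids, and this sketch does not attempt that.

```python
import itertools


def read_records(lines):
    """Yield (id, addr) string tuples from whitespace-separated lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            yield fields[0], fields[1]


def pairs_per_id(records):
    """Yield every 2-combination of addr values that share an id.

    Assumption: records with the same id are adjacent in the input.
    """
    for _id, group in itertools.groupby(records, key=lambda r: r[0]):
        addrs = [addr for _, addr in group]
        # combinations() yields each unordered pair exactly once
        yield from itertools.combinations(addrs, 2)


# Small in-memory sample standing in for the real file
sample = ["301 1", "301 2", "301 3", "301 4", "302 6", "302 7"]
for a, b in pairs_per_id(read_records(sample)):
    print(a, b)
```

To run it on a file, an open file object can be passed instead of the list, for example pairs_per_id(read_records(fin)) inside a with open(fn) as fin: block. Because groupby streams through the input, only the addr values of one id are held in memory at a time, which should matter with millions of rows. A header row such as "id addr" forms a one-element group and so yields no pairs.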