I have a file say : file1.txt
, which has multiple rows and columns. I want to read that and store that as list of lists. Now I want to pair them using the logic, no 2 same rows can be in a pair. Now the 2nd last
column represent the class
. Below is my file:
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
Here all the 6 rows are class 1. I am using below logic to do this pairing part.
from operator import itemgetter
rule_file_name = 'file1.txt'
rule_fp = open(rule_file_name)
list1 = []
for line in rule_fp.readlines():
list1.append(line.replace("\n","").split(","))
list1=sorted(list1,key=itemgetter(-1),reverse=True)
length = len(list1)
middle_index = length // 2
first_half = list1[:middle_index]
second_half = list1[middle_index:]
result=[]
result=list(zip(first_half,second_half))
for a,b in result:
if a==b:
result.remove((a, b))
print(result)
print("-------------------")
It is working absolutely fine when I have one class only. But if my file has multiple classes then I want the pairing to be done with is the same class only. For an example if my file looks like below: say file2
27,28,29,30,1,0.67
31,32,33,34,1,0.84
35,36,37,38,1,0.45
39,40,41,42,1,0.82
43,44,45,46,1,0.92
43,44,45,46,1,0.92
51,52,53,54,2,0.28
55,56,57,58,2,0.77
59,60,61,62,2,0.39
63,64,65,66,2,0.41
75,76,77,78,3,0.51
90,91,92,93,3,0.97
Then I want to make 3 pairs from class 1, 2 from class 2 and 1 from class 3.Then I am using this logic to make the dictionary where the keys will be the classes.
d = {}
sorted_grouped = []
for row in list1:
# Add name to dict if not exists
if row[-2] not in d:
d[row[-2]] = []
# Add all non-Name attributes as a new list
d[row[-2]].append(row)
#print(d.items())
for k,v in d.items():
sorted_grouped.append(v)
#print(sorted_grouped)
gp_vals = {}
for i in sorted_grouped:
gp_vals[i[0][-2]] = i
print(gp_vals)
Now how can I do it, please help !
My desired output for file2
is:
[([43,44,45,46,1,0.92], [39,40,41,42,1,0.82]), ([43,44,45,46,1,0.92], [27,28,29,30,1,0.67]), ([31,32,33,34,1,0.84], [35,36,37,38,1,0.45])] [([55,56,57,58,2,0.77], [59,60,61,62,2,0.39]), ([63,64,65,66,2,0.41], [51,52,53,54,2,0.28])] [([90,91,92,93,3,0.97], [75,76,77,78,3,0.51])]
Edit1:
All the files will have even number of rows, where every class will have even number of rows as well.
For a particular class(say class 2), if there are
n
rows then there can be maximumn/2
identical rows for that class in the dataset.My primary intention was to get random pairing but making sure no self pairing is allowed. For that I thought of taking the row with the highest fitness value(The last column) inside any class and take any other row from that class randomly and make a pair just by making sure both the rows are not exactly the same. And this same thing is repeated for every class separately.