I have a distance matrix and want to convert it to a pairwise comparison.
The distance matrix:
Distance matrix | AWGS13092760-1_S21.ffn | AWGS13092803-1_S22.ffn | AWGS13093729-1_S5.ffn |
---|---|---|---|
AWGS13092760-1_S21.ffn | 0 | 196 | 188 |
AWGS13092803-1_S22.ffn | 196 | 0 | 131 |
AWGS13093729-1_S5.ffn | 188 | 131 | 0 |
AWGS13094209-2_S10.ffn | 1377 | 1359 | 1330 |
I expect my output to be like this:
FIELD1 | query | reference | cgMLST_chewBBACCA_Difference |
---|---|---|---|
0 | AWGS13092803-1_S22 | AWGS13092803-1_S22 | 196 |
1 | AWGS13093729-1_S5 | AWGS13092803-1_S22 | 188 |
2 | AWGS13094209-2_S10 | AWGS13092803-1_S22 | 1377 |
Now, I managed to get my desired result, but I was wondering if there is a better way to do it instead of using nested for loops?
The code I used:
# read input and add query, reference and distances
df = pd.read_excel(file)
list_dist = []
list_query = []
list_ref = []
col = df.columns
for x in range(1, (len(col))): # iterate every column, column index = x
d = df[df.columns[x]][x:]
com = df[df.columns[0]][x:]
for item in com: # appends all query sample IDs to a list
list_query.append(item)
for it in d: # appends all distances to a list
list_dist.append(it)
for f in range(x, (len(col) - 1)): # appends all reference sample IDs to a list
reference = df.columns[x]
list_ref.append(reference)
df_pws_comp = pd.DataFrame()
df_pws_comp["query"] = list_query # list with query IDs
df_pws_comp["reference"] = list_ref # list with reference IDs
df_pws_comp["cgMLST_chewBBACA_Difference"] = list_dist
Thank you in advance!