0

I have a distance matrix and want to convert it to a pairwise comparison.

The distance matrix:

Distance matrix AWGS13092760-1_S21.ffn AWGS13092803-1_S22.ffn AWGS13093729-1_S5.ffn
AWGS13092760-1_S21.ffn 0 196 188
AWGS13092803-1_S22.ffn 196 0 131
AWGS13093729-1_S5.ffn 188 131 0
AWGS13094209-2_S10.ffn 1377 1359 1330

I expect my output to be like this:

FIELD1 query reference cgMLST_chewBBACCA_Difference
0 AWGS13092803-1_S22 AWGS13092803-1_S22 196
1 AWGS13093729-1_S5 AWGS13092803-1_S22 188
2 AWGS13094209-2_S10 AWGS13092803-1_S22 1377

Now, I managed to get my desired result, but I was wondering if there is a better way to do it instead of using nested for loops?

The code I used:

# read input and add query, reference and distances
    df = pd.read_excel(file)
    list_dist = []
    list_query = []
    list_ref = []
    col = df.columns

    for x in range(1, (len(col))):  # iterate every column, column index = x
        d = df[df.columns[x]][x:]
        com = df[df.columns[0]][x:]
        for item in com:  # appends all query sample IDs to a list
            list_query.append(item)
        for it in d:  # appends all distances to a list
            list_dist.append(it)
        for f in range(x, (len(col) - 1)):  # appends all reference sample IDs to a list
            reference = df.columns[x]
            list_ref.append(reference)


    df_pws_comp = pd.DataFrame()
    df_pws_comp["query"] = list_query  # list with query IDs
    df_pws_comp["reference"] = list_ref  # list with reference IDs
    df_pws_comp["cgMLST_chewBBACA_Difference"] = list_dist

Thank you in advance!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459

0 Answers0