Reorder strings within object based on another data frame lookup

Question

This is my first data frame (df1).

enter image description here

I have to reorder the elements in sequence column (of my df1) based on the minimum count from the second data frame (df2)

enter image description here

so my end results should be like this

enter image description here

please explain how df2 was generated – Golden Lion Jul 13 '21 at 19:56 — Golden Lion, Jul 13 '21 at 19:56

Edo Akse · Accepted Answer · 2021-07-14T18:51:31.753

I think the code below would do what you want. I've commented inline and put links for further reading...

I know it could be compressed into shorter code, but I wanted the steps to be clear.

import pandas as pd
from pprint import pprint


data1 = {'id': ['A1234', 'A2345'],
         'Sequence': ['16 31 17', '51 59 43']}
df1 = pd.DataFrame(data1)
 

# I assumed the label en count columns are integers
data2 = {'label': [10, 11, 12, 13, 16, 17, 21, 24, 31, 43, 44, 51, 59, 60],
         'count': [214, 128, 135, 37, 184, 68, 267, 264, 231, 13, 82, 100, 99, 92]}
df2 = pd.DataFrame(data2)


def seq_value_sort(seq_df, label_df):
    new_sequence_list = []
    for value in seq_df['Sequence'].values:
        print(f'{"":-<40}') # prints a line
        # convert string to list of integers
        # https://www.geeksforgeeks.org/python-converting-all-strings-in-list-to-integers/
        sequence = [int(i) for i in value.split()]
        
        # generate an unsorted list of dict items based on Sequence
        data = []
        for index, row in label_df.T.iteritems():
            if int(row['label']) in sequence:
                data.append({'label': int(row['label']),
                             'count': int(row['count'])})
        pprint(data)

        # now sort the unsorted list based on key 'count'
        # https://stackoverflow.com/a/73050/9267296
        data = sorted(data, key=lambda k: k['count'])
        pprint(data)

        # list comprehension to make list of strings out
        # of the list of dict
        # https://stackoverflow.com/a/7271523/9267296
        sequence_sorted = [ str(item['label']) for item in data ]
        pprint(sequence_sorted)

        # create the final sequence string from the list
        new_sequence_list.append(' '.join(sequence_sorted))
    
    # create return data
    return_data = {'id': list(seq_df['id'].values),
                   'Sequence': new_sequence_list}
    pprint(return_data)
    
    # finally return a new df
    return pd.DataFrame(return_data)


df3 = seq_value_sort(df1, df2)
print(f'{"":-<40}')
print(df3)

EDIT:

Forgot the output:

----------------------------------------
[{'count': 184, 'label': 16},
 {'count': 68, 'label': 17},
 {'count': 231, 'label': 31}]
[{'count': 68, 'label': 17},
 {'count': 184, 'label': 16},
 {'count': 231, 'label': 31}]
['17', '16', '31']
----------------------------------------
[{'count': 13, 'label': 43},
 {'count': 100, 'label': 51},
 {'count': 99, 'label': 59}]
[{'count': 13, 'label': 43},
 {'count': 99, 'label': 59},
 {'count': 100, 'label': 51}]
['43', '59', '51']
{'Sequence': ['17 16 31', '43 59 51'], 'id': ['A1234', 'A2345']}
----------------------------------------
      id  Sequence
0  A1234  17 16 31
1  A2345  43 59 51

Reorder strings within object based on another data frame lookup

1 Answers1