I have a dataframe with many columns and rows, completely unsorted. I would like to sort rows in each column such that the each element in each row is next to the most similar one in all other columns. I'm aware of this solution, but I'd like to have this being compared among multiple lists / columns, like a sorting algorithm.
col1 col2 col3 col4
0 Some Black Red Sky
1 Blue Green Floor Bucket
2 Blacky Same Green Rad
Let's consider that the first letter (first 2? 3?) must be an exact hit. It would result in
col1 col2 col3 col4
0 Some Same Sky #Some and Same are very similar, Sky is the closest in col4
1 Blue #No match for Blue
2 Blacky Black Bucket #Black,Blacky are similar, and more similar to Bucket
3 Green Green #Exact match
4 Red Rad #Similar Match
5 Floor #No word started with F in any other column
To solve this I thought of either:
- For each column, building a unique list of all elements in all other columns and align the target column vs the consensus. But I'm not sure how this would go afterwards
- For each element, find the closest 1 element for each of the other columns.
- Simply sorting alphabetically.
All the above have problems, and in all situations I will need to do some manual inspection (which is fine).