I am working on a dataframe that has a column in which each value is a list. I want to derive a new column that considers only the lists whose size is greater than 1 and assigns a unique integer to the corresponding row as an id. If two lists contain the same elements in a different order, they should be assigned the same id. A sample dataframe looks like this:
document_no_list cluster_id
[1,2,3] 1
[3,2,1] 1
[4,5,6,7] 2
[8] 0
[9,10] 3
[10,9] 3
The column cluster_id only considers the 1st, 2nd, 3rd, 5th and 6th rows, each of which has a list of size greater than 1, and assigns a unique integer id to the corresponding cell in the column. In addition, [1,2,3] and [3,2,1] should be assigned the same cluster_id, and so should [9,10] and [10,9].
I asked a similar question earlier, without considering duplicate list values, at "pandas how to derived values for a new column base on another column".
I am wondering how to do this in pandas.
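To make the expected behaviour concrete, here is a minimal sketch of one way the column could be derived. It is only an illustration under some assumptions, not necessarily the most idiomatic pandas approach: single-element lists get 0 (as in the sample above), ids are handed out in order of first appearance, and the names assign_cluster_id and seen are hypothetical helpers introduced for this example.

```python
import pandas as pd

# Sample data taken from the table above.
df = pd.DataFrame(
    {"document_no_list": [[1, 2, 3], [3, 2, 1], [4, 5, 6, 7], [8], [9, 10], [10, 9]]}
)

# Maps an order-insensitive key (sorted tuple) to the id it was given.
seen = {}

def assign_cluster_id(lst):
    # Assumption: lists with a single element are not clustered and get 0.
    if len(lst) <= 1:
        return 0
    # Sorting makes [1, 2, 3] and [3, 2, 1] produce the same hashable key.
    key = tuple(sorted(lst))
    # Reuse the id if the key was seen before, otherwise hand out the next one.
    return seen.setdefault(key, len(seen) + 1)

df["cluster_id"] = df["document_no_list"].apply(assign_cluster_id)
print(df)
```

On the sample data this yields the ids 1, 1, 2, 0, 3, 3. Note that the sorted tuple treats each list as a multiset, so [1, 1, 2] and [1, 2] would end up in different clusters.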