I have a dataframe that looks like this:
index  key   set_col                              data
0      "a1"  ("a", "b")                           "a1_data"
1      "a2"  ("j", "k", "l", "m")                 "a2_data"
2      "b1"  ("z", "y", "x", "w", "v", "u", "t")  "b1_data"
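To make this reproducible, the frame can be built roughly as follows. This is only a sketch: it assumes set_col holds tuples, as the printed values suggest, rather than real Python sets.

```python
import pandas as pd

# Assumption: set_col holds tuples (ordered, sliceable), not actual sets.
df = pd.DataFrame(
    {
        "key": ["a1", "a2", "b1"],
        "set_col": [
            ("a", "b"),
            ("j", "k", "l", "m"),
            ("z", "y", "x", "w", "v", "u", "t"),
        ],
        "data": ["a1_data", "a2_data", "b1_data"],
    }
)
print(df)
```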
I need to split set_col whenever it has more than 3 elements, putting each chunk of at most 3 elements on its own duplicated row with the same key and data, resulting in this df:
index  key   set_col          data
0      "a1"  ("a", "b")       "a1_data"
1      "a2"  ("j", "k", "l")  "a2_data"
2      "a2"  ("m")            "a2_data"
3      "b1"  ("z", "y", "x")  "b1_data"
4      "b1"  ("w", "v", "u")  "b1_data"
5      "b1"  ("t")            "b1_data"
I have read other answers using explode, replace or assign, like this or this, but none of them handles splitting lists or sets to a maximum length and duplicating the rows.
In this answer I found the following code:
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))
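As far as I can tell, this function divides `a` into `n` roughly equal parts rather than into parts of at most `n` elements, so on its own it may not give the layout I want. A quick check on two of the tuples from my frame (the variable names `parts_long` and `parts_short` are just for illustration):

```python
def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n))

# n is the number of parts, not the maximum size of each part.
parts_long = list(split(("z", "y", "x", "w", "v", "u", "t"), 3))
parts_short = list(split(("a", "b"), 3))

print(parts_long)   # [('z', 'y', 'x'), ('w', 'v'), ('u', 't')]
print(parts_short)  # [('a',), ('b',), ()]
```

The 7-element tuple comes out as 3/2/2 instead of 3/3/1, and the 2-element tuple is still cut into three pieces, one of them empty.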
And I tried to apply it to the column like this:
df['split_set_col'] = df['set_col'].apply(split(df['set_col'], 3))
But I get the error:
pandas.errors.SpecificationError: nested renamer is not supported
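One thing I notice is that `split(df['set_col'], 3)` is called immediately on the whole column, so `apply` receives a generator rather than a function. A minimal sketch of what I think the call should look like instead is below. It uses a hypothetical helper `chunk` (my own, not from the linked answer) that cuts a sequence into pieces of at most `size` elements, followed by `DataFrame.explode`. It assumes set_col holds tuples. Real sets are not sliceable and would need converting first, for example with `tuple(s)`.

```python
import pandas as pd

def chunk(seq, size):
    # Hypothetical helper: consecutive slices of at most `size` elements.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

df = pd.DataFrame({
    "key": ["a1", "a2", "b1"],
    "set_col": [
        ("a", "b"),
        ("j", "k", "l", "m"),
        ("z", "y", "x", "w", "v", "u", "t"),
    ],
    "data": ["a1_data", "a2_data", "b1_data"],
})

# Pass a callable to apply; each cell becomes a list of tuples.
df["set_col"] = df["set_col"].apply(lambda s: chunk(s, 3))

# explode gives every list element its own row and repeats key and data.
result = df.explode("set_col", ignore_index=True)
print(result)
```

With this sample data, `result` has six rows: one for "a1", two for "a2" and three for "b1".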