I am working with a dataframe in which one column holds both comma-separated lists and single items:
Name | Group | Average Age |
---|---|---|
Mary | A, D, T, F | 10 |
Lukas | A, D, T, F | 20 |
John | A, D, T, F | 5 |
Mary | B, G, Y, Z | 15 |
Lukas | B, G, Y, Z | 25 |
John | B, G, Y, Z | 50 |
Mary | K | 12 |
Lukas | L | 23 |
John | M | 56 |
I also have a list of groups:
group_list = ['D', 'Y', 'K', 'L', 'M']
I want the Average Age value for every name, restricted to the groups in this list, but first I'd like to split the Group column.
I've tried:
if ',' in df['Group']:
    new_df['Group'] = df['Group'].str.split(",").apply(lambda x: list(set(x).intersection(set(group_list)))[0])
else:
    new_df['Group'] = df['Group']
I also tried:
new_df['Group'] = df['Group'].str.split(",").apply(lambda x: [list(set(x).intersection(set(group_list)))[0]] for ',' in df['Group'] else df['Group'])
But I can't get either version to run; the kernel always crashes.
Does anyone know how to solve this?
Thanks!
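
One possible approach, sketched under a few assumptions: the dataframe is rebuilt here from the table above, the items in Group are separated by a comma followed by a space, and "Average Age over this list" is read as the mean per Name across the matching groups (that last step is a guess at the intent). Note that `',' in df['Group']` tests the Series index labels rather than the string values, so it cannot be used to detect rows containing a comma. Splitting every row and exploding avoids the need for that check, because a single item simply becomes a one-element list.

```python
import pandas as pd

# Rebuild the example data from the table in the question.
df = pd.DataFrame({
    "Name": ["Mary", "Lukas", "John"] * 3,
    "Group": ["A, D, T, F"] * 3 + ["B, G, Y, Z"] * 3 + ["K", "L", "M"],
    "Average Age": [10, 20, 5, 15, 25, 50, 12, 23, 56],
})
group_list = ["D", "Y", "K", "L", "M"]

# Split every row on commas and give each item its own row.
# Rows without a comma become one-element lists, so no if/else is needed.
exploded = (
    df.assign(Group=df["Group"].str.split(","))
      .explode("Group")
      .reset_index(drop=True)
)

# Remove the space left after each comma (" D" -> "D").
exploded["Group"] = exploded["Group"].str.strip()

# Keep only the rows whose group appears in group_list.
new_df = exploded[exploded["Group"].isin(group_list)].reset_index(drop=True)

# Assumption: the wanted result is the mean Average Age per Name.
mean_age = new_df.groupby("Name")["Average Age"].mean()
print(new_df)
print(mean_age)
```

If only the split and filtered table is needed, `new_df` is the result and the `groupby` step can be dropped.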