Split a column that mixes comma-delimited and single items, then filter it against a list

Question

I am working with a dataframe that contains a column whose values are sometimes several comma-separated items and sometimes a single item:

Name	Group	Average Age
Mary	A, D, T, F	10
Lukas	A, D, T, F	20
John	A, D, T, F	5
Mary	B, G, Y, Z	15
Lukas	B, G, Y, Z	25
John	B, G, Y, Z	50
Mary	K	12
Lukas	L	23
John	M	56

I also have a list of groups:

group_list = ['D', 'Y', 'K', 'L', 'M']

I want the Average Age value for every name whose group is in this list, but first I need to split the Group column.

I've tried:

if ',' in df['Group']:
    new_df['Group'] = df['Group'].str.split(",").apply(lambda x: list(set(x).intersection(set(group_list)))[0])
    else:
        new_df['Group'] = df['Group']

I also tried:

 new_df['Group'] = df['Group'].str.split(",").apply(lambda x: [list(set(x).intersection(set(group_list)))[0]] for ',' in df['Group'] else df['Group'])

But I can't get either version to run; the kernel always crashes.
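Two details in the attempts above can be checked in isolation. This is a hedged sketch and the small Series `s` is a made-up sample, not the real data. First, `in` applied to a pandas Series tests the index labels, not the values, so `',' in df['Group']` does not look inside the strings. Second, splitting on `","` alone leaves the space after each comma in place, so `' D'` never matches `'D'`, the intersection comes back empty, and `[0]` has nothing to return.

```python
import pandas as pd

# Hypothetical two-row sample shaped like the Group column.
s = pd.Series(["A, D, T, F", "K"])

# `in` on a Series checks index labels (0 and 1 here), not the strings.
label_check = "," in s

# A row-wise test for a literal comma needs the .str accessor instead.
row_check = s.str.contains(",", regex=False)

# Splitting on "," alone keeps the leading spaces: ['A', ' D', ' T', ' F'].
raw_parts = s.str.split(",")[0]

# Stripping each piece gives values that can match group_list entries.
clean_parts = [part.strip() for part in raw_parts]
```

Neither point is confirmed as the cause of the kernel crash; they only show why the intersection would not find `'D'` or `'Y'` as written.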

Does anyone know how to solve this?

Thanks!

Besides splitting, how are you going to calculate the average (considering the current values of `Average Age`)? – RomanPerekhrest, Mar 02 '23 at 12:10
I won't. I will just take the existing Average Age column values, Roman. No calculation is needed in my case. – Naiara Tabanez, Mar 02 '23 at 12:12
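One possible approach, sketched under the assumption that the data looks exactly like the table in the question and that the existing Average Age values should be kept as they are (per the comment above): split Group into lists, explode to one row per item, strip the whitespace, and keep only the rows whose item is in `group_list`. The DataFrame below is rebuilt by hand from the question's table, and `new_df` reuses the name from the attempts.

```python
import pandas as pd

# Data rebuilt from the table in the question.
df = pd.DataFrame({
    "Name": ["Mary", "Lukas", "John"] * 3,
    "Group": ["A, D, T, F"] * 3 + ["B, G, Y, Z"] * 3 + ["K", "L", "M"],
    "Average Age": [10, 20, 5, 15, 25, 50, 12, 23, 56],
})
group_list = ["D", "Y", "K", "L", "M"]

new_df = (
    df.assign(Group=df["Group"].str.split(","))        # strings -> lists of items
      .explode("Group")                                # one row per item
      .assign(Group=lambda d: d["Group"].str.strip())  # drop the space after each comma
      .loc[lambda d: d["Group"].isin(group_list)]      # keep only listed groups
      .reset_index(drop=True)
)

print(new_df)
```

With this data every original row contains exactly one item from `group_list`, so the result still has nine rows and the Average Age column is unchanged. If a row could contain several listed items, it would appear once per match, which may or may not be what is wanted.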
