Splitting a column value into list of values separated by comma

Question

I am trying to separate a pandas dataframe column which has values like this -

My aim is to create a list of values for each "constraint" and put each value inside single quotes. This should be the expected output -

I have tried pandas groupby with apply(list), but it's not working as expected. I was hoping to get a proper list in which each value is inside quotes and the values are separated by commas. However, it's generating the output below (the values are separated by commas, but the quotes appear only before the first value and after the last value).

Here is my code -

grouped_targets = target_table.groupby(['user_id', 'target_type'])['constraints'].apply(set).apply(list).reset_index()
grouped_targets.head()

And this is the output generated from my code-

What am I doing wrong?

jezrael · Accepted Answer · 2021-02-02T12:18:02.267

1

Use a custom lambda function: split the values by ', ', flatten the resulting nested lists with a list comprehension, convert to a set to remove duplicates, and finally convert back to a list:

import pandas as pd

target_table = pd.DataFrame({'user_id':[1,2,1,2,1,2],
                             'target_type':[2,8,2,8,8,8],
                             'constraints':['aaa, dd','ss, op','ja, ss',
                                            'dd, su, per', 'a', 'uu, ss']})

# split each string by ', ', flatten, wrap every value in quotes, drop duplicates
f = lambda x: list(set(["'" + z + "'" for y in x.str.split(', ') for z in y]))
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                               .apply(f)          
                               .reset_index())

print (grouped_targets['constraints'].tolist())
[["'ss'", "'aaa'", "'dd'", "'ja'"], ["'a'"], 
 ["'ss'", "'per'", "'uu'", "'su'", "'op'", "'dd'"]]

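If the extra quotes are not wanted, leave them out. Python already shows quotes around strings when it prints a list, and because sets are unordered the order of the values may differ between runs:

f = lambda x: list(set([z for y in x.str.split(', ') for z in y]))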
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                               .apply(f)          
                               .reset_index())

print (grouped_targets['constraints'].tolist())
[['ss', 'dd', 'aaa', 'ja'], ['a'], 
 ['ss', 'su', 'uu', 'per', 'op', 'dd']]
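
As a possible alternative (not part of the answer above, just a sketch), the same grouping can be written with `str.split` followed by `DataFrame.explode`, and `sorted` can replace the plain `list(set(...))` so that the order of the values is reproducible. The names `exploded` and `grouped_sorted` are made up for this illustration.

```python
import pandas as pd

target_table = pd.DataFrame({
    'user_id': [1, 2, 1, 2, 1, 2],
    'target_type': [2, 8, 2, 8, 8, 8],
    'constraints': ['aaa, dd', 'ss, op', 'ja, ss', 'dd, su, per', 'a', 'uu, ss'],
})

# Split each string into a list, then give every list item its own row.
exploded = (target_table
            .assign(constraints=target_table['constraints'].str.split(', '))
            .explode('constraints'))

# Collect the distinct values per group; sorted() fixes the order.
grouped_sorted = (exploded.groupby(['user_id', 'target_type'])['constraints']
                          .apply(lambda s: sorted(set(s)))
                          .reset_index())

print(grouped_sorted['constraints'].tolist())
# [['aaa', 'dd', 'ja', 'ss'], ['a'], ['dd', 'op', 'per', 'ss', 'su', 'uu']]
```

Whether this reads better than the lambda is a matter of taste; it mainly helps when the intermediate one-value-per-row table is useful on its own.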

EDIT:

I think the most complicated part is the custom function. You can test how it works on a plain list:

L = ['aaa, dd','ss, op','ja, ss', 'dd, su, per', 'a', 'uu, ss']

If you only split the values, the output is different: you get a list of lists (nested lists):

a = [x.split(', ') for x in L]
print (a)
[['aaa', 'dd'], ['ss', 'op'], ['ja', 'ss'], ['dd', 'su', 'per'], ['a'], ['uu', 'ss']]

So it is possible to flatten the values by combining the split with a second loop in the comprehension:

a = [z for x in L for z in x.split(', ')]
print (a)
['aaa', 'dd', 'ss', 'op', 'ja', 'ss', 'dd', 'su', 'per', 'a', 'uu', 'ss']
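
To finish the walk-through on the same list, here is a small sketch of the last two steps: removing duplicates with `set`, and why adding quotes by hand produces doubled quotes. The variable names `flat`, `unique_values` and `quoted` are only illustrative, and `sorted` is used instead of `list` purely to get a stable order for display.

```python
L = ['aaa, dd', 'ss, op', 'ja, ss', 'dd, su, per', 'a', 'uu, ss']

# Split and flatten in one comprehension.
flat = [z for x in L for z in x.split(', ')]

# set() drops duplicates such as the repeated 'ss' and 'dd'.
unique_values = sorted(set(flat))
print(unique_values)
# ['a', 'aaa', 'dd', 'ja', 'op', 'per', 'ss', 'su', 'uu']

# The quotes in the printed list belong to the representation of the strings,
# not to the strings themselves.
print(unique_values[0])        # a
print(repr(unique_values[0]))  # 'a'

# Adding quotes by hand makes them part of the data, so they show up twice.
quoted = ["'" + z + "'" for z in unique_values]
print(quoted[:2])
# ["'a'", "'aaa'"]
```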

edited Feb 02 '21 at 12:18

answered Feb 02 '21 at 11:31

jezrael


this is still giving output similar to my code - the values are consolidated within a single pair of quotes, like 'a,b,c'. I want 'a', 'b', 'c'. – lightyagami96 Feb 02 '21 at 11:51
@lightyagami96 - Can you check edited answer? – jezrael Feb 02 '21 at 11:58
Your edit makes sense, but I'm getting this error - AttributeError: 'Series' object has no attribute 'split' – lightyagami96 Feb 02 '21 at 11:59
@lightyagami96 - added `x.str.split(', ')` – jezrael Feb 02 '21 at 12:00
it's working but quotes are weird. OUTPUT - ["'web'", "'app'"] – lightyagami96 Feb 02 '21 at 12:03
@lightyagami96 - hmm, if you add `"'" + z + "'"` it means you add a second pair of quotes, because the default quotes are only shown when printing. – jezrael Feb 02 '21 at 12:06

@lightyagami96 - Edited answer, I think you need second solution. – jezrael Feb 02 '21 at 12:07

Perfect. can you please help me understand how it works? a very short explanation? – lightyagami96 Feb 02 '21 at 12:11

score 0 · Answer 2 · answered Feb 02 '21 at 11:32


You should be able to achieve that by splitting the strings, so:

new_df = df['constraints'].apply(lambda x: x.split(', '))
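
A minimal sketch of what this does, assuming a small made-up frame `df` (the two sample strings are taken from the accepted answer's data). It also shows that `.str.split` gives the same per-row result as the `apply` with a lambda. Note that this splits each row separately; it does not combine rows per group.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({'constraints': ['aaa, dd', 'a']})

# Row-by-row split with a Python lambda, as in the line above.
new_df = df['constraints'].apply(lambda x: x.split(', '))

# Equivalent vectorized string method.
via_str = df['constraints'].str.split(', ')

print(new_df.tolist())
# [['aaa', 'dd'], ['a']]
print(new_df.tolist() == via_str.tolist())
# True
```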

answered Feb 02 '21 at 11:32

dzang


score 0 · Answer 3 · answered Feb 02 '21 at 11:40


Try using split first.

... ].str.split(',').apply(list)
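
One thing to watch with this approach, sketched on a made-up Series `s`: splitting on ',' alone keeps the space that follows each comma, so the pieces may need stripping (or the separator can be ', ' as in the other answers). The names `raw` and `cleaned` are only illustrative.

```python
import pandas as pd

# Hypothetical sample values, only for illustration.
s = pd.Series(['aaa, dd', 'a'])

# Splitting on ',' leaves a leading space on every value after the first.
raw = s.str.split(',')
print(raw.tolist())
# [['aaa', ' dd'], ['a']]

# Strip the whitespace from each piece.
cleaned = raw.apply(lambda parts: [p.strip() for p in parts])
print(cleaned.tolist())
# [['aaa', 'dd'], ['a']]
```

The result of `.str.split` already holds lists, so a further `.apply(list)` leaves the values unchanged.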

answered Feb 02 '21 at 11:40

LevB

