1

I am trying to separate a pandas dataframe column which has values like this -

enter image description here

My aim is to create a list of values for each "constraint" and put each value inside single quotes. This should be the expected output -

enter image description here

I tried pandas groupby with apply(list), but it does not work as expected. I was hoping for a list in which each value is wrapped in its own quotes and the values are separated by commas. Instead, it generates the output below: the values are separated by commas, but the quotes appear only before the first value and after the last one.

Here is my code -

grouped_targets = target_table.groupby(['user_id', 'target_type'])['constraints'].apply(set).apply(list).reset_index()
grouped_targets.head()

And this is the output generated from my code-

enter image description here

What am I doing wrong?

lightyagami96

3 Answers

1

Use a custom lambda function: split each value by ", ", flatten the resulting nested lists in a list comprehension, convert to a set to remove duplicates, and finally convert back to a list:

import pandas as pd

# Sample data standing in for the table from the question
target_table = pd.DataFrame({'user_id':[1,2,1,2,1,2],
                             'target_type':[2,8,2,8,8,8],
                             'constraints':['aaa, dd','ss, op','ja, ss',
                                            'dd, su, per', 'a', 'uu, ss']})

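# Split every string in the group, flatten, wrap each value in quote characters,
# then remove duplicates with set() and convert back to a list
f = lambda x: list(set(["'" + z + "'" for y in x.str.split(', ') for z in y]))
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                               .apply(f)
                               .reset_index())

# The order of items inside each list may vary, because sets are unordered
print(grouped_targets['constraints'].tolist())
[["'ss'", "'aaa'", "'dd'", "'ja'"], ["'a'"], 
 ["'ss'", "'per'", "'uu'", "'su'", "'op'", "'dd'"]]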

# Same idea, without adding the quote characters to each value
f = lambda x: list(set([z for y in x.str.split(', ') for z in y]))
grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                               .apply(f)
                               .reset_index())

print(grouped_targets['constraints'].tolist())
[['ss', 'dd', 'aaa', 'ja'], ['a'], 
 ['ss', 'su', 'uu', 'per', 'op', 'dd']]
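
Because a set does not keep insertion order, the order of items in the lists above can differ between runs. If a stable order matters, one possible sketch is to deduplicate with dict.fromkeys, which keeps first-seen order. This swaps set for a different technique, and the helper name unique_in_order is made up for illustration:

```python
import pandas as pd

target_table = pd.DataFrame({'user_id': [1, 2, 1, 2, 1, 2],
                             'target_type': [2, 8, 2, 8, 8, 8],
                             'constraints': ['aaa, dd', 'ss, op', 'ja, ss',
                                             'dd, su, per', 'a', 'uu, ss']})

def unique_in_order(s):
    # Hypothetical helper: split each string, flatten the pieces,
    # and drop duplicates while keeping the first-seen order.
    return list(dict.fromkeys(z for y in s.str.split(', ') for z in y))

grouped_targets = (target_table.groupby(['user_id', 'target_type'])['constraints']
                               .apply(unique_in_order)
                               .reset_index())

print(grouped_targets['constraints'].tolist())
# [['aaa', 'dd', 'ja', 'ss'], ['a'], ['ss', 'op', 'dd', 'su', 'per', 'uu']]
```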
    

EDIT:

I think the most complicated part is the custom function, so you can test how it works on a plain list:

L = ['aaa, dd','ss, op','ja, ss', 'dd, su, per', 'a', 'uu, ss']

If you only split the values, the output is different: you get a list of lists (nested lists):

a = [x.split(', ') for x in L]
print (a)
[['aaa', 'dd'], ['ss', 'op'], ['ja', 'ss'], ['dd', 'su', 'per'], ['a'], ['uu', 'ss']]

So it is possible to flatten the values by combining the split with a second loop in the comprehension:

a = [z for x in L for z in x.split(', ')]
print (a)
['aaa', 'dd', 'ss', 'op', 'ja', 'ss', 'dd', 'su', 'per', 'a', 'uu', 'ss']
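
The remaining steps of the lambda, removing duplicates and adding the quote characters, can be tried on the same plain list. This is only a sketch: sorted is an addition here (not part of the lambda) so that the printed order is reproducible, and the names flat, unique and quoted are illustrative:

```python
L = ['aaa, dd', 'ss, op', 'ja, ss', 'dd, su, per', 'a', 'uu, ss']

# Split and flatten in one comprehension
flat = [z for x in L for z in x.split(', ')]

# Drop duplicates; sorted() is used only to get a reproducible order
unique = sorted(set(flat))
print(unique)
# ['a', 'aaa', 'dd', 'ja', 'op', 'per', 'ss', 'su', 'uu']

# Wrap each value in literal single-quote characters
quoted = ["'" + z + "'" for z in unique]
print(quoted[:3])
# ["'a'", "'aaa'", "'dd'"]
```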
jezrael
0

You should be able to achieve that by splitting the strings, so:

new_df = df['constraints'].apply(lambda x: x.split(', '))
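
As a rough illustration on a small made-up frame (the data below is an assumption, not the asker's table), the split turns each cell into its own list. Merging the lists per group and removing duplicates would still need a separate step:

```python
import pandas as pd

# Made-up sample data for illustration only
df = pd.DataFrame({'user_id': [1, 2, 1],
                   'target_type': [2, 8, 2],
                   'constraints': ['aaa, dd', 'ss, op', 'ja, ss']})

# Each string becomes a list of its comma-separated parts
new_df = df['constraints'].apply(lambda x: x.split(', '))

print(new_df.tolist())
# [['aaa', 'dd'], ['ss', 'op'], ['ja', 'ss']]
```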
dzang
0

Try using split first.

... ].str.split(',').apply(list)
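
One thing to watch for, assuming the values are separated by a comma followed by a space as in the sample data from the first answer: splitting on ',' alone keeps the leading space on every item after the first. A small sketch on a made-up Series compares the two separators:

```python
import pandas as pd

# Made-up sample values for illustration only
s = pd.Series(['aaa, dd', 'dd, su, per'])

# Splitting on ',' keeps the space that follows each comma
with_spaces = s.str.split(',').tolist()
print(with_spaces)
# [['aaa', ' dd'], ['dd', ' su', ' per']]

# Splitting on ', ' removes it
clean = s.str.split(', ').tolist()
print(clean)
# [['aaa', 'dd'], ['dd', 'su', 'per']]
```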

LevB