I came across this extremely well-explained similar question (Get last "column" after .str.split() operation on column in pandas DataFrame), and used some of the code found there. However, it doesn't give the output that I would like.

import pandas as pd

raw_data = {
    'category': ['sweet beverage, cola,sugared', 'healthy,salty snacks', 'juice,beverage,sweet', 'fruit juice,beverage', 'appetizer,salty crackers'],
    'product_name': ['coca-cola', 'salted pistachios', 'fruit juice', 'lemon tea', 'roasted peanuts']}
df = pd.DataFrame(raw_data)

The objective is to extract the categories from each row and use only the last two to create a new column. The code below works and gives me the categories of interest as a new column.

df['my_col'] = df.category.apply(lambda s: s.split(',')[-2:])

output
my_col 
[cola,sugared]
[healthy,salty snacks]
[beverage,sweet]
...

However, each value appears as a list. How can I get it as a plain string instead of a list? Can this be achieved? Thanks all!

Zoozoo

2 Answers

I believe you need str.split, then select the last two items of each list with str[-2:], and finally str.join:

df['my_col'] = df.category.str.split(',').str[-2:].str.join(',')
print (df)
                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers
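
One detail worth checking in the output above: the first row's category has a space after its first comma, so splitting on ',' alone leaves that space at the start of the result. The sketch below compares the plain split with a split on a comma plus optional surrounding whitespace. The regex pattern and the names plain and trimmed are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['sweet beverage, cola,sugared', 'healthy,salty snacks'],
    'product_name': ['coca-cola', 'salted pistachios'],
})

# Plain split on ',' keeps the space that follows the first comma.
plain = df['category'].str.split(',').str[-2:].str.join(',')

# Assumed variant: split on a comma with optional whitespace around it,
# so the pieces come out already trimmed.
trimmed = df['category'].str.split(r'\s*,\s*').str[-2:].str.join(',')

print(plain.tolist())
print(trimmed.tolist())
```

If the leading space does not matter for your use case, the plain split is enough.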

EDIT:

In my opinion, the pandas str text functions are preferable to apply with pure Python string functions, because they also work with NaNs and None.

import numpy as np

raw_data = {
    'category': [np.nan, 'healthy,salty snacks'],
    'product_name': ['coca-cola', 'salted pistachios']}
df = pd.DataFrame(raw_data)

df['my_col'] = df.category.str.split(',').str[-2:].str.join(',')
print (df)
               category       product_name                my_col
0                   NaN          coca-cola                   NaN
1  healthy,salty snacks  salted pistachios  healthy,salty snacks

# the same logic with apply and plain Python string methods fails on the NaN row
df['my_col'] = df.category.apply(lambda s: ','.join(s.split(',')[-2:]))

AttributeError: 'float' object has no attribute 'split'
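
If apply is still preferred for some reason, one possible workaround is to guard against non-string values inside the lambda. This is only a sketch under that assumption: the isinstance check is my addition, not something from the answer above, and the str accessor chain remains the simpler option.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'category': [np.nan, 'healthy,salty snacks'],
    'product_name': ['coca-cola', 'salted pistachios'],
})

# Only split real strings; pass NaN/None through unchanged.
df['my_col'] = df['category'].apply(
    lambda s: ','.join(s.split(',')[-2:]) if isinstance(s, str) else s
)

print(df)
```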

jezrael

You can also call join on the result of split inside the lambda:

df['my_col'] = df.category.apply(lambda s: ','.join(s.split(',')[-2:]))
df

Result:

                       category       product_name                    my_col
0  sweet beverage, cola,sugared          coca-cola              cola,sugared
1          healthy,salty snacks  salted pistachios      healthy,salty snacks
2          juice,beverage,sweet        fruit juice            beverage,sweet
3          fruit juice,beverage          lemon tea      fruit juice,beverage
4      appetizer,salty crackers    roasted peanuts  appetizer,salty crackers
niraj