I have a DataFrame in which one of the columns holds lists of strings, as shown below:

colA colB LISTCOL
USA  100   ['ABCD (Actor)', 'XYZ (Actor, Director)', 'PQR (Producer, Writer)']
UK   1200  ['45q34y(Actor,Director, Producer)', '123 (Actor, Director)']

For each row, I want to keep only the elements of the list in LISTCOL that contain Actor, and extract just the name from each of them.

I tried

df['ACTOR'] = df.apply(
        lambda elem: [elem for elem in df['LISTCOL'].str if "Actor" in elem],
    axis=1)

However, it does not work. Unfortunately, I am on pandas 0.23.4, so df.explode() is not available to me. How can I get the output I want?

OUTPUT:

colA colB  ACTOR
USA  100   ['ABCD', 'XYZ']
UK   1200  ['45q34y', '123']
asimo

2 Answers

Try this:

import re

# Compile once; the raw string avoids invalid-escape warnings for \w, \s and \(.
actor_re = re.compile(r'(\w+)\s?\(.*?Actor')
df['Actors'] = [[m.group(1) for m in map(actor_re.match, i) if m] for i in df['LISTCOL']]

Output:

  colA  colB                                            LISTCOL         Actors
0  USA   100  [ABCD (Actor), XYZ (Actor, Director), PQR (Pro...    [ABCD, XYZ]
1   UK  1200  [45q34y(Actor,Director, Producer), 123 (Actor,...  [45q34y, 123]
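
For anyone who wants to check the pattern without a DataFrame, here is a minimal sketch that wraps the same regex in a plain function. The names ACTOR_PATTERN, extract_actors and rows are hypothetical and only used for illustration; the sample lists are copied from the question.

```python
import re

# Hypothetical names; the pattern is the same one used in the answer above.
ACTOR_PATTERN = re.compile(r'(\w+)\s?\(.*?Actor')

def extract_actors(entries):
    """Return the leading name of each entry whose parenthesised part mentions Actor."""
    matches = (ACTOR_PATTERN.match(entry) for entry in entries)
    return [m.group(1) for m in matches if m]

# Sample lists copied from the question.
rows = [
    ['ABCD (Actor)', 'XYZ (Actor, Director)', 'PQR (Producer, Writer)'],
    ['45q34y(Actor,Director, Producer)', '123 (Actor, Director)'],
]
result = [extract_actors(row) for row in rows]
print(result)
```

The function should plug into the column with df['LISTCOL'].map(extract_actors). One caveat: because the name is captured with \w+ and the match is anchored at the start of the string, an entry whose name contains a space (for example 'John Smith (Actor)') is skipped rather than captured.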
Scott Boston

I would consider using pd.Series.map() with a small helper function:

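def make_actors_column(entries):
    # Join the list into one string, then split on '(' so that each chunk
    # starts with a role group and ends with the name of the next person.
    temp_list = ''.join(entries).split('(')
    actor_list = []
    for i, string in enumerate(temp_list):
        if 'Actor' in string:
            # The name is whatever follows the last ')' in the previous chunk.
            name_of_actor = temp_list[i-1].split(')')[-1]
            actor_list.append(name_of_actor.strip())
    return actor_list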


make_actors_column(df.loc[0,'LISTCOL'])
# --> ['ABCD', 'XYZ']

df['ACTOR'] = df['LISTCOL'].map(make_actors_column)
df

  colA  colB                                            LISTCOL          ACTOR
0  USA   100  [ABCD (Actor), XYZ (Actor, Director), PQR (Pro...    [ABCD, XYZ]
1   UK  1200  [45q34y(Actor,Director, Producer), 123 (Actor,...  [45q34y, 123]

I think this function is enough for your example.
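
A possible variant of the same idea, sketched here as an assumption rather than a tested drop-in, handles each list element on its own instead of joining the whole list first. The names actors_only and rows are hypothetical. It splits every element at the first '(' and compares the roles exactly, so names containing spaces are kept.

```python
def actors_only(entries):
    """Return the names of entries whose comma-separated role list contains 'Actor'."""
    names = []
    for entry in entries:
        # Everything before the first '(' is the name; the rest is the role list.
        name, _, roles = entry.partition('(')
        role_list = [role.strip() for role in roles.rstrip(')').split(',')]
        if 'Actor' in role_list:
            names.append(name.strip())
    return names

# Sample lists copied from the question.
rows = [
    ['ABCD (Actor)', 'XYZ (Actor, Director)', 'PQR (Producer, Writer)'],
    ['45q34y(Actor,Director, Producer)', '123 (Actor, Director)'],
]
result = [actors_only(row) for row in rows]
print(result)
```

It can be applied the same way, with df['LISTCOL'].map(actors_only). Because the roles are compared exactly, an entry such as 'Jane (Voice Actor)' would be left out; switch to a substring test if that is not what you want.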