Since I didn't have your DataFrame, I invented one based on what I understood from your question.
List generator (just to exemplify your df):
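import numpy as np
import pandas as pd

# Enter 10 here: with the extra film appended below that gives 11 rows
x = int(input('Insert length (int): '))
y = input('Insert string: ')
lst = [y] * x
new_list = []
for i in range(x):
    new_list.append(lst[i] + ' ' + str(i))
new_list.append('Jurassic World')  # added your film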
actors=['Vin Diesel|Shahrukh Khan|Salman Khan|Irrfan Khan',
'Vin Gasoline|Harrison Tesla|Salmon Rosa|Matt Angel|Demi Less',
'Not von Diesel|Ryan Davidson',
'Chris Bratt|Bread Butter|Bruce Wayno|Robinson Crusoe',
'Groot|Watzlav|David Bronzefield|Vin Diesel',
'Jessica Fox|Jamie Rabbit|Harrison Tesla|Salmon Rosa',
'Bryce Dallas Howard|David Bronzefield|Robinson Crusoe',
'Asterix|Garfield|Chris Pratt|Smurfix',
'Almost vin Diesel|Vin Gasoline|Dwayne Paper',
'Vin Gasoline|Jessica Fox|Demi Less',
'Chris Pratt|Bryce Dallas Howard|Irrfan Khan|Vincent D`Onofrio|Nick Robinson'] # 11 rows
votes_average = np.random.uniform(low=6, high=9.8, size=(11,))
Here is my df for the answer:
df=pd.DataFrame({'film' : new_list, 'actors': actors, 'imdb' : votes_average})
# First split the column with our cast into separate columns, named `cast_x`
part = df['actors'].str.split('|', expand=True).rename(columns=lambda x: 'cast_' + str(x))
# Now join them to the main df, creating df_new
df_new = pd.concat([df, part], axis=1)
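To see what the split and concat steps do, here is a minimal sketch on a made-up two-row frame. The names `toy`, `toy_part` and `toy_new`, as well as the actors, are hypothetical and only for illustration:

```python
import pandas as pd

# Hypothetical two-row frame, only to illustrate the split step
toy = pd.DataFrame({'film': ['A', 'B'],
                    'actors': ['Ann|Bob|Cid', 'Bob|Dee']})

# expand=True gives one column per name; shorter rows are padded with missing values
toy_part = (toy['actors'].str.split('|', expand=True)
            .rename(columns=lambda x: 'cast_' + str(x)))

# Side-by-side concat on the shared row index
toy_new = pd.concat([toy, toy_part], axis=1)
print(toy_new)
```

Row `B` has only two actors, so its `cast_2` cell is missing. That is why the number of `cast_x` columns equals the longest cast list.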
Now comes the complicated part, but you can try it for yourself one method at a time and see what happens to the df:
group = (df_new.filter(like='cast').stack()
         .reset_index(level=1, drop=True)
         .to_frame('casts')
         .join(df)
         .groupby('casts')
         .agg({'imdb': (np.mean, np.size), 'film': lambda x: list(pd.unique(x))}))
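If you want to follow the chain one method at a time, something like the sketch below may help. It uses a made-up two-film frame (`toy` and the `step` names are hypothetical) and the string aliases `'mean'` and `'size'` in place of `np.mean` and `np.size`, which should give the same numbers here:

```python
import pandas as pd

toy = pd.DataFrame({'film': ['A', 'B'],
                    'actors': ['Ann|Bob', 'Bob'],
                    'imdb': [8.0, 6.0]})
cast = (toy['actors'].str.split('|', expand=True)
        .rename(columns=lambda x: 'cast_' + str(x)))
toy_new = pd.concat([toy, cast], axis=1)

step1 = toy_new.filter(like='cast')            # keep only the cast_x columns
step2 = step1.stack()                          # long Series indexed by (row, cast_x)
step3 = step2.reset_index(level=1, drop=True)  # drop the cast_x level, keep the row number
step4 = step3.to_frame('casts')                # Series -> one-column DataFrame
step5 = step4.join(toy)                        # bring film and imdb back via the row number
# Missing cast entries never form a group: groupby skips missing keys by default
result = step5.groupby('casts').agg({'imdb': ['mean', 'size'],
                                     'film': lambda x: list(pd.unique(x))})
print(result)
```

Printing each `step` variable in turn shows how the wide `cast_x` columns become one row per actor and film before the grouping happens.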
I found it reasonable to use `.agg` to get more statistics (you can add `np.min` and/or `np.max` after the `,` as well). I wanted to see the average rating, how many movies it is based on (`np.size`), and which movies each actor appeared in (the lambda with `pd.unique`):
group.loc['Vin Gasoline']
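As a rough sketch of the extra statistics mentioned above, this is how minimum and maximum could be added to the same kind of `.agg` call. The frame `long_df` is a hypothetical, already-stacked table with one row per actor and film, and the string aliases stand in for the NumPy functions:

```python
import pandas as pd

# Hypothetical long-format frame: one row per (actor, film) pair
long_df = pd.DataFrame({'casts': ['Ann', 'Bob', 'Bob', 'Cid', 'Ann'],
                        'film':  ['A', 'A', 'B', 'B', 'C'],
                        'imdb':  [8.0, 8.0, 6.0, 6.0, 7.0]})

stats = long_df.groupby('casts').agg({'imdb': ['mean', 'size', 'min', 'max'],
                                      'film': lambda x: list(pd.unique(x))})
print(stats.loc['Bob'])
```

Selecting a single actor with `.loc`, as done for `'Vin Gasoline'`, then shows all four rating statistics next to the list of films.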