I have a pandas DataFrame in Python that contains actors' names and the movies they appeared in.

Something like this:

Name     Films

Adam     tt2488496,tt7653254,tt7653254,tt2488496
Jhon     tt1596363,tt1386588,tt6266538
Juan     tt7653254,tt2488496
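
For anyone who wants to reproduce this, the sample can be built roughly as follows. This is only a sketch: it assumes the Films column holds plain comma-separated strings with no spaces, as the table above suggests.

```python
import pandas as pd

# Sample data copied from the table above; Films is assumed to be
# a single comma-separated string per actor.
df = pd.DataFrame({
    "Name": ["Adam", "Jhon", "Juan"],
    "Films": [
        "tt2488496,tt7653254,tt7653254,tt2488496",
        "tt1596363,tt1386588,tt6266538",
        "tt7653254,tt2488496",
    ],
})
print(df)
```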

I need to split this so that each row holds one actor and one movie they appeared in.

Like this:

Name     Films

Adam     tt2488496
Adam     tt7653254
Adam     tt7653254
Adam     tt2488496

How can I do it?

2 Answers

Another way is to use the pandas melt function, as shown below:

# split the Films column on commas into one column per film
df2 = df['Films'].str.split(',', expand=True)

Now df2 is:

           0          1          2          3
0  tt2488496  tt7653254  tt7653254  tt2488496
1  tt1596363  tt1386588  tt6266538       None
2  tt7653254  tt2488496       None       None

Join those split columns with the Name column:

df3 = pd.concat([df['Name'], df2], axis=1)

   Name          0          1          2          3
0  Adam  tt2488496  tt7653254  tt7653254  tt2488496
1  Jhon  tt1596363  tt1386588  tt6266538       None
2  Juan  tt7653254  tt2488496       None       None

Use pandas melt to unpivot, then drop the unnecessary column and the NaNs:

final_result = pd.melt(df3, id_vars=['Name'], value_vars=list(df2.columns)).drop(columns=['variable']).dropna()

which is

   Name      value
0  Adam  tt2488496
1  Jhon  tt1596363
2  Juan  tt7653254
3  Adam  tt7653254
4  Jhon  tt1386588
5  Juan  tt2488496
6  Adam  tt7653254
7  Jhon  tt6266538
9  Adam  tt2488496
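
Putting the steps above together, here is a minimal end-to-end sketch. The sample frame is rebuilt from the question, the variable names (films_wide, wide, result) are only illustrative, and ignore_index=False assumes pandas 1.1 or later. That option keeps the original row labels, so a stable sort can regroup the rows by actor instead of leaving them interleaved as in the output above.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Adam", "Jhon", "Juan"],
    "Films": [
        "tt2488496,tt7653254,tt7653254,tt2488496",
        "tt1596363,tt1386588,tt6266538",
        "tt7653254,tt2488496",
    ],
})

# 1. Split on commas into one column per film; shorter rows are padded
#    with missing values.
films_wide = df["Films"].str.split(",", expand=True)

# 2. Put the names back next to the split columns.
wide = pd.concat([df["Name"], films_wide], axis=1)

# 3. Unpivot, drop the helper column and the padding, then regroup by the
#    original row label. mergesort is stable, so each actor's films keep
#    their original left-to-right order.
result = (
    wide.melt(id_vars="Name", value_name="Films", ignore_index=False)
        .drop(columns="variable")
        .dropna()
        .sort_index(kind="mergesort")
        .reset_index(drop=True)
)
print(result)
```

The sort step is optional; without it the rows come out column by column, exactly as in the table above.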
plasmon360

You can do it like this:

df = pd.DataFrame(df.Films.str.split(',').tolist(), index=df.Name).stack().reset_index()[['Name',0]]
df.columns = ['Name', 'Films']

   Name      Films
0  Adam  tt2488496
1  Adam  tt7653254
2  Adam  tt7653254
3  Adam  tt2488496
4  Jhon  tt1596363
5  Jhon  tt1386588
6  Jhon  tt6266538
7  Juan  tt7653254
8  Juan  tt2488496
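
As a side note, newer pandas versions offer a shorter route. This is a different technique from the stack-based one above, not a rewrite of it: a sketch that assumes pandas 0.25 or later, where DataFrame.explode is available, and rebuilds the sample frame from the question.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Name": ["Adam", "Jhon", "Juan"],
    "Films": [
        "tt2488496,tt7653254,tt7653254,tt2488496",
        "tt1596363,tt1386588,tt6266538",
        "tt7653254,tt2488496",
    ],
})

result = (
    df.assign(Films=df["Films"].str.split(","))  # each cell becomes a list of IDs
      .explode("Films")                          # one row per list element
      .reset_index(drop=True)                    # replace the repeated row labels
)
print(result)
```

On the sample data this gives the same nine Name/Films rows as the table above.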
Est