Now, my dataset looks like this:
   tconst     Actor1  Actor2                                 Actor3                                 Actor4                                 Actor5  Actor6  Actor7  Actor8  Actor9  Actor10
0  tt0000001  NaN     GreaterEuropean, WestEuropean, French  GreaterEuropean, British               NaN                                    NaN     NaN     NaN     NaN     NaN     NaN
1  tt0000002  NaN     GreaterEuropean, WestEuropean, French  NaN                                    NaN                                    NaN     NaN     NaN     NaN     NaN     NaN
2  tt0000003  NaN     GreaterEuropean, WestEuropean, French  GreaterEuropean, WestEuropean, French  GreaterEuropean, WestEuropean, French  NaN     NaN     NaN     NaN     NaN     NaN
3  tt0000004  NaN     GreaterEuropean, WestEuropean, French  NaN                                    NaN                                    NaN     NaN     NaN     NaN     NaN     NaN
4  tt0000005  NaN     GreaterEuropean, British               GreaterEuropean, British               NaN                                    NaN     NaN     NaN     NaN     NaN     NaN
I used the replace and map functions to get here.
I want to build a new dataframe from the one above, so that the result looks like this:
tconst     GreaterEuropean  WestEuropean  French  GreaterEuropean  British  Arab  British  ...
tt0000001  2                1             0       4                1        0     2        ...
tt0000002  0                2             4       0                1        3     0        ...
The columns GreaterEuropean, British, WestEuropean, Italian, French, ... hold the number of actors of each ethnicity in the movie identified by tconst.
In other words, the result is a count matrix: for a movie such as tt00001 it might say there are 5 Arabs, 2 British, 1 WestEuropean and so on, i.e. how many actors in that movie belong to each ethnicity.
Link to the data: https://drive.google.com/open?id=1oNfbTpmLA0imPieRxGfU_cBYVfWN3tZq
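
To make the goal concrete, here is a minimal sketch of one possible way to build such a count matrix with pandas. It is not a definitive solution. It assumes that every non-empty Actor cell holds a comma-separated list of ethnicity labels, and it rebuilds only the first five rows and the Actor1 to Actor4 columns of the sample by hand. The names long_df and counts are placeholders chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the first five sample rows (assumption: Actor5..Actor10
# are all NaN there, so they are left out to keep the example short).
gewf = "GreaterEuropean, WestEuropean, French"
gb = "GreaterEuropean, British"
df = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000003", "tt0000004", "tt0000005"],
    "Actor1": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "Actor2": [gewf, gewf, gewf, gewf, gb],
    "Actor3": [gb, np.nan, gewf, np.nan, gb],
    "Actor4": [np.nan, np.nan, gewf, np.nan, np.nan],
})

# Long format: one row per (movie, actor slot), dropping empty slots.
long_df = (
    df.melt(id_vars="tconst", value_name="ethnicity")
      .dropna(subset=["ethnicity"])
)

# Split each cell on commas, give every label its own row, strip the spaces.
long_df["ethnicity"] = long_df["ethnicity"].str.split(",")
long_df = long_df.explode("ethnicity").reset_index(drop=True)
long_df["ethnicity"] = long_df["ethnicity"].str.strip()

# Count how often each label occurs per movie.
counts = pd.crosstab(long_df["tconst"], long_df["ethnicity"])

# Movies whose actor cells are all NaN would vanish above; add them back as zeros.
counts = counts.reindex(df["tconst"], fill_value=0)

print(counts)
```

With this approach an actor labelled "GreaterEuropean, British" adds one to both the GreaterEuropean and the British column of that movie. Whether that is the intended counting rule is an assumption.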