I have a data frame(in csv file) with two columns each containing lists(of variable length) in string format. I am providing the link to the google drive where I have stored the csv file for reference https://drive.google.com/file/d/1Hdu04JdGpPqG9_k6Mjx_1XNLBvogXfnN/view?usp=sharing The dataframe looks like this
Opp1 Opp2
0 ['KingdomofPoland','GrandDuchyofLithuania'] ['Georgia']
1 ['NorthernYuanDynasty'] ['Georgia']
2 ['SpanishEmpire','CaptaincyGeneralofChile'] ['ChechenRepublic']
... ... ...
3409 ['Turkey','SyrianOpposition'] ['CatholicLeague','SpanishEmpire']
3410 ['Egypt','UnitedArabEmirates'] ['SpanishEmpire']
3411 ['Turkey','SyrianOpposition'] ['SpanishEmpire']
3412 ['UnitedStates','UnitedKingdom','SaudiArabia'] ['SpanishEmpire']
3413 ['Turkey'] ['Russia']
3414 rows × 2 columns
The columns values are strings, I figured that out when I do
Input - df['Opp1'][0][0]
Out - '['
Output is given as '['. Instead the output should be the first element of the list of first row i.e 'KingdomofPoland'.
After solving this issue, I want to create a new column by combining elements of lists from each row of Opp1 and Opp2 columns. The elements of each row in Opp1 column are the name of countries and empires that were involved in a war with the corresponding country/empire of the same row in Opp2 column. So basically a new column with row entries as
new_col
0 ['KingdomofPoland', 'Georgia']
0 ['GrandDuchyofLithuania', 'Georgia']
1 ['NorthernYuanDynasty', 'Georgia']
2 ['SpanishEmpire', 'ChechenRepublic']
2 ['CaptaincyGeneralofChile', 'ChechenRepublic']
... ... ...
3409 ['Turkey', 'CatholicLeague']
3409 ['Turkey', 'SpanishEmpire']
3409 ['SyrianOpposition', 'CatholicLeague]
3409 ['SyrianOpposition', 'SpanishEmpire']
3410 ['Egypt','SpanishEmpire']
3410 ['UnitedArabEmirates','SpanishEmpire']
3411 ['Turkey', 'SpanishEmpire']
3411 ['SyrianOpposition', 'SpanishEmpire']
.................
This will essentially introduce new rows as we are kind of exploding the Opp1 and Opp2 columns simultaneously iterating over there rows elements.
The end goal is to get an edge list of countries that were involved in a specific war represented by the original Opp1(opposition 1) and Opp2(opposition2) columns. Each entity(country) from Opp1 row list should be attached to each entity(country) of Opp2 row list. The final dataset will be used on Gephi as edge lists.
I am a beginner in data analysis with python. till now I have been cleaning my dataset manually which has consumed upteen precious hours. Can anyone help me with this.
Note - There are multiple similar entries in each row of Opp1 and Opp2 columns as same countries fought wars many times in different years.
I am attaching the pic for df_types of my dataframe as requested.