I am currently working with these datasets from the Kiva Kaggle competition: https://www.kaggle.com/kiva/data-science-for-good-kiva-crowdfunding/data
I want to link a float 'MPI' value (a 'Multidimensional Poverty Index') to the corresponding geographical region of each micro loan.
- In one dataset, kiva_mpi_region_locations.csv, each region has a single corresponding MPI value associated with it.
- However, in the dataset kiva_loans.csv, where each loan is given a "Region", the data often has multiple values in the same cell separated by commas (,).
[kiva_loans.csv / Loan data example] (Note: different loans can come from the same region, so in this case region is a foreign key but not a primary key):
Loan #: 653338
region: Tanjay, Negros Oriental
[kiva_mpi_region_locations.csv / Regional MPI value example] (Note: every region has only one MPI, as region is a primary key):
region: Badakhshan
MPI: 0.387
My code so far:
RegionMPI = dict(zip(dfLocations.region, dfLocations.MPI))
{'Badakhshan': 0.387,
'Badghis': 0.466,
'Baghlan': 0.3,
'Balkh': 0.301,
'Bamyan': 0.325,
'Daykundi': 0.313,
etc}
LoanRegion = dfLoanTheme['region'].str.split(',').values.tolist()
[['Lahore'],
nan,
['Dar es Salaam'],
['Liloy-Dela Paz'],
['Tanjay', ' Negros Oriental'],
['Ica'],
nan,
['Lahore']]
Any advice on how to loop through my nested list and, for every entry that matches a key in my dictionary, look up the corresponding MPI value?
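For reference, here is a minimal sketch of the kind of lookup I have in mind, written in plain Python on small stand-ins for `RegionMPI` and `LoanRegion`. The helper name `first_known_mpi` is hypothetical, the `['Kabul', ' Badghis']` row is made up purely for illustration, and returning the first matching part (with `None` when nothing matches) is an assumption about the desired behavior, not a settled requirement.

```python
# Small stand-ins for the real objects; MPI values are copied from the dict above.
RegionMPI = {"Badakhshan": 0.387, "Badghis": 0.466, "Baghlan": 0.3}

nan = float("nan")  # missing regions show up as a float NaN, not a list
LoanRegion = [
    ["Lahore"],
    nan,
    ["Tanjay", " Negros Oriental"],
    ["Kabul", " Badghis"],  # made-up row, for illustration only
    ["Baghlan"],
]


def first_known_mpi(parts, mpi_by_region):
    """Return the MPI of the first part found in the dict, else None."""
    if not isinstance(parts, list):  # skip NaN (missing region)
        return None
    for name in parts:
        key = name.strip()  # splitting on ',' leaves a leading space
        if key in mpi_by_region:
            return mpi_by_region[key]
    return None


# One result per loan, in the same order as LoanRegion.
LoanMPI = [first_known_mpi(parts, RegionMPI) for parts in LoanRegion]
print(LoanMPI)
```

Because `LoanMPI` has one entry per loan in the original order, it could presumably be assigned back to the loans frame as a new column. Note that the `.strip()` call matters: without it, `' Negros Oriental'` and `' Badghis'` keep their leading space and would never match a dictionary key.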