I wrote this code:
from numpy.random import randn
from numpy.random import seed
from scipy.stats import pearsonr
import pandas as pd
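# GDP per capita, restricted to the year 2020
df = pd.read_csv('C:/ **** /gdp-per-capita-worldbank.csv')
df = df[df['Year'] == 2020]
# Crude birth rate, restricted to the year 2020
df1 = pd.read_csv('C:/*****/crude-birth-rate.csv')
df1 = df1[df1['Year'] == 2020]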
I replaced the paths with *** for privacy; the real code uses the correct paths. Now I need to compute a Pearson correlation between the two dataframes. The first dataframe looks like this (A, B, C, etc. are states):
State GDP
A 45
B 90
C 70
D 20
The second dataframe looks like this (A, B, C, etc. are states):
State crude_birth_rate
A 3
B 2
Europe 30
C 1
Asia 40
K 4
As you can see, the two dataframes do not contain the same rows: the second one has extra entries (K and groups such as Europe and Asia), and D appears only in the first.
The desired result is another dataframe like this:
State GDP crude_birth_rate
A 45 3
B 90 2
C 70 1
So the final dataframe should have one row per state, with its GDP and crude birth rate for 2020 side by side; states or groups that do not appear in both dataframes should be dropped.
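For reference, here is a minimal sketch of the result described above, built on the toy data from this post rather than the real CSV files. It assumes both dataframes share a key column named `State` and that the value columns are named `GDP` and `crude_birth_rate`, as in the examples; the real files may use different column names. An inner merge is one way to keep only the states present in both dataframes, but it may not be the only or best approach.

```python
import pandas as pd
from scipy.stats import pearsonr

# Toy data copied from the examples in this post (not the real CSV contents).
gdp = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "GDP": [45, 90, 70, 20],
})
birth = pd.DataFrame({
    "State": ["A", "B", "Europe", "C", "Asia", "K"],
    "crude_birth_rate": [3, 2, 30, 1, 40, 4],
})

# how="inner" keeps only the keys found in BOTH dataframes,
# so D, K, Europe and Asia are dropped.
merged = gdp.merge(birth, on="State", how="inner")
print(merged)

# Pearson correlation between the two aligned columns.
r, p_value = pearsonr(merged["GDP"], merged["crude_birth_rate"])
print(r, p_value)
```

With the real data, the same `merge` call would presumably be applied to `df` and `df1` after filtering on the year, using whatever the actual key column is called in place of `State`.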
Thank you.