
I wrote this code:

from scipy.stats import pearsonr
import pandas as pd

# GDP per capita (World Bank), keep only the 2020 rows
df = pd.read_csv('C:/ **** /gdp-per-capita-worldbank.csv')
df = df[df['Year'] == 2020]

# Crude birth rate, keep only the 2020 rows
df1 = pd.read_csv('C:/*****/crude-birth-rate.csv')
df1 = df1[df1['Year'] == 2020]

I replaced parts of the paths with *** for privacy; the real code uses the correct paths. Now I need to compare the two dataframes with a Pearson correlation. The first dataframe looks like this (A, B, C, etc. are states):

State     GDP
A         45
B         90
C         70
D         20

The second dataframe looks like this (A, B, C, etc. are states):

State     crude_birth_rate
A         3
B         2
Europe    30
C         1
Asia      40
K         4

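As you can see, the two dataframes do not line up row by row: the second one also contains groups such as Europe and Asia, and each one has states the other lacks (D and K).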
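To see which rows can be paired before joining anything, one option is to compare the State columns as sets. This is a sketch that rebuilds the two sample tables inline (instead of reading the CSVs) under the question's own names df and df1; the helper variable names common, only_gdp and only_births are illustrative.

```python
import pandas as pd

# Sample values copied from the two tables in the question
df = pd.DataFrame({'State': ['A', 'B', 'C', 'D'], 'GDP': [45, 90, 70, 20]})
df1 = pd.DataFrame({
    'State': ['A', 'B', 'Europe', 'C', 'Asia', 'K'],
    'crude_birth_rate': [3, 2, 30, 1, 40, 4],
})

gdp_states = set(df['State'])
birth_states = set(df1['State'])

# States present in both frames: these are the rows that should be kept
common = sorted(gdp_states & birth_states)
# Rows present in only one frame: these should be dropped
only_gdp = sorted(gdp_states - birth_states)
only_births = sorted(birth_states - gdp_states)

print(common)       # ['A', 'B', 'C']
print(only_gdp)     # ['D']
print(only_births)  # ['Asia', 'Europe', 'K']
```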

The desired result is another dataframe like this:

State     GDP     crude_birth_rate
A         45      3
B         90      2
C         70      1 

So the final dataframe should contain every state present in both dataframes, with its 2020 GDP and crude birth rate on the same row; non-matching states or groups should be dropped.

Thank you.

Machavity
Fllecha

1 Answer


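Use merge with an inner join, which keeps only the states present in both dataframes (df holds GDP, df1 holds the crude birth rate):

out = df.merge(df1, on='State', how='inner')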
print(out)

# Output
  State  GDP  crude_birth_rate
0     A   45                 3
1     B   90                 2
2     C   70                 1
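The sample tables show only State and the value column, but since both frames were filtered on Year, they presumably still carry a Year column (an assumption about the CSV layout). In that case merging on 'State' alone makes pandas keep both copies, renamed Year_x and Year_y; adding Year to the join keys keeps a single column. A sketch with the sample values rebuilt inline; the variable name suffixed is illustrative.

```python
import pandas as pd

# Assumption: after the Year == 2020 filter, both frames still have a Year column
df = pd.DataFrame({
    'State': ['A', 'B', 'C', 'D'],
    'Year': [2020] * 4,
    'GDP': [45, 90, 70, 20],
})
df1 = pd.DataFrame({
    'State': ['A', 'B', 'Europe', 'C', 'Asia', 'K'],
    'Year': [2020] * 6,
    'crude_birth_rate': [3, 2, 30, 1, 40, 4],
})

# Joining on State only: the non-key Year columns get the default _x / _y suffixes
suffixed = df.merge(df1, on='State').columns.tolist()
print(suffixed)  # ['State', 'Year_x', 'GDP', 'Year_y', 'crude_birth_rate']

# Joining on both keys keeps one Year column and still drops non-matching rows
out = df.merge(df1, on=['State', 'Year'], how='inner')
print(out)
```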

For a detailed explanation, see Pandas Merging 101.
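Once the rows are aligned, the Pearson correlation the question asks for can be computed directly on the two columns. This sketch uses scipy's pearsonr (already imported in the question) on the three merged sample rows rebuilt inline; with only three points the p-value is not meaningful, so it just illustrates the call. The name r_pd is illustrative.

```python
import pandas as pd
from scipy.stats import pearsonr

# Merged frame from the answer above, using the sample values from the question
out = pd.DataFrame({
    'State': ['A', 'B', 'C'],
    'GDP': [45, 90, 70],
    'crude_birth_rate': [3, 2, 1],
})

# pearsonr returns the correlation coefficient and a two-sided p-value
r, p = pearsonr(out['GDP'], out['crude_birth_rate'])
print(f"r = {r:.3f}, p = {p:.3f}")

# pandas alone gives the same coefficient (Series.corr defaults to Pearson)
r_pd = out['GDP'].corr(out['crude_birth_rate'])
print(r_pd)
```

Series.corr is enough if only the coefficient is needed; pearsonr additionally reports the p-value.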

Corralien