There is a dataset with three columns:

  • Col 1: Name_of_Village
  • Col 2: Average_monthly_savings
  • Col 3: networth_in_dollars

I want to create a dictionary "Vill_corr" using Pandas, where the keys are the village names and the values are the correlation coefficients between Col 2 and Col 3 within each village.

I know how to calculate the correlation coefficient for the whole dataset, but I am not sure how to compute it per village and store it against each village name as a key:

corr = df["Average_monthly_savings"].corr(df["networth_in_dollars"])

Please help.

Pragyaditya Das
  • What do you need help with? Do you know how to get the correlation coefficients? Do you know how to convert Pandas data structures to dict? Please [edit] to clarify. For more tips, see [ask], [mre], and [reproducible pandas examples](/q/20109391/4518341). – wjandrea Jan 26 '23 at 04:07
  • Oops, I thought you mentioned Pandas but apparently not. It'd help to mention what library(s) you're using. – wjandrea Jan 26 '23 at 04:08
  • Sorry, was in a hurry to post. Added some additional details. Please see @wjandrea – Pragyaditya Das Jan 26 '23 at 05:00

1 Answer

Use groupby.apply with Series.corr to compute the correlation within each village:

import numpy as np
import pandas as pd

# Fix the seed so the random example data is reproducible
np.random.seed(0)

df = pd.DataFrame({'Name_of_Village': np.random.choice(list('ABCD'), size=100),
                   'Average_monthly_savings': np.random.randint(0, 1000, size=100),
                   'networth_in_dollars': np.random.randint(0, 1000, size=100),
                  })

out = (df.groupby('Name_of_Village')
         .apply(lambda g: g['Average_monthly_savings'].corr(g['networth_in_dollars']))
      )

Output:

Name_of_Village
A   -0.081200
B   -0.020895
C    0.208151
D   -0.010569
dtype: float64

As a dictionary:

out.to_dict()

Output:

{'A': -0.08120016678846673,
 'B': -0.020894973553868202,
 'C': 0.20815112481676484,
 'D': -0.010569152488799725}
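
If you prefer to build the dictionary directly under the name "Vill_corr" from the question, a dict comprehension over the groups is one possible alternative. This is only a minimal sketch: the two-village data below is made up for illustration (chosen so that the correlations are exactly 1 and -1) and is not the random data used above.

```python
import pandas as pd

# Toy data, invented for illustration only:
# village A has perfectly correlated columns, village B perfectly anti-correlated.
df = pd.DataFrame({
    'Name_of_Village': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Average_monthly_savings': [100, 200, 300, 100, 200, 300],
    'networth_in_dollars': [1000, 2000, 3000, 3000, 2000, 1000],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs,
# so each village name maps to the correlation within its own rows.
Vill_corr = {
    village: group['Average_monthly_savings'].corr(group['networth_in_dollars'])
    for village, group in df.groupby('Name_of_Village')
}

print(Vill_corr)
```

Note that a village with only one row has no defined correlation, so its value would be NaN; you may want to filter those out depending on your data.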
mozway