I want to calculate the covariance of each pair of columns listed in my_list. The formula is in the function covariance_formula(...).
My code is as follows:
#!/usr/bin/python3
import pandas as pd
import numpy as np

my_list = ['A', 'B', 'C', 'D', 'E']


def create_df():
    return pd.DataFrame(np.random.randint(0, 100, size=(5, 5)).astype(float), columns=my_list)


def iterate_list(df):
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one = my_list[i]
            column_two = my_list[j]
            col_name = column_one + " vs." + column_two
            column_1_value = df[df.columns[df.columns.str.startswith(column_one)]]
            column_2_value = df[df.columns[df.columns.str.startswith(column_two)]]
            column_1_mean = df[df.columns[df.columns.str.startswith(column_one)]].mean(axis=0)
            column_2_mean = df[df.columns[df.columns.str.startswith(column_two)]].mean(axis=0)
            df2[col_name] = covariance_formula(column_1_value, column_2_value, column_1_mean, column_2_mean)
    return df2


def covariance_formula(a, b, mean_a, mean_b):
    covar = (a - mean_a) * (b - mean_b)
    return covar


def main():
    df = create_df()
    # print(df)  ## see OUTPUT A
    df2 = iterate_list(df)  ## <<< THIS IS WHERE I AM HAVING MY PROBLEM
    # print(df2)  ## see EXPECTED OUTPUT B
    print(df2)


if __name__ == "__main__":
    main()
Questions:
How can I create a new DataFrame, df2, that contains the values shown in EXPECTED OUTPUT B? Is there a faster way of doing it?
Current Problem:
When I run the script, it fails with this error, which I cannot get rid of:
NameError: name 'df2' is not defined
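The NameError appears to come from iterate_list assigning to df2[col_name] without df2 ever being created inside that function; the df2 in main() is a separate local name that only exists after iterate_list has returned. A minimal sketch of one possible fix follows. It assumes the column names match my_list exactly, so it selects each column with df[name] (a Series) instead of the startswith filter (a one-column DataFrame, whose product with another one-column DataFrame would align on column labels and give NaN). The hard-coded values are the ones from OUTPUT A.

```python
import pandas as pd

my_list = ['A', 'B', 'C', 'D', 'E']


def covariance_formula(a, b, mean_a, mean_b):
    # Element-wise product of each column's deviation from its mean.
    return (a - mean_a) * (b - mean_b)


def iterate_list(df):
    # Create the result frame before the loop; reusing df's index keeps rows aligned.
    df2 = pd.DataFrame(index=df.index)
    for i in range(len(my_list)):
        for j in range(i + 1, len(my_list)):
            column_one = my_list[i]
            column_two = my_list[j]
            col_name = column_one + " vs." + column_two
            # Assumption: exact column names, so df[name] returns a Series
            # and the arithmetic runs row by row.
            a = df[column_one]
            b = df[column_two]
            df2[col_name] = covariance_formula(a, b, a.mean(), b.mean())
    return df2


# Values copied from OUTPUT A so the result is reproducible.
df = pd.DataFrame(
    [[87.0, 92.0, 66.0, 8.0, 67.0],
     [84.0, 18.0, 9.0, 80.0, 41.0],
     [38.0, 24.0, 53.0, 25.0, 14.0],
     [87.0, 25.0, 19.0, 5.0, 0.0],
     [91.0, 69.0, 55.0, 14.0, 90.0]],
    columns=my_list,
)
df2 = iterate_list(df)
print(df2.round(1))
```

With the OUTPUT A values, row 0 of "A vs.B" comes out as 445.4 after rounding, which agrees with EXPECTED OUTPUT B.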
Things I have tried:
- How to calculate multiple columns from multiple columns in panda
- Calculating sum of multiple columns in pandas
OUTPUT A:
      A     B     C     D     E
0  87.0  92.0  66.0   8.0  67.0
1  84.0  18.0   9.0  80.0  41.0
2  38.0  24.0  53.0  25.0  14.0
3  87.0  25.0  19.0   5.0   0.0
4  91.0  69.0  55.0  14.0  90.0
EXPECTED OUTPUT B:
     A vs.B    A vs.C    A vs.D    A vs.E    B vs.C    B vs.D    B vs.E    C vs.D    C vs.E    D vs.E
0     445.4     245.8    -176.6     236.2    1187.8    -853.8    1141.4    -471.0     629.8    -452.6
1    -182.2    -207.2     353.8      -9.2     866.6   -1479.4      38.6   -1683.0      44.0     -75.0
2     851.0    -496.4      55.2    1119.0    -272.2      30.2     613.4     -17.6    -357.8      39.8
3    -197.8    -205.4    -205.4    -407.0     440.8     440.8     873.4     458.0     907.4     907.4
4     318.2     198.6    -168.6     647.4     341.6    -290.2    1113.8    -181.0     695.0    -590.2
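
On the question of a faster way: one option, sketched here on the assumption that the column names match my_list exactly, is to subtract the column means once and build every pair with itertools.combinations. The helper name pairwise_deviation_products is hypothetical. Note that these values are per-row deviation products; the sample covariance itself would be each column's sum divided by n - 1, which is what df.cov() reports.

```python
from itertools import combinations

import pandas as pd

my_list = ['A', 'B', 'C', 'D', 'E']


def pairwise_deviation_products(df, names):
    # Centre every column once instead of recomputing the mean per pair.
    centered = df[names] - df[names].mean()
    # combinations() yields (A, B), (A, C), ... in the same order as
    # the nested i < j loops.
    products = {
        f"{one} vs.{two}": centered[one] * centered[two]
        for one, two in combinations(names, 2)
    }
    return pd.DataFrame(products)


# Values copied from OUTPUT A so the result is reproducible.
df = pd.DataFrame(
    [[87.0, 92.0, 66.0, 8.0, 67.0],
     [84.0, 18.0, 9.0, 80.0, 41.0],
     [38.0, 24.0, 53.0, 25.0, 14.0],
     [87.0, 25.0, 19.0, 5.0, 0.0],
     [91.0, 69.0, 55.0, 14.0, 90.0]],
    columns=my_list,
)
result = pairwise_deviation_products(df, my_list)
print(result.round(1))

# Cross-check: summing a column and dividing by n - 1 should reproduce
# the matching entry of the sample covariance matrix.
sample_cov_ab = result["A vs.B"].sum() / (len(df) - 1)
print(sample_cov_ab, df.cov().loc["A", "B"])
```

Building all columns in a dict and constructing the DataFrame once also avoids repeated column insertion, which may matter when my_list is long.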