I realize how confusing the title sounds, so let me explain my issue. I have a DataFrame that I split by the 'Cluster ID' column, which represents the cluster. Each cluster DataFrame has the same column labels.
I'm trying to create a function that takes a cluster DataFrame and returns the standard deviation of each of its columns (14 columns in total). Image of cluster_0 below:
Of course, I could go through and list out each column for each cluster, but that's time-consuming and not very efficient. If you could check my code and let me know where and how it went wrong, I'd greatly appreciate it.
What I'm trying to achieve: the standard deviation of each column (A -> N) for each cluster DataFrame.
My Code:
cluster_joint_col_name = list(["X", "Y", "Cluster ID", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N"])
joint_table_df.columns = cluster_joint_col_name
cluster_0 = joint_table_df[joint_table_df['Cluster ID'] == 0]
cluster_1 = joint_table_df[joint_table_df['Cluster ID'] == 1]
cluster_2 = joint_table_df[joint_table_df['Cluster ID'] == 2]
def standardDeviation(self):
    self = self[['A']].stack().std()
    self = self[['B']].stack().std()
    self = self[['C']].stack().std()
    self = self[['D']].stack().std()
    self = self[['E']].stack().std()
    self = self[['F']].stack().std()
    self = self[['G']].stack().std()
    self = self[['H']].stack().std()
    self = self[['I']].stack().std()
    self = self[['J']].stack().std()
    self = self[['K']].stack().std()
    self = self[['L']].stack().std()
    self = self[['M']].stack().std()
    self = self[['N']].stack().std()
    return self
cluster_j_0 = pd.DataFrame(standardDeviation(cluster_0))
My error:
Traceback (most recent call last):
  line 345, in <module>
    cluster_j_0 = pd.DataFrame(standardDeviation(cluster_0))
  line 328, in standardDeviation
    self = self[['B']].stack().std()
IndexError: invalid index to scalar variable.
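For context on the traceback: the first line of the function replaces the DataFrame held in `self` with a single number (the standard deviation of column A), so the next line tries to index that scalar with `[['B']]`, which raises the IndexError. Below is a minimal sketch of one possible way to get all 14 values without overwriting the input. The sample data, the `value_cols` list, and the `standard_deviation` name are assumptions made up for illustration, not part of the original code; `DataFrame.std()` computes the sample standard deviation (ddof=1) column by column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for joint_table_df: 3 clusters with 10 rows each.
value_cols = list("ABCDEFGHIJKLMN")  # the 14 columns A..N
rng = np.random.default_rng(0)
joint_table_df = pd.DataFrame(rng.normal(size=(30, 14)), columns=value_cols)
joint_table_df.insert(0, "Cluster ID", np.repeat([0, 1, 2], 10))
joint_table_df.insert(0, "Y", rng.normal(size=30))
joint_table_df.insert(0, "X", rng.normal(size=30))


def standard_deviation(cluster_df, columns=value_cols):
    """Return a Series with one standard deviation per requested column."""
    # Select the columns once and let pandas compute std for each of them;
    # the input DataFrame is never reassigned.
    return cluster_df[columns].std()


cluster_0 = joint_table_df[joint_table_df["Cluster ID"] == 0]
cluster_j_0 = standard_deviation(cluster_0)  # Series indexed by A..N

# Alternative: every cluster at once, one row per cluster and one column per A..N.
all_clusters_std = joint_table_df.groupby("Cluster ID")[value_cols].std()
```

If a DataFrame is needed instead of a Series, `cluster_j_0.to_frame(name="std")` should give a single-column frame. The `groupby` form avoids creating `cluster_0`, `cluster_1`, and `cluster_2` by hand, assuming the only goal is the per-cluster statistics.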