I am trying to calculate the standard deviation for every row of my column called Months between, where each cell holds a list like this: [5, 1, 3, 1, 2, 2, 1, 3, 3, 1]

I first tried simply this: merged_df["standard deviation"] = merged_df["Months between"].std() but I get this error: TypeError: setting an array element with a sequence

Then I tried a more basic function:

import math

def calculate_std_dev(numbers):
    n = len(numbers)
    mean = sum(numbers) / n
    # Sum of squared deviations from the mean
    sum_squares = sum((x - mean)**2 for x in numbers)
    # Sample variance (divides by n - 1)
    variance = sum_squares / (n - 1)
    std_dev = math.sqrt(variance)
    return std_dev

but I get this error: TypeError: unsupported operand type(s) for +: 'int' and 'list'

I feel it has something to do with the formats, but I can't figure out what.

Does anyone have an idea? Thanks

Mile

1 Answer

If each cell of your column contains a list, you can apply np.std to every row:

import numpy as np

merged_df["standard deviation"] = merged_df["Months between"].apply(lambda x:np.std(x))
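
Note that np.std divides by n by default (ddof=0), while the function in the question divides by n - 1. If the sample standard deviation is what you want, passing ddof=1 should give the same result as your function. Both errors are also consistent with the whole column being treated as a single sequence of numbers: if calculate_std_dev was called on the column itself, sum() starts at 0 and then tries to add a list to it. Below is a minimal sketch of both variants; the sample data and the column names "std population" and "std sample" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each cell of "Months between" holds a list
merged_df = pd.DataFrame({
    "Months between": [
        [5, 1, 3, 1, 2, 2, 1, 3, 3, 1],
        [2, 4, 4, 4, 5, 5, 7, 9],
    ]
})

# Population standard deviation (ddof=0, the NumPy default), one value per row
merged_df["std population"] = merged_df["Months between"].apply(np.std)

# Sample standard deviation (ddof=1), matching the n - 1 divisor in the question
merged_df["std sample"] = merged_df["Months between"].apply(
    lambda x: np.std(x, ddof=1)
)

print(merged_df[["std population", "std sample"]])
```

Your own calculate_std_dev should also work if you pass it row by row with merged_df["Months between"].apply(calculate_std_dev), as long as every list has at least two elements.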
pieterbons