I am working on an assignment meant to help me get familiar with pandas, and the part I am stuck on asks me to find the sample variance of Y. It says I must draft the Python/Pandas statement for this step and gives a hint (the dframe.count() method may be useful here). I know that the sample variance is the sum of squared differences from the mean, divided by one less than the number of elements in the sample.
import pandas as pd
datafile='/Users/austinite/Desktop/Assignment1Data.csv'
frame = pd.read_csv(datafile)
yMean = frame['Y'].mean()
frame['Diff'] = frame['Y'] - yMean
frame['DiffSqr'] = frame['Diff'].pow(2)
sumSqrDiff = frame['DiffSqr'].sum()
sampleVariance = sumSqrDiff / (frame.count(axis='columns') - 1)
This is the code I have so far. I tried (axis='Y') because I thought it would count the values in that column, but that failed with an error saying Y is not defined. I then tried axis='columns', and although it runs, it gives me the same value repeated 300 times instead of a single number.
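A small sketch of what seems to be happening with the count call, using a made-up three-row frame (the name `toy` and its values are hypothetical, not the assignment data). With axis='columns', count() returns the number of non-null cells in each row, so the result has one entry per row. Selecting the column first gives a single number.

```python
import pandas as pd

# Hypothetical toy data standing in for the assignment CSV
toy = pd.DataFrame({'X': [1.0, 2.0, 3.0], 'Y': [2.0, 4.0, 9.0]})

# axis='columns' counts non-null cells across each row: one value per row
per_row = toy.count(axis='columns')

# The default (axis=0) counts non-null cells down each column: one value per column
per_col = toy.count()

# Selecting the column first yields a single number for Y
n = toy['Y'].count()

print(per_row.tolist())  # [2, 2, 2]
print(per_col['Y'])      # 3
print(n)                 # 3
```

Dividing a scalar by the per-row Series is what produces one identical result per row, which would explain the 300 repeated values.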
Edit to add solution:
n = frame['Y'].count()
sampleVariance = sumSqrDiff / (n - 1)
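As a sanity check, the manual result can be compared against pandas' built-in Series.var(), which uses ddof=1 (divide by n - 1) by default. This is only a sketch on made-up values; the Series `y` below is hypothetical and stands in for frame['Y'].

```python
import pandas as pd

# Hypothetical values standing in for frame['Y']
y = pd.Series([2.0, 4.0, 9.0], name='Y')

# Manual sample variance: sum of squared deviations from the mean over (n - 1)
n = y.count()
manual = ((y - y.mean()) ** 2).sum() / (n - 1)

# Built-in sample variance; ddof=1 is the default and is spelled out for clarity
builtin = y.var(ddof=1)

print(manual)                        # 13.0 for these values
print(abs(manual - builtin) < 1e-9)  # True
```

If the two numbers disagree on the real data, the count in the denominator is the first thing to check.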