How do I use the .count() function so that it gives me the sample variance of column 'Y'? As of now it is giving me a list with the same value(s)

Question

I am working on an assignment that is meant to help me familiarize myself with pandas, and the portion I am stuck on wants me to find the sample variance of Y. It says I must draft the Python/Pandas statement for this step and provides a hint (the dframe.count() method may be useful here). I know that the sample variance is the sum of squared differences from the sample mean, divided by one less than the number of elements in the sample.

import pandas as pd
datafile='/Users/austinite/Desktop/Assignment1Data.csv'
frame = pd.read_csv(datafile)

yMean = frame['Y'].mean()
frame['Diff'] = frame['Y'] - yMean
frame['DiffSqr'] = frame['Diff'].pow(2)

sumSqrDiff = frame['DiffSqr'].sum()
sampleVariance = sumSqrDiff / (frame.count(axis='columns') - 1)

This is the code that I have as of now. I first tried (axis='Y') because I thought it would count the values in that column, but that failed with an error saying Y is not defined. I then tried axis='columns'. That runs, but instead of a single number it returns a list of the same value repeated 300 times.

Edit to add solution:

n = frame['Y'].count()
sampleVariance = sumSqrDiff / (n - 1)
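
For anyone wondering why the original attempt returned the same value 300 times: `frame.count(axis='columns')` counts the non-null cells across each row, so it produces one number per row, while `frame['Y'].count()` produces a single number for that column. Here is a minimal sketch of the difference. The small frame and its values are made up for illustration and stand in for Assignment1Data.csv, and the snake_case names are mine.

```python
import pandas as pd

# Hypothetical stand-in data; the real assignment reads a CSV instead.
frame = pd.DataFrame({"X": [1.0, 2.0, 3.0, 4.0], "Y": [2.0, 4.0, 4.0, 6.0]})

# Counts non-null cells across each row: one value per row, not one total.
per_row = frame.count(axis="columns")

# Counts non-null values in the single column 'Y': one scalar.
n = frame["Y"].count()

# Sum of squared differences from the mean, divided by n - 1.
sum_sqr_diff = ((frame["Y"] - frame["Y"].mean()) ** 2).sum()
sample_variance = sum_sqr_diff / (n - 1)

print(per_row.tolist())
print(n, sample_variance)
```

Dividing by `per_row - 1` is what broadcast the result into a Series with one entry per row. Dividing by the scalar `n - 1` gives a single number.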

If you want the number of rows, just use `len(frame)`? (COUNT will operate on each column separately, skip nulls, etc, so just use len?) Pandas can calculate variance natively though; https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.var.html — MatBailie, Mar 12 '23 at 22:37
If you MUST use count, count a single column, not a dataframe; `frame['Y'].count()` — MatBailie, Mar 12 '23 at 22:44
I did see that, I think it is just meant to help us understand the different methods. I am going to edit my post, because I figured out how to fix my problem. — Austinite, Mar 12 '23 at 22:46
Thank you! Yes, I realized I was using it incorrectly and that fixed my problem :) — Austinite, Mar 12 '23 at 22:49
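
The comments above touch on two details that are easier to see side by side: `len(frame)` counts every row, `count()` skips nulls, and `Series.var()` defaults to `ddof=1`, so it returns the sample variance directly. Here is a small sketch with made-up values. The data and variable names are assumptions for illustration, not the assignment's file.

```python
import math
import pandas as pd

# Hypothetical data with one missing value in 'Y'.
frame = pd.DataFrame({"Y": [2.0, 4.0, None, 4.0, 6.0]})

n_rows = len(frame)           # counts all rows, including the null one
n_valid = frame["Y"].count()  # counts only the non-null values

y = frame["Y"]
# Manual sample variance; mean() and sum() skip NaN by default.
manual = ((y - y.mean()) ** 2).sum() / (n_valid - 1)

# Built-in sample variance (ddof=1 is the default for Series.var).
builtin = y.var()

print(n_rows, n_valid)
print(math.isclose(manual, builtin))
```

If the column has no missing values, `len(frame)` and `frame['Y'].count()` agree. They only differ when nulls are present, and in that case `count()` is the one that matches what `var()` uses.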
