I want to calculate the standard deviation of a field called QUANTITY
in my data using pandasql
. The problem is that pandasql
does not have STD
, STDEV
, or anything else like that supported. I want to calculate the COV
for a field in my data but I'm having a difficult time doing that. Here is my code with the SQL query:
mysql = lambda q: sqldf(q, globals())
classified_df = mysql("""with data as (
select
CP_REF,
count(*) * 1.0 /
nullif(count(case when QUANTITY > 0 then 1 end), 0) as ADI,
stdevp(QUANTITY) / nullif(avg(QUANTITY), 0) as COV
from df
where parent is not null
group by CP_REF
)
select
CP_REF, ADI, COV,
case
when ADI < 1.32 and COV < 0.49 then 'Smooth'
when ADI >= 1.32 and COV < 0.49 then 'Intermittent'
when ADI < 1.32 and COV >= 0.49 then 'Erratic'
when ADI >= 1.32 and COV >= 0.49 then 'Lumpy'
else 'Smooth'
end as DEMAND
from data;""")
The line stdevp(QUANTITY) / nullif(avg(QUANTITY), 0) as COV
is what I'm looking at in particular. Please help.
It doesn't necessarily need to be in pandasql
though. I just need to query a dataframe using this method (query).