1

I want to calculate the standard deviation of a field called QUANTITY in my data using pandasql. The problem is that pandasql does not have STD, STDEV, or anything else like that supported. I want to calculate the COV for a field in my data but I'm having a difficult time doing that. Here is my code with the SQL query:

mysql = lambda q: sqldf(q, globals())
classified_df = mysql("""with data as (
    select
        CP_REF,
        count(*) * 1.0 /
          nullif(count(case when QUANTITY > 0 then 1 end), 0) as ADI,
          stdevp(QUANTITY) / nullif(avg(QUANTITY), 0) as COV
    from df
    where parent is not null
    group by CP_REF
)
select
    CP_REF, ADI, COV,
    case
        when ADI <  1.32 and COV <  0.49 then 'Smooth'
        when ADI >= 1.32 and COV <  0.49 then 'Intermittent'
        when ADI <  1.32 and COV >= 0.49 then 'Erratic'
        when ADI >= 1.32 and COV >= 0.49 then 'Lumpy'
        else 'Smooth'
    end as DEMAND
from data;""")

The line stdevp(QUANTITY) / nullif(avg(QUANTITY), 0) as COV is what I'm looking at in particular. Please help.

It doesn't necessarily need to be in pandasql though. I just need to query a dataframe using this method (query).

Ravaal
  • 3,233
  • 6
  • 39
  • 66
  • you could take one of these https://stackoverflow.com/questions/2298339/standard-deviation-for-sqlite/24423341 it is for sqllite but they do work n standard sql – nbk Jun 22 '22 at 18:45
  • I've already seen that and I don't know exactly what to do. I'm looking for an answer to my problem in specific. – Ravaal Jun 22 '22 at 18:53

1 Answers1

0

I had the same problem as you, I calculated manually (code below). I hope it helps.

q1 = """

SELECT 
sqrt(avg(QUANTITY_MEAN*QUANTITY_MEAN)) as stdd_value
        
from
(select x1.*,
(QUANTITY-(AVG(QUANTITY) over (partition by CP_REF))) as QUANTITY_MEAN,
FROM df x1)
GROUP BY CP_REF
"""
    
dfsql1 = ps.sqldf(q1)
Julia Meshcheryakova
  • 3,162
  • 3
  • 22
  • 42
Neverse
  • 11
  • 3