0

my sql Table

     SDATETIME   FE014BPV    FE011BPV
0   2022-05-28   5.770000   13.735000
1   2022-05-30  16.469999   42.263000
2   2022-05-31  56.480000  133.871994
3   2022-06-01  49.779999  133.561996
4   2022-06-02  45.450001  132.679001
..         ...        ...         ...
93  2022-09-08   0.000000    0.050000
94  2022-09-09   0.000000    0.058000
95  2022-09-10   0.000000    0.051000
96  2022-09-11   0.000000    0.050000
97  2022-09-12   0.000000    0.038000

My code:

import pandas as pd
import pyodbc

monthSQL = pd.read_sql_query('SELECT SDATETIME,max(FE014BPV) as flow,max(FE011BPV) as steam  FROM [SCADA].[dbo].[TOTALIZER] GROUP BY SDATETIME ORDER BY SDATETIME ASC', conn)

monthdata = monthSQL.groupby(monthSQL['SDATETIME'].dt.strftime("%b-%Y"), sort=True).sum()
print(monthdata)

Produces this incorrect output

                  flow        steam
SDATETIME                          
Aug-2022   1800.970001  2580.276996
Jul-2022   1994.300014  2710.619986
Jun-2022   3682.329998  7633.660018
May-2022   1215.950003  3098.273025
Sep-2022      0.000000     1.705000

I want output some thing like below

SDATETIME         flow        steam
May-2022   1215.950003  3098.273025
Jun-2022   3682.329998  7633.660018
Jul-2022   1994.300014  2710.619986
Aug-2022   1800.970001  2580.276996
Sep-2022      0.000000     1.705000

Also, need a sum of last 12 month data

wjandrea
  • 28,235
  • 9
  • 60
  • 81
  • It looks like you can reorder your output by sorting on the SDATETIME column, to achieve the desired order. When you say you want to get the sum of the last 12 months, do you want that for the previous 12 months for every row in your output, for just the last 12 months or what? – itprorh66 Dec 31 '22 at 17:54
  • Why are you using `.dt.strftime()` to do the groupby? Why not groupby the actual year and month? – wjandrea Dec 31 '22 at 18:06
  • BTW, welcome to Stack Overflow! Please take the [tour] and read [ask]. – wjandrea Dec 31 '22 at 18:06
  • If `monthSQL` is all correct, just give us that. You don't need to say how you're retrieving it, plus `conn` is not defined anyway. Check out [How to make good reproducible pandas examples](/q/20109391/4518341) – wjandrea Dec 31 '22 at 18:08

2 Answers2

2

The output is correct, just not in the order you expect. Try this:

# This keep SDATETIME as datetime, not string
monthdata = monthSQL.groupby(pd.Grouper(key="SDATETIME", freq="MS")).sum()

# Rolling sum of the last 12 months
monthdata = pd.concat(
    [
        monthdata,
        monthdata.add_suffix("_LAST12").rolling("366D").sum(),
    ],
    axis=1,
)

# Keep SDATETIME as datetime for as long as you need to manipulate the
# dataframe in Python. When you need to export it, convert it to
# string
monthdata.index = monthdata.index.strftime("%b-%Y")

About the rolling(...) operation: it's easy to think that rolling(12) gives you the rolling sum of the last 12 months, given that each row represents a month. In fact, it returns the rolling sum of the last 12 rows. This is important, because if there are gaps in your data, 12 rows may cover more than 12 months. rolling("366D") makes sure that it only count rows within the last 366 days, which is the maximum length of any 12-month period.

We can't use rolling("12M") because months do not have fixed durations. There are between 28 to 31 days in a month.

Code Different
  • 90,614
  • 16
  • 144
  • 163
1

You are sorting the date names in alphabetical order - you need to specify which column to sort. You can see that because it goes (the starting letters of the dates):

SDATETIME                          
Aug-2022 # A goes before J, M, S in the alphabet
Jul-2022 # J goes after A, but before M and S in the alphabet
Jun-2022 # J goes after A, but before M and S in the alphabet
May-2022 # M goes after A, J but before S in the alphabet
Sep-2022 # S goes after A, J, M in the alphabet

To sort by months in reality, you have to make a dictionary and then sort by the .apply() method:

month_dict = {'Jan-2022':1,'Feb-2022':2,'Mar-2022':3, 'Apr-2022':4, 'May-2022':5, 'Jun-2022':6, 'Jul-2022':7, 'Aug-2022':8, 'Sep-2022':9, 'Oct-2022':10, 'Nov-2022':11, 'Dec-2022':12}

df = df.sort_values('SDATETIME', key= lambda x : x.apply (lambda x : month_dict[x]))

print(df)
SDATETIME         flow        steam
May-2022   1215.950003  3098.273025
Jun-2022   3682.329998  7633.660018
Jul-2022   1994.300014  2710.619986
Aug-2022   1800.970001  2580.276996
Sep-2022      0.000000     1.705000
Pythoneer
  • 319
  • 1
  • 16