Computing the days value to monthly wise in pandas for datsets

Question

I am new to pandas. I have a large dataset which consists the temperature value on daily wise. I need to compute the temp by monthly wise i.e.

Here my dataset csv structure:

I need to convert into following csv structure:

I was thinking about the following approach:

for(year=2012;year<=2018;year++)
   for(month=1;month<=12;month++)
      for(day=1;day<=31;day++)
         summax+=Temp_max[day]
         summin+=Temp_min[day]
      summax/=day
      summin/=day
      print(summax,summin)

But I dont know how to do it in pandas/python, how to fetch the column value in loop, also how to handle the feb days (like 28 days, 30 days, 31 days) and bring the expected output or similar output. Any help would be appreciated. Thanks!!

muzzyq · Answer 1 · 2019-03-08T11:20:53.780

Input: Sample data as per the question

import numpy as np
import pandas as pd

data = {'year': [*np.repeat(2012, 9), 2018],
        'month': [*np.repeat(1, 4), *np.repeat(2, 3), *np.repeat(3, 2), 12],
        'day': [1, 2, 3, 31, 1, 2, 28, 1, 2, 31],
        'Temp max': [28, 26, 27, 26, 27, 26, 26, 26, 25, 26],
        'Temp min': [19, 18, 17, 19, 18, 18, 18, 18, 18, 28]}

df = pd.DataFrame(data)

Output: Resulting dataframe

    year    month   day     Temp max    Temp min
0   2012    1       1       28          19
1   2012    1       2       26          18
2   2012    1       3       7           17
3   2012    1       31      26          19
4   2012    2       1       27          18
5   2012    2       2       26          18
6   2012    2       28      26          18
7   2012    3       1       26          18
8   2012    3       2       25          18
9   2018    12      31      26          28

Input: Create pivot table, calculating the max of 'Temp max' column and min of 'Temp min' column

pivot = pd.pivot_table(data=df,
                       values=['Temp max', 'Temp min'],
                       index=['year', 'month'])

pivot.columns = ['Monthly Temp max', 'Monthly Temp min']

Output: Resulting dataframe

                Monthly Temp max    Monthly Temp min
year    month       
2012    1       26.75               18.25
        2       26.33               18.00
        3       25.50               18.00
2018    12      26.00               28.00

Alternatively: Use pandas' groupby method

grouped = (df
           .groupby(['year', 'month'])['Temp max', 'Temp min']
           .mean())

grouped.columns = ['Monthly Temp max', 'Monthly Temp min']

Output: Resulting dataframe

                Monthly Temp max    Monthly Temp min
year    month       
2012    1       26.75               18.25
        2       26.33               18.00
        3       25.50               18.00
2018    12      26.00               28.00

Nihal · Accepted Answer · 2019-03-08T11:08:54.560

1

in pandas use read_csv to read your csv file

for your average use groupby

import pandas as pd

data = {'year': [*np.repeat(2012, 9), 2018],
        'month': [*np.repeat(1, 4), *np.repeat(2, 3), *np.repeat(3, 2), 12],
        'day': [1, 2, 3, 31, 1, 2, 28, 1, 2, 31],
        'Temp max': [28, 26, 27, 26, 27, 26, 26, 26, 25, 26],
        'Temp min': [19, 18, 17, 19, 18, 18, 18, 18, 18, 28]}

df = pd.DataFrame(data)
# df = pd.read_csv('file.csv')

df2 = df.groupby(['year', 'month'])['Temp max', 'Temp min'].mean()
print(df2)

output:

             Temp max  Temp min
year month                     
2012 1      26.750000     18.25
     2      26.333333     18.00
     3      25.500000     18.00
2018 12     26.000000     28.00

if you want all years use:

df2 = df.groupby(['year', 'month'])['Temp max', 'Temp min'].mean().reset_index()

   year  month   Temp max  Temp min
0  2012      1  26.750000     18.25
1  2012      2  26.333333     18.00
2  2012      3  25.500000     18.00
3  2018     12  26.000000     28.00

edited Mar 08 '19 at 11:08

answered Mar 08 '19 at 10:52

Nihal

5,262
7
23
41

You should replace `.mean()` with `.agg({'Temp max': 'max', 'Temp min': 'min'})` to get the desired output. – muzzyq Mar 08 '19 at 10:57
1

yeah it was because arya gave equation of average temp for that month @muzzyq – Nihal Mar 08 '19 at 11:00
1

also `['Temp max', 'Temp min']` is redundant – Nihal Mar 08 '19 at 11:01
1

My mistake, I'll update my answer. – muzzyq Mar 08 '19 at 11:02

Computing the days value to monthly wise in pandas for datsets

2 Answers2