1

I am new to pandas. I have a large dataset which consists the temperature value on daily wise. I need to compute the temp by monthly wise i.e.

enter image description here

Here my dataset csv structure: enter image description here

I need to convert into following csv structure: enter image description here

I was thinking about the following approach:

for(year=2012;year<=2018;year++)
   for(month=1;month<=12;month++)
      for(day=1;day<=31;day++)
         summax+=Temp_max[day]
         summin+=Temp_min[day]
      summax/=day
      summin/=day
      print(summax,summin)

But I dont know how to do it in pandas/python, how to fetch the column value in loop, also how to handle the feb days (like 28 days, 30 days, 31 days) and bring the expected output or similar output. Any help would be appreciated. Thanks!!

rpanai
  • 12,515
  • 2
  • 42
  • 64
Ratnesh
  • 1,554
  • 3
  • 12
  • 28

2 Answers2

1

Input: Sample data as per the question

import numpy as np
import pandas as pd

data = {'year': [*np.repeat(2012, 9), 2018],
        'month': [*np.repeat(1, 4), *np.repeat(2, 3), *np.repeat(3, 2), 12],
        'day': [1, 2, 3, 31, 1, 2, 28, 1, 2, 31],
        'Temp max': [28, 26, 27, 26, 27, 26, 26, 26, 25, 26],
        'Temp min': [19, 18, 17, 19, 18, 18, 18, 18, 18, 28]}

df = pd.DataFrame(data)

Output: Resulting dataframe

    year    month   day     Temp max    Temp min
0   2012    1       1       28          19
1   2012    1       2       26          18
2   2012    1       3       7           17
3   2012    1       31      26          19
4   2012    2       1       27          18
5   2012    2       2       26          18
6   2012    2       28      26          18
7   2012    3       1       26          18
8   2012    3       2       25          18
9   2018    12      31      26          28

Input: Create pivot table, calculating the max of 'Temp max' column and min of 'Temp min' column

pivot = pd.pivot_table(data=df,
                       values=['Temp max', 'Temp min'],
                       index=['year', 'month'])

pivot.columns = ['Monthly Temp max', 'Monthly Temp min']

Output: Resulting dataframe

                Monthly Temp max    Monthly Temp min
year    month       
2012    1       26.75               18.25
        2       26.33               18.00
        3       25.50               18.00
2018    12      26.00               28.00

Alternatively: Use pandas' groupby method

grouped = (df
           .groupby(['year', 'month'])['Temp max', 'Temp min']
           .mean())

grouped.columns = ['Monthly Temp max', 'Monthly Temp min']

Output: Resulting dataframe

                Monthly Temp max    Monthly Temp min
year    month       
2012    1       26.75               18.25
        2       26.33               18.00
        3       25.50               18.00
2018    12      26.00               28.00
muzzyq
  • 904
  • 6
  • 14
1

in pandas use read_csv to read your csv file

for your average use groupby

import pandas as pd

data = {'year': [*np.repeat(2012, 9), 2018],
        'month': [*np.repeat(1, 4), *np.repeat(2, 3), *np.repeat(3, 2), 12],
        'day': [1, 2, 3, 31, 1, 2, 28, 1, 2, 31],
        'Temp max': [28, 26, 27, 26, 27, 26, 26, 26, 25, 26],
        'Temp min': [19, 18, 17, 19, 18, 18, 18, 18, 18, 28]}

df = pd.DataFrame(data)
# df = pd.read_csv('file.csv')

df2 = df.groupby(['year', 'month'])['Temp max', 'Temp min'].mean()
print(df2)

output:

             Temp max  Temp min
year month                     
2012 1      26.750000     18.25
     2      26.333333     18.00
     3      25.500000     18.00
2018 12     26.000000     28.00

if you want all years use:

df2 = df.groupby(['year', 'month'])['Temp max', 'Temp min'].mean().reset_index()

   year  month   Temp max  Temp min
0  2012      1  26.750000     18.25
1  2012      2  26.333333     18.00
2  2012      3  25.500000     18.00
3  2018     12  26.000000     28.00
Nihal
  • 5,262
  • 7
  • 23
  • 41