getting mean() used in groupby to use the right grouped values for calculation

Question

Data import from csv:

Date	Item_1	Item 2
1990-01-01	34	78
1990-01-02	42	19
.	.	.
.	.	.
2020-12-31	41	23

df = pd.read_csv(r'Insert file directory')
df.index = pd.to_datetime(df.index)
gb= df.groupby([(df.index.year),(df.index.month)]).mean()

Issue: So basically, the requirement is to group the data according to year and month before processing and I thought that the groupby function would have grouped the data so that the mean() calculate the averages of all values grouped under Jan-1990, Feb-1990 and so on. However, I was wrong. The output result in the average of all values under Item_1

My example is similar to the below post but in my case, it is calculating the mean. I am guessing that it has to do with the way the data is arranged after groupby or some parameters in mean() have to be specified but I have no idea which is the cause. Can someone enlighten me on how to correct the code?

Pandas groupby month and year

Update: Hi all, I have created the sample data file .csv with 3 items and 3 months of data. I am wondering if the cause has to do with the conversion of data into df when it is imported from .csv because I have noticed some weird time data on the leftmost as shown below:

Link to sample file is: https://www.mediafire.com/file/t81wh3zem6vf4c2/test.csv/file

Your code seems to work correctly for me. Perhaps you can add a bit more detail to drill down on the problem. — bicarlsen, Aug 10 '21 at 15:22
Hi bicarlsen, I have recreated the sample data in the same file format (link is provided above) and update the code to show how I import the .csv file into the df. I have tried to run the code again and it still calculates the average of entire values under each item. — user14074078, Aug 10 '21 at 16:35
@user14074078 have you tested [my proposed solution](https://stackoverflow.com/a/68730009/16343464)? — mozway, Aug 10 '21 at 16:39
@mozway yes, I have tried your proposed solution and the result is the same. I am wondering if the conversion of csv, resulted in some unwanted item being included. — user14074078, Aug 10 '21 at 16:45

bicarlsen · Accepted Answer · 2021-08-10T18:04:42.387

1

import pandas as pd

df = pd.read_csv( 'test.csv', index_col = 'date' )
df.index = pd.to_datetime( df.index )

df.groupby([(df.index.year),(df.index.month)]).mean()

Seems to do the trick from the provided data.

edited Aug 10 '21 at 18:04

answered Aug 10 '21 at 17:39

bicarlsen

1,241
10
27

Hi bicarlsen, inclusion of the argument, index_col = 'date', works! Thanks for your help!!! Really appreciate it. Once again, I am caught-off guard by small details in code, lol. – user14074078 Aug 11 '21 at 04:24

mozway · Answer 2 · 2021-08-10T16:59:46.387

0

IIUC, you want to calculate the mean of all elements. You can use numpy's mean function that operates on the flattened array by default:

df.index = pd.to_datetime(df.index, format='%d/%m/%Y')
gb = df.groupby([(df.index.year),(df.index.month)]).apply(lambda d: np.mean(d.values))

output:

date  date
1990  1       0.563678
      2       0.489105
      3       0.459131
      4       0.755165
      5       0.424466
      6       0.523857
      7       0.612977
      8       0.396031
      9       0.452538
      10      0.527063
      11      0.397951
      12      0.600371
dtype: float64

edited Aug 10 '21 at 16:59

answered Aug 10 '21 at 15:57

mozway

194,879
13
39
75

Hi mozway, I have recreated the sample data in the same file format (link is provided above) and update the code to show how I import the .csv file into the df. I have tried to run the code again and it still calculates the average of entire values under each item. – user14074078 Aug 10 '21 at 16:39
@user14074078 I updated the code. Apparently, this works only on the raw array. – mozway Aug 10 '21 at 16:57
Hi mozway, I have tried your updated code, it results in "ValueError: time data '0' does not match format '%d/%m/%Y' (match)" However, bicarlsen's slight alteration to the argument for pd.read_csv seems to do the trick. Once again, thank you for your assistance in this. Really appreciate it! – user14074078 Aug 11 '21 at 04:19

getting mean() used in groupby to use the right grouped values for calculation

2 Answers2