-1

I'm new to python . I'm looking for a way to generate mean for row values based on column names(Column names are date series formats from January to December). I want to generate mean for every 10 days for over a period of an year. My dataframe is in the below format(2000 rows)

import pandas as pd
df= pd.DataFrame({'A':[81,80.09,83,85,88],
                  'B':[21.8,22.04,21.8,21.7,22.06],
                  '20210113':[0,0.05,0,0,0.433],
                  '20210122':[0,0.13,0,0,0.128],
                  '20210125':[0.056,0,0.043,0.062,0.16],
                  '20210213':[0.9,0.56,0.32,0.8,0],
                  '20210217':[0.7,0.99,0.008,0.23,0.56],
                  '20210219':[0.9,0.43,0.76,0.98,0.5]})

Expected Output:

In [2]: df
Out[2]: 
   A        B     c(Mean 20210111,..20210119 ) D(Mean of 20210120..20210129)..
0  81       21.8
1  80.09    22.04
2  83       21.8
3  85       21.7           
4  88       22.06
ThePyGuy
  • 17,779
  • 5
  • 18
  • 45
rapabba
  • 1
  • 1
  • Try not to use pictures when a text example can be made. It's easier for us to copy and view text. Also, please read this Q&A on [making a good pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – kennyvh Apr 28 '21 at 19:01
  • Your code to build the dataframe is invalid. It should be `DataFrame` not `Dataframe`, all arrays must be the same length and they are currently not. – Henry Ecker Apr 28 '21 at 19:23
  • @HenryEcker Rectified it. Please check now! – rapabba Apr 28 '21 at 19:28
  • If you want the mean for every 10 days, your first example of your expected output should be `20210111, ...20210119` – Camilo Martinez M. Apr 28 '21 at 19:30

1 Answers1

0

One way would be to isolate the date columns from the rest of the DF. Transpose it to be able to use normal grouping operations. Then transpose back and merge to the unaffected portion of the DataFrame.

import pandas as pd

df = pd.DataFrame({'A': [81, 80.09, 83, 85, 88],
                   'B': [21.8, 22.04, 21.8, 21.7, 22.06],
                   '20210113A.2': [0, 0.05, 0, 0, 0.433],
                   '20210122B.1': [0, 0.13, 0, 0, 0.128],
                   '20210125C.3': [0.056, 0, 0.043, 0.062, 0.16],
                   '20210213': [0.9, 0.56, 0.32, 0.8, 0],
                   '20210217': [0.7, 0.99, 0.008, 0.23, 0.56],
                   '20210219': [0.9, 0.43, 0.76, 0.98, 0.5]})

# Unaffected Columns Go Here
keep_columns = ['A', 'B']

# Get All Affected Columns
new_df = df.loc[:, ~df.columns.isin(keep_columns)]

# Strip Extra Information From Column Names
new_df.columns = new_df.columns.map(lambda c: c[0:8])

# Transpose
new_df = new_df.T

# Convert index to DateTime for easy use
new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')

# Resample every 10 Days on new DT index (Drop any rows with no values)
new_df = new_df.resample("10D").mean().dropna(how='all')

# Transpose and Merge Back on DF
df = df[keep_columns].merge(new_df.T, left_index=True, right_index=True)

# For Display
print(df.to_string())

Output:

       A      B  2021-01-13 00:00:00  2021-01-23 00:00:00  2021-02-12 00:00:00
0  81.00  21.80               0.0000                0.056             0.833333
1  80.09  22.04               0.0900                0.000             0.660000
2  83.00  21.80               0.0000                0.043             0.362667
3  85.00  21.70               0.0000                0.062             0.670000
4  88.00  22.06               0.2805                0.160             0.353333

new_df = df.loc[:, ~df.columns.isin(keep_columns)]

new_df

              0     1      2      3      4
20210113  0.000  0.05  0.000  0.000  0.433
20210122  0.000  0.13  0.000  0.000  0.128
20210125  0.056  0.00  0.043  0.062  0.160
20210213  0.900  0.56  0.320  0.800  0.000
20210217  0.700  0.99  0.008  0.230  0.560
20210219  0.900  0.43  0.760  0.980  0.500

new_df.index = pd.to_datetime(new_df.index, format='%Y%m%d')

new_df

                0     1      2      3      4
2021-01-13  0.000  0.05  0.000  0.000  0.433
2021-01-22  0.000  0.13  0.000  0.000  0.128
2021-01-25  0.056  0.00  0.043  0.062  0.160
2021-02-13  0.900  0.56  0.320  0.800  0.000
2021-02-17  0.700  0.99  0.008  0.230  0.560
2021-02-19  0.900  0.43  0.760  0.980  0.500

new_df = new_df.resample("10D").mean().dropna(how='all')

new_df

                   0     1         2      3         4
2021-01-13  0.000000  0.09  0.000000  0.000  0.280500
2021-01-23  0.056000  0.00  0.043000  0.062  0.160000
2021-02-12  0.833333  0.66  0.362667  0.670  0.353333

new_df.T

   2021-01-13  2021-01-23  2021-02-12
0      0.0000       0.056    0.833333
1      0.0900       0.000    0.660000
2      0.0000       0.043    0.362667
3      0.0000       0.062    0.670000
4      0.2805       0.160    0.353333
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • how do we perform this if there is some string concatenated to date columns (Ex:20210113A.1, 20210113B.1, 20210123A.1) – rapabba Apr 28 '21 at 19:51
  • Updated my answer to include this. Please try to include relevant information like that in your question. – Henry Ecker Apr 28 '21 at 19:56