0

I am new to Python Pandas and am looking for advice on how to proceed with a problem. I have a large dataframe and I would like to sum the columns of the rows by 50 at a time to form new rows with one column being the mean of a time series. I worded that poorly, but here is an example:

Say I have this dataframe, where A is an increasing time series:

          A     B     C     D
       0  1     23    45    21 
       1  2     34    23    65
       2  3     56    84    35
       3  4     67    20    70

I'd like to "collapse" the rows by 2 rows so that they look like this, with columns B, C, and D summed and column A being the mean:

           A     B     C     D
       0  1.5   57    68    86
       1  3.5   123   104   110

I have looked into using cut() and groupby(), but neither of these seem to do what I want.

Edit: For anybody also looking for this, I found it easiest to treat it as a resampling based on the index values. The solution for my particular problem was this:

df = df.groupby(df.index // 50).sum()

Then I found the mean by dividing the column I needed to by 50.

ccsucic
  • 119
  • 1
  • 1
  • 10

1 Answers1

0

You just need to add a key column for groupby to the dataframe that corresponds to your needs.

Here you want to have each ncollapsed (some number) rows collapsed together, so you can add a column obtained from the integer division of a range by ncollapsed to get same number each ncollapsed rows.

import numpy as np
import pandas as pd

# some random data
df = pd.DataFrame(np.random.randint(low=0, high=100, size=(5,5)))
ncollapsed = 2
# add column with same values each 'ncollapsed' consecutive rows
df["index_collapsed"] = np.arange(len(df)) // ncollapsed  
# Now collapse using groupby and the aggregation you want
df.groupby("index_collapsed").mean()

Result:

    0   1   2   3   4
0  73  95  93   8  73
1  27  68  77  21  90
2  89  32  65  73  49
3  25  79  11  73  66
4  81  72  33  66  16


                    0     1     2     3     4
index_collapsed                              
0                50.0  81.5  85.0  14.5  81.5
1                57.0  55.5  38.0  73.0  57.5
2                81.0  72.0  33.0  66.0  16.0

Note: The last row has not been collapsed because the number of row is not a multipleof ncollapsed

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Nick Skywalker
  • 1,027
  • 2
  • 10
  • 26