0

My goal is to aggregate a pandas DataFrame, grouping rows by an identity field. Notably, rather than just gathering summary statistics of the group, I want to retain all the information in the DataFrame in addition to summary statistics like mean, std, etc. I have performed this transformation via a lot of iteration, but I am looking for a cleaner/more pythonic approach. Notably, there may be more or less than 2 replicates per group, but all groups will always have the same number of replicates.

Example: I would llke to translate the below format

df = pd.DataFrame([
    ["group1", 4, 10],
    ["group1", 8, 20],
    ["group2", 6, 30],
    ["group2", 12, 40],
    ["group3", 1, 50],
    ["group3", 3, 60]], 
  columns=['group','timeA', 'timeB'])

print(df)

    group  timeA  timeB
0  group1      4     10
1  group1      8     20
2  group2      6     30
3  group2     12     40
4  group3      1     50
5  group3      3     60

into a df of the following format:

target = pd.DataFrame([
    ["group1", 4, 8, 6, 10, 20, 15],
    ["group2", 6, 12, 9, 30, 45, 35],
    ["group3", 1, 3, 2, 50, 60, 55]
], columns = ["group", "timeA.1", "timeA.2", "timeA.mean", "timeB.1", "timeB.2", "timeB.mean"])

print(target)

    group  timeA.1  timeA.2  timeA.mean  timeB.1  timeB.2  timeB.mean
0  group1        4        8           6       10       20          15
1  group2        6       12           9       30       45          35
2  group3        1        3           2       50       60          55

Finally, it doesn't really matter what the column names are, these ones are just to make the example more clear. Thanks!

EDIT: As suggested by a user in the comments, I tried the solution from the linked Q/A without success:

df.insert(0, 'count', df.groupby('group').cumcount())
df.pivot(*df)

TypeError: pivot() takes from 1 to 4 positional arguments but 5 were given
user11058068
  • 153
  • 1
  • 6
  • 1
    Try the solution for Q/A 10 and 11 in the dup link. – Quang Hoang Nov 30 '20 at 19:49
  • Thanks for looking at my question. Unfortunately, the answer at Q/A 10 + 11 does not work because pivot only takes up to 4 positional arguments. – user11058068 Nov 30 '20 at 20:15
  • Please update your question with what you tried and what failed. I'll open the question. – Quang Hoang Nov 30 '20 at 20:17
  • It is not very clear which transformations you want to do but if you want to retain the structure, usually groupby + transform can work quite well https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html – Nabla Nov 30 '20 at 20:26

1 Answers1

1

Try with pivot_table:

out = (df.assign(col=df.groupby('group').cumcount()+1)
   .pivot_table(index='group', columns='col', 
                margins='mean', margins_name='mean')
   .drop('mean')
)
out.columns = [f'{x}.{y}' for x,y in out.columns]

Output:

        timeA.1  timeA.2  timeA.mean  timeB.1  timeB.2  timeB.mean
group                                                             
group1      4.0      8.0         6.0       10       20          15
group2      6.0     12.0         9.0       30       40          35
group3      1.0      3.0         2.0       50       60          55
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74