0

I have a pandas dataframe (df) of the following format:

+------+-------+-------+
| Zone | Group | Count |
+------+-------+-------+
|  897 |     1 |    78 |
|  897 |     2 |    49 |
|  897 |     3 |    23 |
|  482 |     1 |   157 |
|  482 |     2 |    57 |
|  482 |     3 |    28 |
+------+-------+-------+

I would like to alter the dateframe so that there exists only one row per Zone. The output would be...

+------+----------+----------+----------+
| Zone | Count_G1 | Count_G2 | Count_G3 |
+------+----------+----------+----------+
|  897 |       78 |       49 |       23 |
|  482 |      157 |       57 |       28 |
+------+----------+----------+----------+

In terms of generating the new column names, I think the best method would be to use some automated counter-based method. I have provided sample data, but the actual problem I am working on has hundreds of rows of data to be transformed in this manner.

The following post addresses one approach to naming new columns based on dictionaries, which would be a less than ideal approach in this case.

Renaming columns of a pandas dataframe without column names

Zero
  • 74,117
  • 18
  • 147
  • 154
HMLDude
  • 1,547
  • 7
  • 27
  • 47
  • 1
    Do you already have the group column? If so, this is just pivoting: `df.pivot(index='Zone', columns='Group', values='Count').add_prefix('Count_G')`. To get the exact same output: `df.pivot(index='Zone', columns='Group', values='Count').add_prefix('Count_G').rename_axis(None, axis=1).reset_index()`. – ayhan Sep 24 '17 at 17:38
  • We can safely assume, for the sake of discussion, that I have the first dataframe (as displayed), and my desired output is the second. – HMLDude Sep 24 '17 at 17:40
  • Or, `df.set_index(['Zone', 'Group'])['Count'].unstack('Group').add_prefix('Count_G')` – Zero Sep 24 '17 at 17:42
  • `pd.crosstab(df.Zone, df.Group, df.Count, aggfunc=sum).add_prefix('Count_G')` also. – Zero Sep 24 '17 at 17:45
  • @ayhan and @Zero, you're examples cover the given use case described above. But as I mentioned, the actual problem I am trying to solve has hundreds of columns. So instead of just breaking out the `Count' column n ways (n=3 in the example), I am working to break out a range including hundreds of such columns. – HMLDude Sep 24 '17 at 20:33

0 Answers0