0

I'm still relatively new to Pandas and I can't tell which of the functions I'm best off using to get to my answer. I have looked at pivot, pivot_table, group_by and aggregate but I can't seem to get it to do what I require. Quite possibly user error, for which I apologise!

I have data like this:

Original Data

Code to create df:

import pandas as pd
df = pd.DataFrame([
    ['1', '1', 'A', 3, 7],
    ['1', '1', 'B', 2, 9],
    ['1', '1', 'C', 2, 9],
    ['1', '2', 'A', 4, 10],
    ['1', '2', 'B', 4, 0],
    ['1', '2', 'C', 9, 8],
    ['2', '1', 'A', 3, 8],
    ['2', '1', 'B', 10, 4],
    ['2', '1', 'C', 0, 1],
    ['2', '2', 'A', 1, 6],
    ['2', '2', 'B', 10, 2],
    ['2', '2', 'C', 10, 3]
], columns = ['Field1', 'Field2', 'Type', 'Price1', 'Price2'])
print(df)

I am trying to get data like this:

Pivoted Data

Although my end goal will be to end up with one column for A, one for B and one for C. As A will use Price1 and B & C will use Price2.

I don't want to necessarily get the max or min or average or sum of the Price as theoretically (although unlikely) there could be two different Price1's for the same Fields & Type.

What's the best function to use in Pandas to get to what I need?

Tim Edwards
  • 1,031
  • 1
  • 13
  • 34
  • Please share samples of the dataframes not images – yatu Jul 10 '19 at 08:35
  • I can't provide actual data as its confidential and I'm not the owner but this is the format of the data, albeit there are further tiers of Field than just 2. Or are you saying you want the same data but me to give the code to push dummy data into this format? – Tim Edwards Jul 10 '19 at 08:42
  • What we want is copyable data. Either some Python code to create and populate a dataframe (the best because we are sure of the dtypes) or at least *something* that can be copied and pasted with `pd.read_clipboard`. You should read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Serge Ballesta Jul 10 '19 at 08:50
  • OK Thanks, I'll edit and add this shortly. – Tim Edwards Jul 10 '19 at 08:51

2 Answers2

1

use pivot_table

pd.pivot_table(df, values =['Price1', 'Price2'], index=['Field1','Field2'],columns='Type').reset_index()
Zaynul Abadin Tuhin
  • 31,407
  • 5
  • 33
  • 63
1

Use DataFrame.set_index with DataFrame.unstack for reshape - output is MultiIndex in columns, so added sorting second level by DataFrame.sort_index, flatten values and last create column from Field levels:

df1 = (df.set_index(['Field1','Field2', 'Type'])
         .unstack(fill_value=0)
         .sort_index(axis=1, level=1))
df1.columns = [f'{b}-{a}' for a, b in df1.columns]
df1 = df1.reset_index()
print (df1)
  Field1 Field2  A-Price1  A-Price2  B-Price1  B-Price2  C-Price1  C-Price2
0      1      1         3         7         2         9         2         9
1      1      2         4        10         4         0         9         8
2      2      1         3         8        10         4         0         1
3      2      2         1         6        10         2        10         3

Solution with DataFrame.pivot_table is also possible, but it aggregate values in duplicates first 3 columns with default mean function:

df2 = (df.pivot_table(index=['Field1','Field2'],
                      columns='Type',
                      values=['Price1', 'Price2'],
                      aggfunc='mean')
         .sort_index(axis=1, level=1))
df2.columns = [f'{b}-{a}' for a, b in df2.columns]
df2 = df2.reset_index()
print (df2)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thanks, I'm just trying them out with my data to work out which one gives me what I want, it didn't like the unstack due to duplicate data so I'm just working out if I need that duplicate in or not first. – Tim Edwards Jul 10 '19 at 09:13
  • @TimEdwards - yes, if duplicates, use second solution. – jezrael Jul 10 '19 at 09:17
  • @TimEdwards - Also `mean` should be changed to `sum`, `median` like you need. – jezrael Jul 10 '19 at 09:18
  • I think I have it working with the pivot_table bit but the part to get the columns working is giving me invalid syntax when I use the sample data. – Tim Edwards Jul 10 '19 at 09:38
  • 1
    I got the renaming columns bit to work by using ```df2.columns = ['-'.join(col).strip() for col in df2.columns.values]``` – Tim Edwards Jul 10 '19 at 10:02