0

Problem

I want to pivot and group values. Despite looking through the guidance here, I cannot seem to find what I am looking for.

Reprex

What I have.

import pandas as pd

data_input = {'Area':['North', 'South', 'West', 'East','North', 'South', 'West', 'East'], 
              "Job1":["T", "F", "T", "X","T", "F", "T", "X"],
              "Job2":["F", "X", "T", "X","T", "F", "T", "X"],
              "Job3":["T", "F", "T", "X","X", "X", "F", "T"]}
 
# Create DataFrame
df1 = pd.DataFrame(data_input)
 
# Print the output.
print(df1)

enter image description here

What I want

# multi-level columns
items = pd.MultiIndex.from_tuples([('Job1', 'T'),('Job1', 'F'), ('Job1', 'X'),
                                 ('Job2', 'T'),('Job2', 'F'), ('Job2', 'X')])

# creating a DataFrame
dataFrame = pd.DataFrame([[2, 0, 0, 1, 1, 0], 
                          [0, 2, 0,0, 1, 1], 
                          [2, 0, 0,2, 0, 0], 
                          [0, 0, 2,0, 0, 2]],
                         index=['North', 'South', 'East', "West"],
                         columns=items)

# DataFrame
dataFrame

enter image description here

Anentropic
  • 32,188
  • 12
  • 99
  • 147
Namra
  • 359
  • 1
  • 10

1 Answers1

1

One option is to pivot each Job column and concat the result

cols = ['Job1', 'Job2', 'Job3']
dfs = []

for col in cols:
    df = pd.crosstab(df1['Area'], df1[col])
    df.columns = pd.MultiIndex.from_product([[col], df.columns.tolist()])
    dfs.append(df)

out = pd.concat(dfs, axis=1)
print(out)

      Job1       Job2       Job3
         F  T  X    F  T  X    F  T  X
Area
East     0  0  2    0  0  2    0  1  1
North    0  2  0    1  1  0    0  1  1
South    2  0  0    1  0  1    1  0  1
West     0  2  0    0  2  0    1  1  0
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • This is great, thank you. Do you know what I could do if none of the areas had a value for X under, say, Job1. Currently, a separate column for X would simply not be shown. Is there a way to have a column of 0s? – Namra Jul 21 '22 at 16:12
  • 1
    @Namra Try replacing `df.columns.tolist()` with those F, T, X as list – Ynjxsjmh Jul 21 '22 at 16:18