0

I'm trying to manipulate a dataframe in Pandas and running into some issues. I've looked at some variants of the questions that have been asked here and most of them involve using pivot and discarding some of the existing columns, I'm wondering if there's a way around that.

I created some simple data as an illustration that's similar to my existing data:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
              'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

This produces a dataframe that looks like this:

OutputFrame What I want to do, is turn the months into columns, WHILE preserving the other data. So something like this:DesiredFrame

I've tried the suggestions to pivot from here: Pandas long to wide reshape, by two variables

I tried pivoting on months:

frame.pivot(columns = 'Month')

Fail1

I tried adding more columns to see if it would clean up:

frame.pivot(columns = ('FirstName', 'LastName','Month'), values = 'Sales' )

Fail2

In both cases I got some sort bizzare columns. I'm very curious to know what Pandas is doing here, but I have no idea how to make sense of this.

I figure I could loop through and just re-create the data, but I figure this has to be a better way?

  • All the images look the same. So it's hard to determine the desired output. Have you looked into passing other arguments, such as `frame.pivot(index=["FirstName","LastName"],columns = 'Month')`? Or using `frame.pivot_table(...)`? – Murilo Cunha Jan 06 '22 at 21:00
  • Oh my god I'm sorry, I'll edit the question – Pewter City Jan 06 '22 at 21:06

1 Answers1

3

You were actually almost all the way there with pivot(). Specifying the index will take you almost all the way there:

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = frame.pivot(
    index=["FirstName", "LastName", "Building"],
    columns="Month",
    values="Sales",
)

df

The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)

import pandas as pd

raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
            'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
            'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
            'Month': ["November", "November", "November", "December","December", "December"], 
            'Sales': [100, 150, 275, 200, 150, 150]}

frame = pd.DataFrame(raw_data, columns =raw_data.keys())

df = (
    frame.pivot(
        index=["FirstName", "LastName", "Building"],
        columns="Month",
        values="Sales"
    )
    .reset_index()              # collapses multi-index
    .rename_axis(None, axis=1)  # renames index
)

df
Murilo Cunha
  • 436
  • 5
  • 9
  • Thanks so much for your response. Your solution works perfectly, I just have a few questions. What exactly is the index doing? I had thought indexing was used purely as numbering for rows. But here I'm not following why we feed it 3 rows? – Pewter City Jan 07 '22 at 17:41
  • I meant 3* columns not rows. I suppose I just misunderstand what index is supposed to be – Pewter City Jan 07 '22 at 17:52
  • You're welcome! Would you mind accepting my answer if you found it helpful? Now, on the index - it's really just a way to id the rows. It makes sense if they are unique numbers (tho they don't have to). When you pass the `index` property to `pivot()` you're really specifying how you want id the rows. When pivoting, the row ids will be unique. So you can think of "for each FirstName, LastName and Building (row) I want the Sales (value) for each Month (columns)". Also check the [pandas docs for indexes](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html) – Murilo Cunha Jan 07 '22 at 21:22
  • Hi, I'm very sorry for not accepting it right away, I was away from this computer for a while. I really appreciate your help, I never fully understood the way indexing worked, but your examples make it a lot clearer. Thank you very much! – Pewter City Jan 10 '22 at 21:01