0

I am new to Python, and I think I may have gotten over my head with a project I started. I have an original dataset that comes in the form of an Excel spreadsheet that I'm importing as a dataframe. Each line of the dataframe is one status change from an individual transaction number, so each request makes up several lines of the data. I want to create a new dataframe that has 1 line for each transaction, with the columns being the time in the different statuses. Also, there may be multiple lines for the same status, which I would like to add together for a total. I'll Show an example input:

df Tran num Status Sub Status Category Time in Status
0 1 Ready NaN Cats 1.5
1 1 Set Set1 Cats 23
2 1 Set Set2 Cats 20
3 1 Go NaN Cats 14
4 2 OnMark NaN Dogs 3.5
5 2 Getset GS 1 Dogs 25
6 2 OnMark NaN Dogs 2.5
7 2 Getset GS 1 Dogs 22
8 2 Getset GS 2 Dogs 15
9 2 Getset GS 3 Dogs 12
10 2 Go NaN Dogs 18

Desired Output:

df Tran num Category T in Ready T in Set T in Go T in Set1 T in Set2 T in OnMark T in Getset
0 1 Cats 1.5 43 14 23 20 NaN NaN
1 2 Dogs NaN NaN 18 NaN NaN 2.5 49

I'm not sure why, but the table looks fine in the editor, but not when I post. Screenshot

A couple of notes: I left out the sub-statuses for Tran num 2 for space, but would like that included. Basically any new status that comes up, I'd like to create a new column. Also, the dataset is roughly 300k+ lines.

Chandan
  • 11,465
  • 1
  • 6
  • 25
VenomVol
  • 1
  • 1

1 Answers1

0
df.pivot(columns = 'Status', values = 'Time in Status')
VasilisG
  • 16
  • 1
  • While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – ppwater Jan 13 '21 at 01:09