How to create a new dataframe from multiple lines of another dataframe

Question

I am new to Python, and I think I may have gotten over my head with a project I started. I have an original dataset that comes in the form of an Excel spreadsheet that I'm importing as a dataframe. Each line of the dataframe is one status change from an individual transaction number, so each request makes up several lines of the data. I want to create a new dataframe that has 1 line for each transaction, with the columns being the time in the different statuses. Also, there may be multiple lines for the same status, which I would like to add together for a total. I'll Show an example input:

df	Tran num	Status	Sub Status	Category	Time in Status
0	1	Ready	NaN	Cats	1.5
1	1	Set	Set1	Cats	23
2	1	Set	Set2	Cats	20
3	1	Go	NaN	Cats	14
4	2	OnMark	NaN	Dogs	3.5
5	2	Getset	GS 1	Dogs	25
6	2	OnMark	NaN	Dogs	2.5
7	2	Getset	GS 1	Dogs	22
8	2	Getset	GS 2	Dogs	15
9	2	Getset	GS 3	Dogs	12
10	2	Go	NaN	Dogs	18

Desired Output:

df	Tran num	Category	T in Ready	T in Set	T in Go	T in Set1	T in Set2	T in OnMark	T in Getset
0	1	Cats	1.5	43	14	23	20	NaN	NaN
1	2	Dogs	NaN	NaN	18	NaN	NaN	2.5	49

I'm not sure why, but the table looks fine in the editor, but not when I post. Screenshot

A couple of notes: I left out the sub-statuses for Tran num 2 for space, but would like that included. Basically any new status that comes up, I'd like to create a new column. Also, the dataset is roughly 300k+ lines.

score 0 · Answer 1 · answered Jan 12 '21 at 14:03

0

df.pivot(columns = 'Status', values = 'Time in Status')

answered Jan 12 '21 at 14:03

VasilisG

16
1

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – ppwater Jan 13 '21 at 01:09

How to create a new dataframe from multiple lines of another dataframe

1 Answers1