
I have a dataset which has data recorded in time-frames:

| ID  | ITEM      | TIME                  | AMOUNT |
| --- | --------- | --------------------- | ------ |
| 1   | 5000      |10/12/2101  6:17:00 PM | 10     |
| 1   | 5200      |10/12/2101  6:17:00 PM | 2      |
| 1   | 5100      |10/12/2101  6:17:00 PM | 1.70   |
| 1   | 5300      |10/13/2101  3:00:00 AM | 1.52   |
| 1   | 5000      |10/13/2101  3:00:00 AM | 11     |

I want to reshape this into a column-wise (wide) format, with the item numbers as column names and the corresponding amount for each timestamp. I first created an empty data frame with all of the item numbers as columns. I then loop over each timestamp, get the item and amount at that time, and add them as a dictionary to the data frame.

The problem is that this produces a data frame that looks like this:

| ID  | TIME                   | 5000   | 5100  | 5200  | 5300  |
| --- | ---------------------- | -------|-------|-------|-------|
| 1   | 10/12/2101  6:17:00 PM | 10     |       |       |       |
| 1   | 10/12/2101  6:17:00 PM |        | 1.70  |       |       |
| 1   | 10/12/2101  6:17:00 PM |        |       | 2     |       |
| 1   | 10/13/2101  3:00:00 AM | 11     |       |       |       |
| 1   | 10/13/2101  3:00:00 AM |        |       |       | 1.52  |

But I want only one row per timestamp, with the item columns filled in for that timestamp. For example:

| ID  | TIME                   | 5000   | 5100  | 5200  | 5300  |
| --- | ---------------------- | -------|-------|-------|-------|
| 1   | 10/12/2101  6:17:00 PM | 10     | 1.70  | 2     |       |
| 1   | 10/13/2101  3:00:00 AM | 11     |       |       | 1.52  |

Here is my code:

```python
import pandas as pd

# Column layout of the result: ID, TIME, then one column per selected item
columns = ['ID', 'TIME', '5000', '5100', '5200', '5300']
rows = []

# Find the list of unique time-frames
list_time_frames = df['TIME'].unique()

# For each time-frame
for time_frame in list_time_frames:
    # For each item recorded in that time-frame
    for _, r in df.iterrows():
        if r['TIME'] == time_frame:  # the column is 'TIME', not 'time'
            rows.append({'TIME': time_frame, 'ID': r['ID'], str(r['ITEM']): r['AMOUNT']})

# DataFrame.append was removed in pandas 2.0; build the frame once from the collected dicts
df_to_fill = pd.DataFrame(rows, columns=columns)
```
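A minimal sketch of why the loop gives one row per item, and one way to collapse that result without `pivot_table`: each dictionary fills only one item column, so every row is sparse. `GroupBy.first()` takes the first non-null value of each column within each (`ID`, `TIME`) group, which merges those sparse rows. The `sparse` frame here is hand-built to mirror the table above (an assumption standing in for the loop's output); it is not produced by the loop itself.

```python
import pandas as pd

# Hand-built stand-in for the loop's output: one sparse row per (TIME, ITEM) pair
sparse = pd.DataFrame({
    'ID':   [1, 1, 1, 1, 1],
    'TIME': ['10/12/2101  6:17:00 PM'] * 3 + ['10/13/2101  3:00:00 AM'] * 2,
    '5000': [10, None, None, 11, None],
    '5100': [None, 1.70, None, None, None],
    '5200': [None, None, 2, None, None],
    '5300': [None, None, None, None, 1.52],
})

# GroupBy.first() returns the first non-null value of each column within a group,
# so all sparse rows sharing an (ID, TIME) pair merge into a single row
collapsed = sparse.groupby(['ID', 'TIME'], as_index=False).first()
print(collapsed)
```

This works, but it still builds the sparse frame row by row first, which is why reshaping the original data directly is the simpler route.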
  • Yes, `pivot_table` was what I was missing. I guess the problem with my approach is that it created a separate row in the empty data frame for every item, whereas `pivot_table` does the whole reshape in one step. – LoneWolf Jul 18 '20 at 00:11
  • No problem @LoneWolf. That comment is added automatically by Stack Overflow when someone flags a question as a duplicate, to reduce clutter, since this is a common question. There is nothing bad about asking a question that becomes a duplicate; I'm just letting you know. Now that it has been accepted, that comment was deleted. – David Erickson Jul 18 '20 at 00:13

1 Answer


Let us try `pivot_table`:

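```python
import pandas as pd

# One row per (ID, TIME), one column per ITEM; duplicate AMOUNTs are summed
s = df.pivot_table(index=['ID', 'TIME'], columns='ITEM', values='AMOUNT', aggfunc='sum').reset_index()
```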
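A self-contained version of the answer, rebuilding the sample data from the question (the `df` construction is an assumption based on the table shown; `TIME` is kept as the original strings). Each unique (`ID`, `TIME`) pair becomes one row and each distinct `ITEM` becomes one column; item/time combinations with no record come out as `NaN`.

```python
import pandas as pd

# Sample data from the question (long format: one row per ID/TIME/ITEM)
df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1],
    'ITEM':   [5000, 5200, 5100, 5300, 5000],
    'TIME':   ['10/12/2101  6:17:00 PM'] * 3 + ['10/13/2101  3:00:00 AM'] * 2,
    'AMOUNT': [10, 2, 1.70, 1.52, 11],
})

# Each unique (ID, TIME) pair becomes one row; each ITEM value becomes one column
s = df.pivot_table(index=['ID', 'TIME'], columns='ITEM',
                   values='AMOUNT', aggfunc='sum').reset_index()

# The column index keeps the name 'ITEM'; clearing it gives a plainer header (optional)
s.columns.name = None
print(s)
```

The item columns are integer labels (5000, 5100, 5200, 5300) because `ITEM` is numeric here, unlike the string column names used in the question's loop. Rows are ordered by sorting `TIME` as strings; converting it to datetimes first (for example with `pd.to_datetime`) would be needed for reliable chronological order across months or years.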
  • Wow! That eliminated the for loops. This is what I was looking for. Can you explain how this works? Does it produce one row per timestamp because you used `aggfunc='sum'`? – LoneWolf Jul 17 '20 at 23:48
  • @LoneWolf yes, that is correct. You can check `pivot_table` in the pandas docs; the official documentation explains it pretty well. – BENY Jul 17 '20 at 23:49
  • For the provided sample data, you could actually omit `aggfunc='sum'`, as there are no duplicate rows, but it's good to keep it in case you get duplicates. – David Erickson Jul 18 '20 at 00:00
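To illustrate the point about duplicates, here is a sketch on a small hypothetical frame (`dup`, not from the question) in which item 5000 is recorded twice at the same timestamp. Without `aggfunc`, `pivot_table` falls back to its default of `'mean'` and averages the duplicates; `aggfunc='sum'` adds them; and `DataFrame.pivot`, which does no aggregation, raises a `ValueError`. The one-row-per-timestamp shape comes from `index=['ID', 'TIME']`, not from the choice of `aggfunc`.

```python
import pandas as pd

# Hypothetical data with a duplicate: item 5000 recorded twice at the same time
dup = pd.DataFrame({
    'ID':     [1, 1, 1],
    'ITEM':   [5000, 5000, 5100],
    'TIME':   ['10/12/2101  6:17:00 PM'] * 3,
    'AMOUNT': [10, 4, 1.70],
})

keys = dict(index=['ID', 'TIME'], columns='ITEM', values='AMOUNT')

# Default aggfunc is 'mean', so the duplicates are averaged: (10 + 4) / 2 = 7
averaged = dup.pivot_table(**keys)
# aggfunc='sum' adds them instead: 10 + 4 = 14
summed = dup.pivot_table(**keys, aggfunc='sum')

# DataFrame.pivot does no aggregation and refuses duplicate (index, column) pairs
try:
    dup.pivot(**keys)
    pivot_raised = False
except ValueError:
    pivot_raised = True

print(averaged, summed, pivot_raised, sep='\n')
```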