0

My data looks like this:

    Id           Timestamp         Data    Group_Id
0    1     2018-01-01 00:00:05.523 40      1
1    2     2018-01-01 00:00:10.757 31.5    1
2    3     2018-01-01 00:00:15.507 25.5    1
                   ...
3    4     2018-01-01 00:00:05.743 35      2
4    5     2018-01-01 00:00:10.407 35.75   2
                   ...

I want to group data by Group_Id so that each unique value in Group_Id is one row, while transposing the timestamps into a time series with each timestamp in a column like below:

            x0      x1      x2      x3      x4      x5      ...
1           40      31.05   25.5    25.5    25.5    25      ...
2           35      35.75   36.5    36.5    36.5    36.5    ...
⋮           ⋮       ⋮        ⋮       ⋮       ⋮        ⋮

Then I want to rename the timestamps according to time order, ie. the x0 replacing the first timestamp, x1 the second ...

I checked the df.pivot_table documentation and couldn't find a relevant function that sorts time series. And discussion on this link did not help me with this problem.

martineau
  • 119,623
  • 25
  • 170
  • 301
nilsinelabore
  • 4,143
  • 17
  • 65
  • 122
  • You were reading the correct solution. Check out question 10 in the discussion. In your case, `Data` is `B`, `Group_Id` is `A`. – Quang Hoang Oct 14 '19 at 22:41
  • I checked question 10. Why do they have to insert 0 here? `df2.insert(0, 'count', df2.groupby('A').cumcount())` – nilsinelabore Oct 14 '19 at 22:53
  • Which part of the code sort `B` (ie. `Data`)? Thanks – nilsinelabore Oct 14 '19 at 22:56
  • 1. Insert syntax: `df.insert(location, name, values)`. You don't even need `insert`. You can make do with just `df['count'] = df.groupby('Group_Id').cumcount()`. 2. The question implicitly says you need to do `df.pivot(index='Group_Id', columns='count', values='Data')`. – Quang Hoang Oct 14 '19 at 23:27
  • Use this: `new_df=df.copy() new_df['Timestamp']='x'+new_df.groupby('Group_Id')['Timestamp'].cumcount().astype(str) new_df.pivot_table(index='Group_Id',columns='Timestamp',values='Data')` – ansev Oct 14 '19 at 23:34
  • then if you want.... `new_df.columns.name=None` – ansev Oct 14 '19 at 23:36
  • Thanks @ansev. may I know what `new_df.columns.name=None` does here? I ran and compare the code it didn't seem to bring any change – nilsinelabore Oct 14 '19 at 23:48
  • remove Timestamp as the name of the columns after pivoting – ansev Oct 14 '19 at 23:53
  • @ansev Checked my code again and still don't see any change:/ – nilsinelabore Oct 14 '19 at 23:56
  • If this doesn't matter to you, just don't use it. – ansev Oct 14 '19 at 23:57
  • you should see that the name 'Timestamp' disappears at the top left of the dataframe – ansev Oct 14 '19 at 23:58
  • @ansev Yeah that is what I wanted to do, but it refuses to disappear... – nilsinelabore Oct 15 '19 at 00:01
  • @ansev And some values are `NaN`, any idea what could be the reason? – nilsinelabore Oct 15 '19 at 00:03
  • @ansev I notice that pivoting distorts the order which gives `Timestamp x0 x1 x10 x100 x1000 x1001 x1002 x1003 x1004 x1005 x1006 x1007 x1008 x1009 x101 x1010 x1011 x1012 ...` for the column headings – nilsinelabore Oct 15 '19 at 00:08
  • There are larger groups than others, it is normal for NaN... – ansev Oct 15 '19 at 00:14
  • @ansev Yeah no worries it's sorted, but the order of indexing is still confusing... – nilsinelabore Oct 15 '19 at 00:21
  • @nilsinelabore - Can you test `df['Timestamp']=df.groupby('Group_Id')['Timestamp'].cumcount() df = df.pivot(index='Group_Id',columns='Timestamp',values='Data').add_prefix('x')` ? – jezrael Oct 15 '19 at 09:10

0 Answers0