1

I have a dataframe with multiple cells containing lists of multiple values. Rather than storing multiple values in a cell, I'd like to expand the data frame so that each item in the list gets its own row with the same values in all other columns. I have sample code and output attach and the goal I am trying to achieve

import pandas as pd
import numpy as np
df=pd.DataFrame({'OBJECTID':[1,2,3,4,5],
              'CRASH_KEY':[20191088662,20191088664,20191088668,20191088677,20191088678],
              'CASENUMBER':[20191088662,20191088664,20191088668,20191088677,20191088679],
              'routeId':[list(np.random.randn(3).round(2)) for i in range(5)],
              'measure':[list(np.random.randn(3).round(2)) for i in range(5)],
              'SPEED_LIMIT':[list(np.random.randn(3).round(2)) for i in range(5)],
              'distance':[list(np.random.randn(3).round(2)) for i in range(5)],})

df my output

OBJECTID    CRASH_KEY   CASENUMBER  ...                measure           SPEED_LIMIT               distance
 0         1  20191088662  20191088662  ...   [-0.24, -0.78, -0.7]   [-1.28, 0.11, -0.7]   [-0.98, 0.06, -1.07]
 1         2  20191088664  20191088664  ...   [-0.94, -0.98, 1.31]  [-1.26, 1.31, -2.24]  [-1.21, -1.21, -0.35]
 2         3  20191088668  20191088668  ...  [-0.21, -0.99, -0.25]  [-1.36, -0.59, 0.02]   [0.22, -1.36, -0.64]
 3         4  20191088677  20191088677  ...    [0.31, -0.56, 0.95]  [-0.92, -1.04, 0.59]      [1.88, 0.3, 0.95]
 4         5  20191088678  20191088679  ...    [-0.07, 1.02, 2.14]  [-0.49, -0.66, 0.07]   [-0.27, -0.12, 0.36]

This is what I am trying to achieve

         OBJECTID    CRASH_KEY   CASENUMBER  measure SPEED_LIMIT  distance
    0         1  20191088662  20191088662   -0.24      -1.28    -0.98
    1         1  20191088662  20191088662   -0.78       0.11     0.06
    2         1  20191088662  20191088662    -0.7      -0.7     -1.07
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
user13073332
  • 21
  • 1
  • 7
  • 1
    Please don't post pictures of code/DataFrame, paste them so others can just copy it and work with it. – debsim Jul 21 '20 at 13:43
  • @debsim thanks am just new here so getting my way around on how to ask questions i will surely past them – user13073332 Jul 21 '20 at 13:50
  • @debsim please can you suggest any other way for me insert a picture as am trying to past it not allowing me too – user13073332 Jul 21 '20 at 14:05
  • 1
    https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples This is a post that helps you create good examples for questions. – Scott Boston Jul 21 '20 at 14:06
  • Instead of taking a picture, just copy the text that you took the screenshot of and paste that here. – debsim Jul 21 '20 at 14:07
  • @MrNobody33.hello i saw you guy closed my question i was trying to edit it to get it right please can you reopen i was editing as i had gotten recomendation – user13073332 Jul 21 '20 at 15:39
  • 1
    @debsim is this good enough or still need to add something – user13073332 Jul 21 '20 at 15:40
  • Yes, I think your current question is good as it is now. I think it should be reopened now. – debsim Jul 21 '20 at 16:06
  • 1
    @ScottBoston thanks it really helped and i have posted my question clearly – user13073332 Jul 21 '20 at 16:14
  • 1
    Hey @user13073332 , thanks for editing it, it looks great and comprehensible now. But, now seeing it edited, it seems more like a duplicate to [this](https://stackoverflow.com/q/45846765/13676202). Because IIUC, you want to explode multiple columns, so, as the link suggests, you can try something like this: `df.set_index(['OBJECTID','CRASH_KEY','CASENUMBER']).apply(pd.Series.explode).reset_index()`. – MrNobody33 Jul 21 '20 at 16:46
  • @MrNobody33 has a good solution. – Scott Boston Jul 21 '20 at 17:16
  • @MrNobody33 once i do that and want to sort distance with min to max with how would i go about it – user13073332 Jul 22 '20 at 17:16

0 Answers0