I need to transform the DataFrame below into the required format without using a loop (or any other inefficient logic), because the DataFrame is huge: 950 thousand rows, and each value in the Points column is a list with more than 1,000 elements. I get this data by deserializing blob data from the database, and I will need to use it to create some ML models.

input: Dataframe Before

output: Dataframe After

    import pandas as pd

    frames = []
    for index, row in df.iterrows():
        n = int(row['points'])
        # Repeat the scalar columns once per output row
        tempDF = pd.DataFrame([[row['I'], row['x'], row['y'], row['points']]] * n)
        tempDF["Data"] = row['data']
        tempDF["index"] = list(range(1, int(row['k']) + 1))
        frames.append(tempDF)
    # DataFrame.append was removed in pandas 2.0; concatenate once at the end
    FinalDF = pd.concat(frames, ignore_index=True)

I have tried a for loop, but for 950 thousand rows it takes so long that the approach is just not feasible. Please help me find a pandas approach or, failing that, some other method to do this.

*I had to post screenshots because I was unable to post the DataFrame as a table. Sorry, I'm new to Stack Overflow.

mozway
Tushar

1 Answer

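Use DataFrame.explode, which turns each element of a list-like column into its own row and repeats the other columns: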

    df.explode('points')
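
As a rough sketch of how this could look end to end: the toy frame below is made up for illustration, and it assumes the lists live in the `points` column (as the question's text suggests) and that the desired `index` column is the 1-based position of each element within its original row. The real column layout may differ.

```python
import pandas as pd

# Hypothetical toy data; the real frame has ~950k rows and much longer lists.
df = pd.DataFrame({
    "I": [1, 2],
    "x": [0.5, 1.5],
    "y": [2.0, 3.0],
    "points": [[10, 20, 30], [40, 50]],  # assumed: list-valued column
})

# One output row per list element; the scalar columns are repeated.
out = df.explode("points")

# explode keeps the original row label on every generated row, so grouping
# by that label and counting gives each element's position (1-based here).
out["index"] = out.groupby(level=0).cumcount() + 1

# Replace the duplicated row labels with a fresh 0..n-1 index.
out = out.reset_index(drop=True)
print(out)
```

Note that `explode` leaves the exploded column with `object` dtype, so a follow-up such as `out["points"].astype(float)` may be worth adding before feeding the data to a model.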
Piotr Żak