
I have a 4-column dataframe:

  • PERSON_ID [int - key]
  • PERSON_NAME [varchar]
  • PERSON_POINTS [numpy 1d array of int]
  • PERSON_DISTANCES [numpy 1d array of int]

As you can see, the granularity is the person. I want to create a new dataframe, Person_Points, at a finer granularity (one row per person and point), also with 4 columns:

  • PERSON_ID [int - key]
  • PERSON_NAME [varchar]
  • PERSON_POINT [int - key]
  • PERSON_DISTANCE [int]

The new dataframe should have as many rows as there are elements in my 1d numpy arrays.

Example of the PERSON dataframe:

PERSON_ID PERSON_NAME PERSON_POINTS PERSON_DISTANCES
1         A           [1 2 3]       [2 4 6]
2         B           [4 5 6 7]     [2 4 6 8]
3         C           [8]           [6]
4         D           [9 10]        [4 8]

Example of the PERSON_Points dataframe:

PERSON_ID PERSON_NAME PERSON_POINT PERSON_DISTANCE
1         A           1             2
1         A           2             4
1         A           3             6
2         B           4             2
2         B           5             4
2         B           6             6
2         B           7             8
3         C           8             6
4         D           9             4
4         D           10            8

As my Person dataframe is very large, I am not sure of the best way to do this, and I can't find many examples. Can someone help? Thanks in advance.

1 Answer


You can use explode, as Bruno pointed out:

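# New df with one row per (point, distance) pair. Explode both columns in a
# single call (pandas >= 1.3): exploding them one after the other would pair
# every point with every distance of the same person.
df = df.explode(['PERSON_POINTS', 'PERSON_DISTANCES'], ignore_index=True)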

print(df)

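The above is a short example of how to do it. The exploded columns keep their original names, so rename them afterwards if you want PERSON_POINT and PERSON_DISTANCE.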
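Since the question mentions a very large dataframe, here is a possible alternative that builds the long table directly with NumPy instead of explode. It is only a sketch, not something taken from the answer above: it assumes that the two arrays in each row have the same length, and the names person, lengths and person_points are made up for the illustration. Whether it is actually faster on your data is something you would need to measure.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the PERSON table from the question.
person = pd.DataFrame({
    "PERSON_ID": [1, 2, 3, 4],
    "PERSON_NAME": ["A", "B", "C", "D"],
    "PERSON_POINTS": [np.array([1, 2, 3]), np.array([4, 5, 6, 7]),
                      np.array([8]), np.array([9, 10])],
    "PERSON_DISTANCES": [np.array([2, 4, 6]), np.array([2, 4, 6, 8]),
                         np.array([6]), np.array([4, 8])],
})

# Number of elements in each person's array (assumed identical for
# PERSON_POINTS and PERSON_DISTANCES on the same row).
lengths = person["PERSON_POINTS"].map(len).to_numpy()

# Repeat the scalar columns once per element and flatten the array columns.
person_points = pd.DataFrame({
    "PERSON_ID": np.repeat(person["PERSON_ID"].to_numpy(), lengths),
    "PERSON_NAME": np.repeat(person["PERSON_NAME"].to_numpy(), lengths),
    "PERSON_POINT": np.concatenate(person["PERSON_POINTS"].tolist()),
    "PERSON_DISTANCE": np.concatenate(person["PERSON_DISTANCES"].tolist()),
})

print(person_points)
```

Unlike explode, which leaves the exploded columns with object dtype, the flattened columns here come out as integer columns, and the new column names are set in the same step.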

Kartik Shandilya