0

I'm parsing some data from a CSV into a proper SQL database.

Let's say I have

PersonId  PersonName   TripId

21          Joe        634
34          Hannah     853
34          Hannah     952
54          Ronnie     153

As a dataframe in pandas. But I'm turning this data into a Person and Trip table in SQL. So, I'm making the relation that a Person (primary key Name and Id) is able to travel 0 or more times. So, I need to drop based on PersonId and PersonName, and keep the TripId as it is going to be my external key. Once It's done, my result would be:

PersonId  PersonName   TripId

21          Joe        634
34          Hannah     853, 952
54          Ronnie     153

So, Hannah would not be repeated. How do I do this?

  • 1
    `df.groupby(['PersonId', 'PersonName'], as_index=False)['TripId'].agg(','.join)` should do the trick. If TripId is a column of ints, then cast it first: `.agg(lambda x: ','.join(x.astype(str)))` – cs95 Apr 26 '23 at 20:06
  • Hey @cs95, sorry to bother ya, but that answer did not work out in my code. Like, I checked Pandas documentation and It makes total sense (thanks for pointing It out to me) but once coding It, nothing happened. The dataset on my original question is an oversimplification, my real pandas dataframe has other columns too. Tried doing the same thing you showed, but using ".first()" after the groupby command, to keep the first occurrence of the other columns data, but nothing happened. You have any idea why? Since your comment sounds like exactly what I asked for, maybe you would know I guess. – viceconsul_tecktips Apr 27 '23 at 13:04
  • can you explain why it did not work? what was the error, etc. Does the real data have nans? – cs95 Apr 27 '23 at 13:28

0 Answers0