
I have a pandas DataFrame in which I would like to generate incrementing strings based on each row's count, in Python.

Data

id  type    date       count
aa  hi      q1 2023    2
aa  hey     q2 2023    3
bb  hi      q1 2023    2
            
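To reproduce the example, the sample data can be built roughly like this. This is a sketch that assumes `date` is stored as a plain string such as "q1 2023" and `count` as an integer; the question does not state the actual dtypes.

```python
import pandas as pd

# Sample input from the question (assumed dtypes: strings for id/type/date, int for count)
df = pd.DataFrame({
    'id': ['aa', 'aa', 'bb'],
    'type': ['hi', 'hey', 'hi'],
    'date': ['q1 2023', 'q2 2023', 'q1 2023'],
    'count': [2, 3, 2],
})
print(df)
```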

Desired

id  count   type    date
aa  hi01    hi      q1 2023
aa  hi02    hi      q1 2023
aa  hey01   hey     q2 2023
aa  hey02   hey     q2 2023
aa  hey03   hey     q2 2023
bb  hi01    hi      q1 2023
bb  hi02    hi      q1 2023

Doing

I believe I have to perform a 'melt' to expand the dataset:

(df.melt(id_vars=['id', 'type', 'date'], value_name='count')  # reshape data
   .sort_values(by=['date', 'variable']))
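A quick check suggests why `melt` on its own does not expand the data: it turns value columns into (variable, value) rows, producing one output row per input row per value column. With `count` as the only value column, the row count stays at 3. This sketch uses the default `value_name` instead of `'count'`, since recent pandas versions may reject a `value_name` that matches an existing column.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['aa', 'aa', 'bb'],
    'type': ['hi', 'hey', 'hi'],
    'date': ['q1 2023', 'q2 2023', 'q1 2023'],
    'count': [2, 3, 2],
})

# melt reshapes wide -> long: 'count' becomes variable='count', value=<number>.
# It does not repeat rows according to the number stored in 'count'.
melted = df.melt(id_vars=['id', 'type', 'date'])
print(melted)
print(len(df), len(melted))
```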

A Stack Overflow user suggested the following, which works for the incrementing part:

df = df.assign(count=lambda d: d['type'] + d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2))
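Applied directly to the original 3-row frame, the suggested expression labels only the rows that already exist. Each (id, date, type) group contains a single row, so `cumcount()` is 0 everywhere and every label ends in `01`. This minimal sketch illustrates why the rows have to be multiplied by `count` before numbering.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['aa', 'aa', 'bb'],
    'type': ['hi', 'hey', 'hi'],
    'date': ['q1 2023', 'q2 2023', 'q1 2023'],
    'count': [2, 3, 2],
})

# cumcount() numbers rows within each group from 0; add(1) makes it 1-based,
# zfill(2) pads to two digits, and the type string is prefixed.
out = df.assign(
    count=lambda d: d['type']
    + d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
)
print(out['count'].tolist())  # ['hi01', 'hey01', 'hi01']
```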

I am researching how to combine these two steps; any suggestions are appreciated.

mozway
Lynn
  • We seem to be doing almost the same thing again. [Expand counted row value into separate rows, adding distinct ID in python](https://stackoverflow.com/q/67529618/15497888) – Henry Ecker Nov 20 '21 at 03:49
  • Scale up the DataFrame: `df = df.reindex(index=df.index.repeat(df['count'])).reset_index(drop=True)`. Then the answer from the previous question works: `df['count'] = df['type']+df.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)` – Henry Ecker Nov 20 '21 at 03:55
  • @HenryEcker I went ahead and did this: `df['count'] = df['type']+df.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)`, but it's only giving a count of 1. I think this question is a bit different. – Lynn Nov 20 '21 at 04:01
  • Again, did you scale up the DataFrame first? That's the core new component and the primary duplicate link. `df = df.reindex(index=df.index.repeat(df['count'])).reset_index(drop=True)` This happens _before_ enumerating groups. – Henry Ecker Nov 20 '21 at 04:02
  • OK, thank you, and sorry about that @HenryEcker; it works. – Lynn Nov 20 '21 at 04:04
  • No need to be sorry. Glad you got it worked out. The new issue is that your initial DataFrame is too small, so you just need to make it the correct size. Then the counting by groups will work as expected. – Henry Ecker Nov 20 '21 at 04:05
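Putting Henry Ecker's two comment suggestions together gives roughly the following end-to-end sketch: first repeat each row `count` times, then number the copies within each (id, date, type) group. The final column reordering to `id, count, type, date` is an assumption based on the "Desired" layout; each `hey` row keeps its original date from the input.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['aa', 'aa', 'bb'],
    'type': ['hi', 'hey', 'hi'],
    'date': ['q1 2023', 'q2 2023', 'q1 2023'],
    'count': [2, 3, 2],
})

# Step 1: repeat each index label 'count' times, then renumber the index 0..n-1
out = df.reindex(index=df.index.repeat(df['count'])).reset_index(drop=True)

# Step 2: 1-based position within each (id, date, type) group, zero-padded
# to two digits and prefixed with the type (overwrites the numeric 'count')
out['count'] = (
    out['type']
    + out.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
)

# Reorder columns to match the desired output layout
out = out[['id', 'count', 'type', 'date']]
print(out)
```

For a DataFrame with a unique index, `df.loc[df.index.repeat(df['count'])]` is an equivalent way to do the repetition in step 1. Note that `zfill(2)` only pads to two digits, so a group with 100 or more copies would produce three-digit suffixes.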
