I have a data frame of extracted forum posts, and I need to distinguish the initial post in each thread from its replies. Ideally I would add a new column in which the first post for each thread_name gets a 1, and each following post with the same thread_name gets the next number, counting up from one.

My data frame looks something like this:

user_name  thread_name  post_text
user 1     thread 1     ....
user 2     thread 1     ....
user 3     thread 1     ....
user 4     thread 2     ....
user 5     thread 2     ....
user 6     thread 2     ....
user 7     thread 2     ....

Thanks in advance!

Connor95

1 Answer

You can group by thread_name and use cumcount, which numbers the rows within each group starting from 0, so add 1 to start the count at one:

df['post_number'] = df.groupby('thread_name').cumcount() + 1
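
For a quick check, here is a minimal, self-contained sketch of the same approach. The sample frame is made up to mirror the layout in the question, and the is_first_post column is a hypothetical extra, not something the question asked for by name:

```python
import pandas as pd

# Made-up sample data mirroring the layout in the question.
df = pd.DataFrame({
    "user_name": [f"user {i}" for i in range(1, 8)],
    "thread_name": ["thread 1"] * 3 + ["thread 2"] * 4,
    "post_text": ["...."] * 7,
})

# cumcount() numbers the rows of each group from 0, in their current row order,
# so adding 1 gives 1 for the first post of each thread.
df["post_number"] = df.groupby("thread_name").cumcount() + 1

# Hypothetical convenience flag: True for the initial post, False for replies.
df["is_first_post"] = df["post_number"] == 1

print(df["post_number"].tolist())  # [1, 2, 3, 1, 2, 3, 4]
```

Note that the numbering follows the order of the rows as they appear in the frame. If the posts are not already in posting order, you would need to sort them first; the question does not show a timestamp column, so that part is left out here.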
Guru Stron