I have a data frame of extracted forum posts, and I need to distinguish the initial post in each thread from its replies. Ideally, I would add a new column in which the first post for each thread_name gets a 1, and each later post with the same thread_name gets the next number (2, 3, and so on).
My data frame looks something like this:
user_name | thread_name | post_text |
---|---|---|
user 1 | thread 1 | .... |
user 2 | thread 1 | ..... |
user 3 | thread 1 | .... |
user 4 | thread 2 | .... |
user 5 | thread 2 | .... |
user 6 | thread 2 | .... |
user 7 | thread 2 | .... |
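
To make the desired result concrete, here is a minimal sketch, assuming Python with pandas (the question does not depend on a specific language) and assuming the rows are already in posting order within each thread. The column names `post_number` and `is_initial_post` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Sample data mirroring the table above (post_text kept as placeholders).
df = pd.DataFrame({
    "user_name": [f"user {i}" for i in range(1, 8)],
    "thread_name": ["thread 1"] * 3 + ["thread 2"] * 4,
    "post_text": ["...."] * 7,
})

# cumcount() numbers the rows within each group starting at 0, in row order,
# so adding 1 gives the first post of each thread a 1, the next a 2, and so on.
df["post_number"] = df.groupby("thread_name").cumcount() + 1

# Optional flag (hypothetical column): True for the initial post, False for replies.
df["is_initial_post"] = df["post_number"] == 1

print(df[["user_name", "thread_name", "post_number", "is_initial_post"]])
```

For the sample rows, `post_number` comes out as 1, 2, 3 for thread 1 and 1, 2, 3, 4 for thread 2. If the rows are not in posting order, they would need to be sorted by a timestamp or similar column first, which the sample data does not include.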
Thanks in advance!