I have a pandas DataFrame with three columns, created as follows:
import pandas as pd
data = [[1, "User1", "Hello."], [1, "User1", "How are you?"], [1, "User2", "I'm fine."], [2, "User1", "Nice to meet you."], [2, "User2", "Hello."], [2, "User2", "I'm happy."], [2, "User2", "Goodbye."], [3, "User2", "Hello."]]
df = pd.DataFrame(data, columns=['Conversation', 'User', 'Text'])
   Conversation   User               Text
0             1  User1             Hello.
1             1  User1       How are you?
2             1  User2          I'm fine.
3             2  User1  Nice to meet you.
4             2  User2             Hello.
5             2  User2         I'm happy.
6             2  User2           Goodbye.
7             3  User2             Hello.
I would like to merge the Text of consecutive rows from the same User, but not across Conversation boundaries. If a User has several consecutive rows within a Conversation, these rows should be merged into one row, with their Text joined by a single space. Rows from different Conversations should never be combined, even if the User is the same. For the example above, the result should look as follows:
   Conversation   User                        Text
0             1  User1         Hello. How are you?
2             1  User2                   I'm fine.
3             2  User1           Nice to meet you.
4             2  User2  Hello. I'm happy. Goodbye.
7             3  User2                      Hello.
How can this be achieved efficiently? My actual DataFrame is large.
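For reference, here is a minimal sketch of one vectorized approach that seems to produce the desired result on the example. It marks a row as the start of a new block whenever Conversation or User differs from the previous row, numbers the blocks with a cumulative sum, and then groups by that number. The names new_block, block_id, and merged are only illustrative, and I have not measured how this performs on a large DataFrame.

```python
import pandas as pd

data = [[1, "User1", "Hello."], [1, "User1", "How are you?"], [1, "User2", "I'm fine."],
        [2, "User1", "Nice to meet you."], [2, "User2", "Hello."], [2, "User2", "I'm happy."],
        [2, "User2", "Goodbye."], [3, "User2", "Hello."]]
df = pd.DataFrame(data, columns=['Conversation', 'User', 'Text'])

# True where a row starts a new block: the Conversation or the User
# differs from the previous row (the first row always starts a block,
# because comparing against the shifted-in NaN yields True).
new_block = (
    (df['Conversation'] != df['Conversation'].shift())
    | (df['User'] != df['User'].shift())
)

# Running count of block starts gives every row a block number.
block_id = new_block.cumsum()

# Collapse each block: keep its Conversation and User, join the Text with spaces.
merged = df.groupby(block_id, sort=False).agg(
    {'Conversation': 'first', 'User': 'first', 'Text': ' '.join}
)

# Restore the original index label of the first row of each block.
merged.index = df.index[new_block.to_numpy()]

print(merged)
```

Comparing each row with its predecessor avoids a Python-level loop over rows; only the string join runs once per block.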