I have a CSV file in which several rows share the same id. I found a nice approach for merging such rows with a pandas DataFrame, and took the code from this post [1].
Sample CSV file:
id messages
0 11 I am not driving home
1 11 Please pick me up
2 11 I don't have money
3 103 The car already park
4 103 No need for ticket
5 104 I will buy a car
6 104 I will buy a car
The desired output is:
id messages
011 I am not driving home Please pick me up I don't have money
103 The car already park No need for ticket
104 I will buy a car
Now the code that I have so far is:
aggregation_functions = {'messages': 'sum'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)
Now what I am getting with this code is:
id messages
011 I am not driving homePlease pick me upI don't have money
103 The car already parkNo need for ticket
104 I will buy a car
I just want a space between the joined messages (e.g. "homePlease" should become "home Please") and no redundancy, such as "I will buy a car" appearing twice.
I already checked post [2], but I couldn't find my answer there.
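To make the goal concrete, here is a minimal sketch of the behaviour I am after. It assumes the columns are named 'id' and 'messages' as in the sample, and join_unique is just a hypothetical helper name, not something from the linked post. It swaps 'sum' for a function that drops repeated messages and joins the rest with a space:

```python
import pandas as pd

# Sample data mirroring the table above (assumed column names: 'id', 'messages').
df = pd.DataFrame({
    "id": [11, 11, 11, 103, 103, 104, 104],
    "messages": [
        "I am not driving home",
        "Please pick me up",
        "I don't have money",
        "The car already park",
        "No need for ticket",
        "I will buy a car",
        "I will buy a car",
    ],
})

def join_unique(messages):
    # Hypothetical helper: dict.fromkeys drops repeated messages while
    # keeping first-seen order, then the rest are joined with a space.
    return " ".join(dict.fromkeys(messages))

# as_index=False keeps 'id' as a regular column in the result.
df_new = df.groupby("id", as_index=False).agg({"messages": join_unique})
print(df_new)
```

Note that this only removes messages that are exactly identical within one id; near-duplicates would still be kept.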
Also, do I need to call .reindex(columns=df.columns) after aggregate(aggregation_functions), like this:
df_new = df.groupby(df['id']).aggregate(aggregation_functions).reindex(columns=df.columns)
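For context on the reindex question, here is a small sketch of what I think happens, on made-up data with the same assumed column names. After the groupby, 'id' becomes the index rather than a column, so reindexing on the original columns alone seems to produce an empty 'id' column unless the index is reset first:

```python
import pandas as pd

# Tiny made-up frame with the same assumed column names.
df = pd.DataFrame({
    "id": [11, 11, 103],
    "messages": ["a", "b", "c"],
})

grouped = df.groupby("id").agg({"messages": " ".join})

# 'id' is now the index, so it is not among the columns.
print(list(grouped.columns))

# Reindexing directly creates a new 'id' column that is filled with NaN.
direct = grouped.reindex(columns=df.columns)
print(direct)

# reset_index() moves 'id' back into a regular column first; reindex then
# only enforces the original column order.
restored = grouped.reset_index().reindex(columns=df.columns)
print(restored)
```

So reindex by itself does not look necessary for the aggregation; it only matters if I want the original column layout back.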