I have a dataframe as below:
+--------------------+--------------------+
|                 _id|           statement|
+--------------------+--------------------+
|                   1|            ssssssss|
|                   2|            ssssssss|
|                   3|            aaaaaaaa|
|                   4|            aaaaaaaa|
+--------------------+--------------------+
After using df.dropDuplicates(['statement']), I got this:
+--------------------+--------------------+
|                 _id|           statement|
+--------------------+--------------------+
|                   1|            ssssssss|
|                   3|            aaaaaaaa|
+--------------------+--------------------+
But what I actually want is to keep all the _id values for each distinct statement, as below:
+--------------------+--------------------+
|                 _id|           statement|
+--------------------+--------------------+
|                1, 2|            ssssssss|
|                3, 4|            aaaaaaaa|
+--------------------+--------------------+
How can I do this?