When using pd.read_csv('myfile.csv', delimiter=';') on a CSV which has duplicated column names, pandas mangles the duplicated columns with .1, .2, .# (# is the number of the duplicated column).
My example csv looks like this:
data1 | data2 | A | B | B | C | C |
---|---|---|---|---|---|---|
abc | NaN | text1 | text2 | text3 | text4 | text5 |
def | 456 | text2 | text4 | text3 | text5 | text1 |
Data1;Data2;A;B;B;C;C
abc;;text1;text2;text3;text4;text5
def;456;text2;text4;text3;text5;text1
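After importing into a DataFrame, the duplicated columns get mangled:
data1 | data2 | A | B | B.1 | C | C.1 |
---|---|---|---|---|---|---|
abc | NaN | text1 | text2 | text3 | text4 | text5 |
def | 456 | text2 | text4 | text3 | text5 | text1 |
This output is expected.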
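To make the behaviour reproducible without a file on disk, here is a minimal sketch that reads the same CSV text from an in-memory buffer and prints the resulting column names. Using io.StringIO and the variable name csv_text are assumptions made for illustration; my real code reads from 'myfile.csv'.

```python
import io

import pandas as pd

# The example CSV from above, held in a string instead of a file.
csv_text = """Data1;Data2;A;B;B;C;C
abc;;text1;text2;text3;text4;text5
def;456;text2;text4;text3;text5;text1
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

# The second occurrence of each duplicated name gets a ".1" suffix.
print(list(df.columns))
# ['Data1', 'Data2', 'A', 'B', 'B.1', 'C', 'C.1']
```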
But I wish to combine these duplicated columns and their rows as comma-separated strings.
So the desired output would look like this (the order of the columns is not important):
data1 | data2 | A | B | C |
---|---|---|---|---|
abc | NaN | text1 | text2,text3 | text4,text5 |
def | 456 | text2 | text4,text3 | text5,text1 |
How can I achieve that with pandas in Python?
I found the following question when searching for the problem:
Concatenate cells into a string with separator pandas python
But I don't know how to apply the answer from that question to only those columns which are mangled.
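One direction that might work is sketched below. It is not a confirmed solution. It assumes that no original column name ends in a dot followed by digits, so that stripping such a suffix recovers the name from the CSV header. The helper join_non_null and the names base_names and combined are hypothetical, introduced only for this example.

```python
import io
import re

import pandas as pd

csv_text = """Data1;Data2;A;B;B;C;C
abc;;text1;text2;text3;text4;text5
def;456;text2;text4;text3;text5;text1
"""

df = pd.read_csv(io.StringIO(csv_text), delimiter=";")

# Assumption: a trailing ".<digits>" was added by pandas, not by the CSV author.
base_names = [re.sub(r"\.\d+$", "", col) for col in df.columns]


def join_non_null(values):
    """Join the non-missing values of one row into a comma-separated string."""
    return ",".join(str(v) for v in values if pd.notna(v))


combined = pd.DataFrame(index=df.index)
# dict.fromkeys keeps the first-seen order of the base names.
for base in dict.fromkeys(base_names):
    cols = [c for c, b in zip(df.columns, base_names) if b == base]
    if len(cols) == 1:
        # Not duplicated: copy the column unchanged.
        combined[base] = df[cols[0]]
    else:
        # Duplicated: merge the mangled columns row by row.
        combined[base] = df[cols].apply(join_non_null, axis=1)

print(combined)
```

Columns that were not duplicated keep their original dtype, and only the mangled groups are turned into strings.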