0

so I have a csv file as below but with 1300 columns

id Date A A B B B B C C C
A 2009 Y Y
B 2009 Y Y Y Y
C 2011 Y Y Y Y
D 2013 Y Y Y
E 2015 Y Y Y
F 2017 Y Y Y

I want to combine the column values of all columns that have the same column name to look like below:

id Date A B C
A 2009 Y Y
B 2009 Y,Y Y,Y,Y
C 2011 Y,Y Y,Y
D 2013 Y Y,Y
E 2015 Y Y,Y
F 2017 Y Y,Y

I've been searching on how to combine columns with same column names in excel, r and python, but most of the samples either involve summing up the numerical values instead of concatenating strings or have duplicate rows instead of columns.

I've tried the following code I saw on Stackoverflow

pd.concat(x for _, x in df.groupby(df.columns.duplicated(), axis=1))

But for some reason, I get this output:

id Date A A.1 B B.1 B.2 B.3 C C.1 C.2
A 2009 Y Y
B 2009 Y Y Y Y
C 2011 Y Y Y Y
D 2013 Y Y Y
E 2015 Y Y Y
F 2017 Y Y Y

I feel like this should be easy, but I can't seem to find a solution around this.

cms72
  • 177
  • 10
  • 1
    Can you please provide the input data in a form that's easier to create a df from? Setting up the df takes longer than to answer the question. – timgeb Mar 24 '21 at 11:13
  • 2
    Also, does [this](https://stackoverflow.com/questions/24390645/python-pandas-merge-samed-name-columns-in-a-dataframe) answer your question? – timgeb Mar 24 '21 at 11:15
  • @timgeb - not really - because I just realised that pandas changes the column name (hence the A.1, A.2, B.1, B.2) to make it unique - which I then have to revert the column names back to old names (that's 1300 of them). Is there a more efficient way to do this? – cms72 Mar 24 '21 at 13:17
  • Thanks @Pygirl for making them into tables. I didn't know how to do that. – cms72 Mar 24 '21 at 13:17
  • 1
    Use `df.columns = df.columns.str.split('.').str[0]` first for remove values after `.` – jezrael Mar 24 '21 at 13:19
  • 1
    @jezrael that worked. I think I was going about it the long way. Thank you both! – cms72 Mar 24 '21 at 13:22
  • Why is this tagged with R and Excel if it's only about Python? – camille Mar 24 '21 at 21:02
  • Hi @camille, because I was also trying to filter the data using R dplyr and excel consolidate. Nothing was working at the time - so I just tagged all 3. – cms72 Mar 25 '21 at 02:34

0 Answers0