merge dataframe with the same columns name

Question

Hi i have a dataframe that looks like that :

Unnamed: 0	X1	Unnamed: 1	X2	Unnamed: 1	X3	Unnamed: 2	X4
1970-01-31	5.0	1970-01-31	1.0	1970-01-31	1.0	1980-01-30	1.0
1970-02-26	6.0	1970-02-26	3.0	1970-02-26	3.0	1980-02-26	3.0

I have many columns (631) that looks like that.

I would like to have :

date	X1	X2	X3	X4
1970-01-31	5.0	1.0	1.0	na
1970-02-26	6.0	3.0	3.0	na
1980-01-30	na	na	na	1.0
1980-02-26	na	na	na	3.0

I tried :

res_df = pd.concat(
    df2[[date, X]].rename(columns={date: "date"}) for date, X in zip(df2.columns[::2], 
    df2.columns[1::2])
    ).pivot_table(index="date")

It works for small data but do not work for mine. Maybe because I have the same columns name 'Unnamed: 1' in my df. I have a message error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

where is the data sourced from? you have duplicate column names there. It might be possible to get a cleaner solution, if we can control/influence how the input dataframe is created — sammywemmy, Aug 17 '22 at 12:12
It is from an excel file. I read multiple worksheets with instruction : > df = pd.concat(pd.read_excel(workbook_url, sheet_name=None, skiprows=[0]), axis=1).droplevel(0, axis=1) — Jacques Tebeka, Aug 17 '22 at 16:30

jezrael · Answer 1 · 2022-08-18T06:14:16.817

Crete index by date varible and use axis=1 in concat:

res_df = (pd.concat((df2[[date, X]].set_index(date) 
                     for date, X in zip(df2.columns[::2],  df2.columns[1::2])), axis=1)
            .rename_axis('date')
            .reset_index())
print (res_df)
         date   X1   X2   X3   X4
0  1970-01-31  5.0  1.0  1.0  NaN
1  1970-02-26  6.0  3.0  3.0  NaN
2  1980-01-30  NaN  NaN  NaN  1.0
3  1980-02-26  NaN  NaN  NaN  3.0

EDIT: Error seems like duplicated columns names in your DataFrame, possible solution is deduplicated before apply solution above:

df = pd.DataFrame(columns=['a','a','b'], index=[0])

#you can test if duplicated columns names
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')

#https://stackoverflow.com/a/43792894/2901002
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')

Still an error which is : ValueError: Buffer has wrong number of dimensions (expected 1, got 2) — Jacques Tebeka, Aug 17 '22 at 11:27
It still works when I do not have the same columns name but not where I have the same one ! — Jacques Tebeka, Aug 17 '22 at 11:43

merge dataframe with the same columns name

1 Answers1