0

Hi i have a dataframe that looks like that :

Unnamed: 0 X1 Unnamed: 1 X2 Unnamed: 1 X3 Unnamed: 2 X4
1970-01-31 5.0 1970-01-31 1.0 1970-01-31 1.0 1980-01-30 1.0
1970-02-26 6.0 1970-02-26 3.0 1970-02-26 3.0 1980-02-26 3.0

I have many columns (631) that looks like that.

I would like to have :

date X1 X2 X3 X4
1970-01-31 5.0 1.0 1.0 na
1970-02-26 6.0 3.0 3.0 na
1980-01-30 na na na 1.0
1980-02-26 na na na 3.0

I tried :

res_df = pd.concat(
    df2[[date, X]].rename(columns={date: "date"}) for date, X in zip(df2.columns[::2], 
    df2.columns[1::2])
    ).pivot_table(index="date")

It works for small data but do not work for mine. Maybe because I have the same columns name 'Unnamed: 1' in my df. I have a message error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • where is the data sourced from? you have duplicate column names there. It might be possible to get a cleaner solution, if we can control/influence how the input dataframe is created – sammywemmy Aug 17 '22 at 12:12
  • It is from an excel file. I read multiple worksheets with instruction : > df = pd.concat(pd.read_excel(workbook_url, sheet_name=None, skiprows=[0]), axis=1).droplevel(0, axis=1) – Jacques Tebeka Aug 17 '22 at 16:30

1 Answers1

1

Crete index by date varible and use axis=1 in concat:

res_df = (pd.concat((df2[[date, X]].set_index(date) 
                     for date, X in zip(df2.columns[::2],  df2.columns[1::2])), axis=1)
            .rename_axis('date')
            .reset_index())
print (res_df)
         date   X1   X2   X3   X4
0  1970-01-31  5.0  1.0  1.0  NaN
1  1970-02-26  6.0  3.0  3.0  NaN
2  1980-01-30  NaN  NaN  NaN  1.0
3  1980-02-26  NaN  NaN  NaN  3.0

EDIT: Error seems like duplicated columns names in your DataFrame, possible solution is deduplicated before apply solution above:

df = pd.DataFrame(columns=['a','a','b'], index=[0])

#you can test if duplicated columns names
print (df.columns[df.columns.duplicated(keep=False)])
Index(['a', 'a'], dtype='object')

#https://stackoverflow.com/a/43792894/2901002
df.columns = pd.io.parsers.ParserBase({'names':df.columns})._maybe_dedup_names(df.columns)
print (df.columns)
Index(['a', 'a.1', 'b'], dtype='object')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252