0

I have the following dataframe:

import datetime
import random

import pandas as pd

# One timestamp string per day, repeated once per city observed on that day
dates = [str(datetime.datetime(2020, 1, 1, 0, 0, 0, 0) + datetime.timedelta(days=i)) for i in range(3)]
repetitions = [3, 6, 4]
dates = [i for i, j in zip(dates, repetitions) for k in range(j)]
# The first `repetitions[i]` cities are observed on day i
cities_ = ['Paris', 'Tokyo', 'Sydney', 'New-York', 'Rio', 'Berlin']
cities = [cities_[0: repetitions[i]] for i in range(len(repetitions))]
cities = [i for j in cities for i in j]
# Random measurements; humidity is clipped to the range [0, 1]
temperatures = [round(random.normalvariate(20, 5), 1) for _ in range(len(cities))]
humidities = [round(random.normalvariate(0.5, 0.4), 1) for _ in range(len(cities))]
humidities = [min(i, 1) for i in humidities]
humidities = [max(i, 0) for i in humidities]
df = pd.DataFrame(data=list(zip(dates, cities, temperatures, humidities)), columns=['date', 'city', 'temperature', 'humidity'])

I need to remove the extra index levels (the MultiIndex column header and the axis names) after applying the pivot function. The code below

values = ['temperature', 'humidity']
df_ = df.pivot(index='date', columns='city', values=values)
Col = list(set(df['city'].values))
for value in values:
  df_.rename(columns={i: value + '_' + i for i in Col}, inplace=True)

outputs:

                            temperature                                                           ...          humidity                                                     
 city                temperature_Berlin temperature_New-York temperature_Paris   temperature_Rio  ... temperature_Paris temperature_Rio temperature_Sydney temperature_Tokyo
 date                                                                                             ...                                                                       
 2020-01-01 00:00:00               NaN                NaN               21.2              NaN  ...               0.3             NaN                1.0               1.0
 2020-01-02 00:00:00               18.4               14.2              19.3            28.7  ...              0.6            0.6                0.1               0.2
 2020-01-03 00:00:00               NaN                31.6              25.9             NaN  ...               0.8             NaN                0.1               0.0

and I need the following result:

                      temperature_Paris  humidity_Paris  temperature_Tokyo  humidity_Tokyo  temperature_Sydney  ...  humidity_New-York  temperature_Rio  humidity_Rio  temperature_Berlin  humidity_Berlin
2020-01-01 00:00:00               21.2             0.3               17.5             1.0                26.3  ...                NaN              NaN           NaN                 NaN              NaN
2020-01-02 00:00:00               19.3             0.6               15.1             0.2                22.8  ...                0.1             28.7           0.6                18.4              0.4
2020-01-03 00:00:00               25.9             0.8               27.5             0.0                29.7  ...                0.6              NaN           NaN                 NaN              NaN

The solutions offered for similar-looking questions, which essentially come down to:

df_ = df_.reset_index().rename_axis([None, None], axis=1)

do not work here.

SeaBean
Henry
  • Does this answer your question? [Pandas - How to flatten a hierarchical index in columns](https://stackoverflow.com/questions/14507794/pandas-how-to-flatten-a-hierarchical-index-in-columns), [Pandas: combining header rows of a multiIndex DataFrame](https://stackoverflow.com/q/47637153/15497888) – Henry Ecker Jul 11 '21 at 22:36
  • This is the most applicable duplicate [flattern pandas dataframe column levels](https://stackoverflow.com/q/49655394/15497888) – Henry Ecker Jul 11 '21 at 22:37
  • For the `rename_axis` portion it would be `df_ = df_.rename_axis(index=None)` after you've collapsed the Multi-Index columns. Or `df_ = df_.rename_axis(index=None, columns=[None, None])` beforehand. Or super generically to remove all axis names `df_ = df_.rename_axis(index=[None] * df_.index.nlevels, columns=[None] * df_.columns.nlevels)` – Henry Ecker Jul 11 '21 at 22:38
  • The question is a duplicate of the questions mentioned above. However, the accepted answer provides new insights. – Henry Jul 12 '21 at 08:44
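To make the `rename_axis` comment above concrete, here is a minimal sketch of the "collapse first, then drop the index name" order. The small frame is a made-up stand-in for the question's random data, so the values are illustrative only.

```python
import pandas as pd

# Made-up stand-in for the question's data (illustrative values only).
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
    'city': ['Paris', 'Tokyo', 'Paris', 'Tokyo'],
    'temperature': [21.2, 17.5, 19.3, 15.1],
    'humidity': [0.3, 1.0, 0.6, 0.2],
})

df_ = df.pivot(index='date', columns='city', values=['temperature', 'humidity'])

# Collapse the two column levels into single strings such as 'temperature_Paris'.
df_.columns = ['_'.join(col) for col in df_.columns]

# Drop the index name ('date') so that no extra header row is printed.
df_ = df_.rename_axis(index=None)

print(df_)
```

After these two steps the frame has a single header row and an unnamed index, which is the shape of the result asked for in the question.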

2 Answers

2

Replace:

Col = list(set(df['city'].values))
for value in values:
  df_.rename(columns={i: value + '_' + i for i in Col}, inplace=True)

With:

df_.columns = ['_'.join(i) for i in df_.columns]

Outputs:

                     temperature_Berlin  temperature_New-York  ...  humidity_Sydney  humidity_Tokyo
date
2020-01-01 00:00:00                 NaN                   NaN  ...              0.3             0.6
2020-01-02 00:00:00                23.3                  26.3  ...              0.8             0.0
2020-01-03 00:00:00                 NaN                  14.6  ...              0.2             0.6

Edit:

A possibly more elegant alternative, as suggested by @Henry Ecker in the comments:

df_.columns = df_.columns.map('_'.join)
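As a rough illustration of why the original loop mislabels the humidity columns and why joining the tuples avoids that, here is a small sketch. The frame is a made-up stand-in for the question's data, and the names `broken`, `flat_listcomp` and `flat_map` are only used for this example. It relies on the fact that `rename` with a dict, when no `level` is given, is applied to the labels of every column level.

```python
import pandas as pd

# Made-up stand-in for the question's data (illustrative values only).
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02'],
    'city': ['Paris', 'Tokyo', 'Paris', 'Tokyo'],
    'temperature': [21.2, 17.5, 19.3, 15.1],
    'humidity': [0.3, 1.0, 0.6, 0.2],
})
values = ['temperature', 'humidity']
df_ = df.pivot(index='date', columns='city', values=values)

# Before flattening, every column label is a tuple: (value name, city).
print(list(df_.columns))

# First pass of the original loop: 'Paris' becomes 'temperature_Paris' under
# BOTH top-level groups, so the second pass has no plain city names left to rename.
broken = df_.rename(columns={c: 'temperature_' + c for c in ['Paris', 'Tokyo']})
print(list(broken.columns))

# Joining each tuple keeps the value name and the city together.
flat_listcomp = ['_'.join(col) for col in df_.columns]
flat_map = df_.columns.map('_'.join)
print(list(flat_map))
```

Both flattening variants should give the same labels; `map` simply returns an `Index` instead of a list.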
dm2
1

You can use Index.map() with an f-string, as follows:

df_.columns = df_.columns.map(lambda x: f'{x[0]}_{x[1]}')

This way, you are free to arrange the parts taken from the MultiIndex in any order. For example, to put the city name before the measurement (Berlin_temperature instead of temperature_Berlin), just swap x[0] and x[1] in the f-string above.

Result:

print(df_)

                     temperature_Berlin  temperature_New-York  temperature_Paris  temperature_Rio  temperature_Sydney  temperature_Tokyo  humidity_Berlin  humidity_New-York  humidity_Paris  humidity_Rio  humidity_Sydney  humidity_Tokyo
date                                                                                                                                                                                                                                       
2020-01-01 00:00:00                 NaN                   NaN               22.8              NaN                24.7               28.8              NaN                NaN             1.0           NaN              0.9             0.0
2020-01-02 00:00:00                20.2                  21.5               21.6             21.6                 4.3               21.5              0.5                0.5             1.0           0.4              0.4             0.0
2020-01-03 00:00:00                 NaN                  17.3               24.4              NaN                11.3               22.7              NaN                0.4             0.1           NaN              0.0             0.5
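The reordering idea can be sketched as follows. This also shows one possible way (an assumption on my side, not something the question's answers cover) to put both measurements of a city next to each other, with the cities in their order of first appearance, as in the layout requested in the question. The frame is a made-up stand-in, and `city_order`, `ordered`, `value_first` and `city_first` are names introduced just for this example.

```python
import pandas as pd

# Made-up stand-in for the question's data (illustrative values only).
df = pd.DataFrame({
    'date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02'],
    'city': ['Paris', 'Tokyo', 'Paris', 'Tokyo', 'Berlin'],
    'temperature': [21.2, 17.5, 19.3, 15.1, 18.4],
    'humidity': [0.3, 1.0, 0.6, 0.2, 0.4],
})
values = ['temperature', 'humidity']
df_ = df.pivot(index='date', columns='city', values=values)

# Cities in order of first appearance (pivot sorts them alphabetically).
city_order = df['city'].unique()

# Interleave the measurements per city: (temperature, Paris), (humidity, Paris), ...
ordered = pd.MultiIndex.from_tuples(
    [(value, city) for city in city_order for value in values]
)
df_ = df_.reindex(columns=ordered)

# Two naming schemes built from the same tuples.
value_first = df_.columns.map(lambda x: f'{x[0]}_{x[1]}')  # temperature_Paris
city_first = df_.columns.map(lambda x: f'{x[1]}_{x[0]}')   # Paris_temperature

df_.columns = value_first
df_ = df_.rename_axis(index=None)
print(df_)
```

Reordering before flattening keeps the logic on the tuples; sorting the flattened strings afterwards would group by measurement name rather than by city.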
SeaBean