
I need to create a new column that merges all the month columns and displays their values as an array of key-value pairs.

I was already able to merge the columns:

df["Consumo"] = df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]].values.tolist()

I'm just not sure how I'm supposed to map over the values first in order to arrange the key-value array.

Sure, I could iterate over each column and then over each row, but it would take me some work to collect the values and push them into an array of arrays... I thought about something like that: I could keep this huge array of arrays and then push it into df["Consumo"]. Would that work?

Input Data:

Cód. Perfil Sigla Nome Empresarial ... 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 ...
3    RGE SUL RGE SUL DISTRIBUIDORA DE ENERGIA S.A. ... 230357.542859 284340.749249 337758.293447 ...

Expected Output:

Cód. Perfil Sigla Nome Empresarial Patamar Consumo
3    RGE SUL RGE SUL DISTRIBUIDORA DE ENERGIA S.A.LEVE  [2020-10-01 00:00:00: 375637.681828, 2020-10-01 00:00:00: 350523.989792, 2020-10-01 00:00:00: 314549.742607, ...

3 Answers


You can melt, groupby, and apply a dict mapping.

Using a simplified example:

import pandas as pd
df = pd.DataFrame({
    'Perfil': ['SUL', 'ABC'], 'Sigla': ['RGE', 'XYZ'], 'Name': ['SUL', 'ABC'],
    '2020-10-01 00:00:00': [230357.542859, 372356.997901],
    '2020-11-01 00:00:00': [284340.749249, 455408.113981],
    '2020-12-01 00:00:00': [337758.293447, 698728.068559],
})
Perfil Sigla Name 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00
0 SUL RGE SUL 230357.542859 284340.749249 337758.293447
1 ABC XYZ ABC 372356.997901 455408.113981 698728.068559

First melt by the non-month columns:

non_months = df.columns[0:3] # change as needed for the real data
df = df.melt(id_vars=non_months)
Perfil Sigla Name variable value
0 SUL RGE SUL 2020-10-01 00:00:00 230357.542859
1 ABC XYZ ABC 2020-10-01 00:00:00 372356.997901
2 SUL RGE SUL 2020-11-01 00:00:00 284340.749249
3 ABC XYZ ABC 2020-11-01 00:00:00 455408.113981
4 SUL RGE SUL 2020-12-01 00:00:00 337758.293447
5 ABC XYZ ABC 2020-12-01 00:00:00 698728.068559

Then groupby the non-month columns and apply a dict mapping:

df = (df.groupby(list(non_months))
        .apply(lambda x: dict(zip(x.variable, x.value)))
        .to_frame(name='Consumo'))
Perfil Sigla Name Consumo
ABC XYZ ABC {'2020-10-01 00:00:00': 372356.997901, '2020-1...
SUL RGE SUL {'2020-10-01 00:00:00': 230357.542859, '2020-1...

Note: For slightly better performance, you can replace the dict-zip lambda with Series.to_dict:

lambda x: pd.Series(x.value.values, index=x.variable).to_dict()
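For reference, here is the whole pipeline in one piece with that replacement, run on the same simplified example frame as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Perfil': ['SUL', 'ABC'], 'Sigla': ['RGE', 'XYZ'], 'Name': ['SUL', 'ABC'],
    '2020-10-01 00:00:00': [230357.542859, 372356.997901],
    '2020-11-01 00:00:00': [284340.749249, 455408.113981],
    '2020-12-01 00:00:00': [337758.293447, 698728.068559],
})

non_months = df.columns[0:3]  # adjust for the real data

# melt the month columns into long form, then build one dict per group
out = (df.melt(id_vars=list(non_months))
         .groupby(list(non_months))
         .apply(lambda x: pd.Series(x.value.values, index=x.variable).to_dict())
         .to_frame(name='Consumo'))
```

The result is indexed by the non-month columns, with one `{timestamp: value}` dict per row in Consumo.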
tdy

Try:

df["Consumo"] = df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]].to_dict("records")
df["Consumo"] = df["Consumo"].map(str).str[1:-1]

to_dict is the key here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html (you just need to choose the proper orientation of the output dict).

The second line just drops the surrounding dict braces.
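To illustrate what the two steps do on a minimal, hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# orient="records" yields one dict per row, keyed by column name
records = df.to_dict("records")
# [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]

# stringify each dict and strip the outer braces
as_text = pd.Series(records).map(str).str[1:-1]
# "'a': 1, 'b': 3" and "'a': 2, 'b': 4"
```

Note the second step turns the column into plain strings; keep the dicts themselves if you still need to index into them.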

Grzegorz Skibinski

Maybe there is a more elegant way of doing it, but well, this one worked for me:

series = []  # collects one list of {column: value} dicts per row
for col in df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]]:
    i = df.columns.get_loc(col)
    for j, row_value in df[col].items():  # iteritems() is deprecated
        if i == 4:
            series.append([{col: row_value}])
        else:
            series[j].append({col: row_value})

df["Consumo"] = series