
I need to create a new column that merges all the month columns and displays their values as an array of key-value pairs.

I was already able to merge the columns:

df["Consumo"] = df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]].values.tolist()

I'm just not sure how I'm supposed to map over the values first in order to arrange the key-value array.

Sure, I could iterate over each column and then over each row, but it would take me some work to collect the values and push them into an array of arrays... I thought about something like that: I could keep this huge array of arrays and then push it into df["Consumo"]. Would that work?

Input Data:

Cód. Perfil Sigla Nome Empresarial ... 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00 ...
3    RGE SUL RGE SUL DISTRIBUIDORA DE ENERGIA S.A. ... 230357.542859 284340.749249 337758.293447 ...

Expected Output:

Cód. Perfil Sigla Nome Empresarial Patamar Consumo
3    RGE SUL RGE SUL DISTRIBUIDORA DE ENERGIA S.A.LEVE  [2020-10-01 00:00:00: 375637.681828, 2020-10-01 00:00:00: 350523.989792, 2020-10-01 00:00:00: 314549.742607, ...

3 Answers


You can melt, groupby, and apply a dict mapping.

Using a simplified example:

import pandas as pd
df = pd.DataFrame({
    'Perfil': ['SUL', 'ABC'], 'Sigla': ['RGE', 'XYZ'], 'Name': ['SUL', 'ABC'],
    '2020-10-01 00:00:00': [230357.542859, 372356.997901],
    '2020-11-01 00:00:00': [284340.749249, 455408.113981],
    '2020-12-01 00:00:00': [337758.293447, 698728.068559],
})
Perfil Sigla Name 2020-10-01 00:00:00 2020-11-01 00:00:00 2020-12-01 00:00:00
0 SUL RGE SUL 230357.542859 284340.749249 337758.293447
1 ABC XYZ ABC 372356.997901 455408.113981 698728.068559

First melt by the non-month columns:

non_months = df.columns[0:3] # change as needed for the real data
df = df.melt(id_vars=non_months)
Perfil Sigla Name variable value
0 SUL RGE SUL 2020-10-01 00:00:00 230357.542859
1 ABC XYZ ABC 2020-10-01 00:00:00 372356.997901
2 SUL RGE SUL 2020-11-01 00:00:00 284340.749249
3 ABC XYZ ABC 2020-11-01 00:00:00 455408.113981
4 SUL RGE SUL 2020-12-01 00:00:00 337758.293447
5 ABC XYZ ABC 2020-12-01 00:00:00 698728.068559

Then groupby the non-month columns and apply a dict mapping:

df = (df.groupby(list(non_months))
        .apply(lambda x: dict(zip(x.variable, x.value)))
        .to_frame(name='Consumo'))
Perfil Sigla Name Consumo
ABC XYZ ABC {'2020-10-01 00:00:00': 372356.997901, '2020-1...
SUL RGE SUL {'2020-10-01 00:00:00': 230357.542859, '2020-1...

Note: For slightly better performance, you can replace the dict-zip lambda with Series.to_dict:

lambda x: pd.Series(x.value.values, index=x.variable).to_dict()
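For reference, here is the whole pipeline in one piece with that replacement, run on the same simplified example frame as above:

```python
import pandas as pd

df = pd.DataFrame({
    'Perfil': ['SUL', 'ABC'], 'Sigla': ['RGE', 'XYZ'], 'Name': ['SUL', 'ABC'],
    '2020-10-01 00:00:00': [230357.542859, 372356.997901],
    '2020-11-01 00:00:00': [284340.749249, 455408.113981],
    '2020-12-01 00:00:00': [337758.293447, 698728.068559],
})

non_months = df.columns[0:3]  # adjust for the real data

# melt the month columns into long form, then build one dict per group
out = (df.melt(id_vars=list(non_months))
         .groupby(list(non_months))
         .apply(lambda x: pd.Series(x.value.values, index=x.variable).to_dict())
         .to_frame(name='Consumo'))
```

The result is indexed by the non-month columns, with one `{timestamp: value}` dict per row in Consumo.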
tdy

Try:

df["Consumo"] = df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]].to_dict("records")
df["Consumo"] = df["Consumo"].map(str).str[1:-1]

to_dict is the key here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html (you just need to choose the proper orientation of the output dict).

The second line just drops the surrounding dict braces.
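To illustrate what the two steps do on a minimal, hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# orient="records" yields one dict per row, keyed by column name
records = df.to_dict("records")
# [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]

# stringify each dict and strip the outer braces
as_text = pd.Series(records).map(str).str[1:-1]
# "'a': 1, 'b': 3" and "'a': 2, 'b': 4"
```

Note the second step turns the column into plain strings; keep the dicts themselves if you still need to index into them.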

Grzegorz Skibinski

Maybe there is a more elegant way of doing it, but well, this one worked for me:

series = []  # collects one list of {column: value} dicts per row
for col in df.iloc[:,[4,5,6,7,8,9,10,11,12,13,14,15]]:
    i = df.columns.get_loc(col)
    for j, row_value in df[col].items():  # iteritems() is deprecated
        if i == 4:
            series.append([{col: row_value}])
        else:
            series[j].append({col: row_value})

df["Consumo"] = series