0

df is a minimal code example dataframe which has partially redundant rows.

                 Start                  End   MeterS   MeterE  Value    Type  Amount Unit
0  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_A    1.00   pc
1  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_B    3.02    h
2  2021-06-15T09:59:50  2021-06-15T13:01:04  2188950  2196310     24  type_C    7.36    k
3  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_A    1.00   pc
4  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_B    4.83    h
5  2021-06-14T09:57:35  2021-06-14T14:47:35  2181600  2188950     87  type_C    7.35    k

I want to create a new dataframe based on df, where:

  1. Datetime is unique
  2. columns Type, Amount and Unit will be removed, respectively changed to: Amount_Type_A, Amount_Type_B and Amount_Type_C with the associated data ( example: Amount_Type_B should have 3.02 in row 0, and 4.83 in row 1 )

Code df:

import pandas as pd

columns = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Type', 'Amount', 'Unit']
data = {
            'Start': ['2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-14T09:57:35', '2021-06-14T09:57:35', '2021-06-14T09:57:35'],
            'End': ['2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-14T14:47:35', '2021-06-14T14:47:35', '2021-06-14T14:47:35'],
            'MeterS': [2188950, 2188950, 2188950, 2181600, 2181600, 2181600],
            'MeterE': [2196310, 2196310, 2196310, 2188950, 2188950, 2188950],
            'Value': [24, 24, 24, 87, 87, 87],
            'Type': ['type_A', 'type_B', 'type_C', 'type_A', 'type_B', 'type_C'],
            'Amount': [1, 3.02, 7.36, 1, 4.83, 7.35],
            'Unit': ['pc', 'h', 'k', 'pc', 'h', 'k']
        }

df = pd.DataFrame(data, columns=columns)

I tried this:

columns2 = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Amount_Type_A', 'Amount_Type_B', 'Amount_Type_C']
data2 = {
            'Start': [df.Start.drop_duplicates(keep = 'first')],
            'End': [df.End.drop_duplicates(keep = 'first')],
            'MeterS': [df.MeterS.drop_duplicates(keep = 'first')],
            'MeterE': [df.MeterS.drop_duplicates(keep = 'first')],
            'Value': [df.Value.drop_duplicates(keep = 'first')],
            'Amount_Type_A': df.loc[df.Type == 'type_A'].Amount,
            'Amount_Type_B': df.loc[df.Type == 'type_B'].Amount,
            'Amount_Type_C': df.loc[df.Type == 'type_C'].Amount


        }


df2 = pd.DataFrame(data2, columns=columns2)

But i get this error, and I don't understand why:

Traceback (most recent call last):
  File "/Users/stan/Documents/Python/Testing/abstraction_df_from_new_df_grouping.py", line 39, in <module>
    df2 = pd.DataFrame(data2, columns=columns2)
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 468, in __init__
    mgr = init_dict(data, index, columns, dtype=dtype)
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
    index = extract_index(arrays[~missing])
  File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 411, in extract_index
    raise ValueError(msg)
ValueError: array length 1 does not match index length 6

How can I fix this?

stanvooz
  • 522
  • 3
  • 19
  • this look like a `pivot` operation, please refer to this [post](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – Ben.T Nov 10 '21 at 16:04
  • 1
    Use `df.Start.drop_duplicates(keep = 'first').to_list()` instead of `[df.Start.drop_duplicates(keep = 'first')]` etc. and `df.loc[df.Type == 'type_C'].Amount.to_list()` instead of `df.loc[df.Type == 'type_C'].Amount` etc.? – Timus Nov 10 '21 at 16:40
  • 1
    perfect! It works, thank you! Do you want post this as an answer? – stanvooz Nov 10 '21 at 16:43
  • Thanks, all good, happy to help. – Timus Nov 10 '21 at 18:11

0 Answers0