df
is a minimal code example dataframe which has partially redundant rows.
Start End MeterS MeterE Value Type Amount Unit
0 2021-06-15T09:59:50 2021-06-15T13:01:04 2188950 2196310 24 type_A 1.00 pc
1 2021-06-15T09:59:50 2021-06-15T13:01:04 2188950 2196310 24 type_B 3.02 h
2 2021-06-15T09:59:50 2021-06-15T13:01:04 2188950 2196310 24 type_C 7.36 k
3 2021-06-14T09:57:35 2021-06-14T14:47:35 2181600 2188950 87 type_A 1.00 pc
4 2021-06-14T09:57:35 2021-06-14T14:47:35 2181600 2188950 87 type_B 4.83 h
5 2021-06-14T09:57:35 2021-06-14T14:47:35 2181600 2188950 87 type_C 7.35 k
I want to create a new dataframe based on df
, where:
- Datetime is unique
- columns
Type
,Amount
andUnit
will be removed, respectively changed to:Amount_Type_A
,Amount_Type_B
andAmount_Type_C
with the associated data ( example:Amount_Type_B
should have3.02
in row 0, and4.83
in row 1 )
Code df
:
import pandas as pd
columns = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Type', 'Amount', 'Unit']
data = {
'Start': ['2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-15T09:59:50', '2021-06-14T09:57:35', '2021-06-14T09:57:35', '2021-06-14T09:57:35'],
'End': ['2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-15T13:01:04', '2021-06-14T14:47:35', '2021-06-14T14:47:35', '2021-06-14T14:47:35'],
'MeterS': [2188950, 2188950, 2188950, 2181600, 2181600, 2181600],
'MeterE': [2196310, 2196310, 2196310, 2188950, 2188950, 2188950],
'Value': [24, 24, 24, 87, 87, 87],
'Type': ['type_A', 'type_B', 'type_C', 'type_A', 'type_B', 'type_C'],
'Amount': [1, 3.02, 7.36, 1, 4.83, 7.35],
'Unit': ['pc', 'h', 'k', 'pc', 'h', 'k']
}
df = pd.DataFrame(data, columns=columns)
I tried this:
columns2 = ['Start', 'End', 'MeterS', 'MeterE', 'Value', 'Amount_Type_A', 'Amount_Type_B', 'Amount_Type_C']
data2 = {
'Start': [df.Start.drop_duplicates(keep = 'first')],
'End': [df.End.drop_duplicates(keep = 'first')],
'MeterS': [df.MeterS.drop_duplicates(keep = 'first')],
'MeterE': [df.MeterS.drop_duplicates(keep = 'first')],
'Value': [df.Value.drop_duplicates(keep = 'first')],
'Amount_Type_A': df.loc[df.Type == 'type_A'].Amount,
'Amount_Type_B': df.loc[df.Type == 'type_B'].Amount,
'Amount_Type_C': df.loc[df.Type == 'type_C'].Amount
}
df2 = pd.DataFrame(data2, columns=columns2)
But i get this error, and I don't understand why:
Traceback (most recent call last):
File "/Users/stan/Documents/Python/Testing/abstraction_df_from_new_df_grouping.py", line 39, in <module>
df2 = pd.DataFrame(data2, columns=columns2)
File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 468, in __init__
mgr = init_dict(data, index, columns, dtype=dtype)
File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
index = extract_index(arrays[~missing])
File "/Users/stan/opt/anaconda3/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 411, in extract_index
raise ValueError(msg)
ValueError: array length 1 does not match index length 6
How can I fix this?