
I get this error:

    KeyError: 'id_cont'

    During handling of the above exception, another exception occurred:

    <ipython-input-11-4604edb9a0b7> in generateID(self, outputMode, data_df)
         84 
         85         if outputMode.getModeCB() == CONST_MODE_CONT:
    ---> 86             data_df['id_cont'] = data_df.apply(lambda row:row['product_name']+'-'+row['hour_local'],axis=1)
         87             #data_df['id_cont'] = data_df.apply(lambda row:row['equipement']+'-'+row['product_name']+'-'+row['hour_shift'].strftime('%Y-%m-%d %H:%M:%S'),axis=1)
         88         else:

    /dataiku/dss_data/code-envs/python/Python3_6/lib/python3.6/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
       2936         else:
       2937             # set column
    -> 2938             self._set_item(key, value)
       2939 
       2940     def _setitem_slice(self, key, value):


    ValueError: Wrong number of items passed 149, placement implies 1

Adding this line raises the error. I think it is a data type problem:

data_df['id_cont'] = data_df.apply(lambda row:row['product_name']+'-'+row['hour_shift'].strftime('%Y-%m-%d %H:%M:%S'),axis=1)

hour_shift is a datetime column; product_name and equipement are object columns.

Data_ing
  • Please [edit] your question and add the [**full text** of any errors or tracebacks](https://meta.stackoverflow.com/q/359146). – MattDMo Dec 23 '21 at 13:40
  • I can't post the entire log; here is an excerpt. I have no idea how to solve my problem. If you have an idea, I'm interested. Thank you. – Data_ing Dec 23 '21 at 13:48
  • Unfortunately, this offers very little context. Please read these threads to learn how to create a minimal, reproducible example: https://stackoverflow.com/help/minimal-reproducible-example https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – navneethc Dec 23 '21 at 13:53

1 Answer


I think you're getting this error because data_df is an empty DataFrame. No rows satisfy the condition data_df['hour_local'].isin(target_hours), so every hour_shift value is NaT and all rows are dropped at data_df = data_df.dropna(subset=['hour_shift']). On an empty DataFrame, apply(..., axis=1) can return an empty DataFrame instead of a Series, and that cannot be assigned to the single column id_cont. You can test this by comparing sample data whose hour_local values satisfy the condition with sample data whose values do not.
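To see the suspected cause in isolation, here is a minimal sketch. The name empty_df is hypothetical and only stands in for data_df after dropna has removed every row. As far as I know, when the lambda fails on the dummy row pandas uses to probe an empty frame, apply falls back to returning a copy of the frame, but this behavior may vary between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for data_df after dropna() removed every row.
empty_df = pd.DataFrame({
    "product_name": pd.Series([], dtype=object),
    "hour_shift": pd.Series([], dtype="datetime64[ns]"),
})

# Same shape of lambda as in the question, applied row by row.
result = empty_df.apply(
    lambda row: row["product_name"] + "-" + row["hour_shift"].strftime("%Y-%m-%d %H:%M:%S"),
    axis=1,
)

# With no rows to process, the result keeps both columns instead of
# collapsing to a single Series, so it cannot fill one column.
print(type(result).__name__, result.shape)
```

If result is a two-column DataFrame here, that matches the "Wrong number of items passed ..., placement implies 1" message: the number reported is the column count of the empty frame.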

Data that satisfies the condition:

from datetime import datetime, timedelta

import pandas as pd

data_df = pd.DataFrame({'local_time': [datetime.strptime("08:30:00",'%H:%M:%S'), datetime.strptime("08:24:00",'%H:%M:%S')], 'product_name': ['A', 'B']})
delta = timedelta(minutes=5)

# Start time
start_time = datetime.strptime("08:20:00",'%H:%M:%S')
cur_time = start_time
target_hours = []
while cur_time.date() <= start_time.date():
    target_hours.append(cur_time.time())
    cur_time += delta

data_df['hour_local'] = pd.to_datetime(data_df["local_time"].astype(str)).dt.time
data_df = data_df.drop(columns=['hour_shift'], errors='ignore')
data_df.loc[data_df['hour_local'].isin(target_hours),'hour_shift'] = data_df['local_time']
data_df = data_df.sort_values(by=['local_time'])
data_df['hour_shift'] = data_df['hour_shift'].ffill()
data_df = data_df.dropna(subset=['hour_shift'])
# This prints a DataFrame with one row
print(data_df)
data_df['id_cont'] = data_df.apply(lambda row:row['product_name']+'- '+row['hour_shift'].strftime('%Y-%m-%d %H:%M:%S'),axis=1) 
print(data_df)

Data that does not satisfy the condition:

from datetime import datetime, timedelta

import pandas as pd

# NOTE: no rows satisfy the condition below
data_df = pd.DataFrame({'local_time': [datetime.strptime("08:31:00",'%H:%M:%S'), datetime.strptime("08:24:00",'%H:%M:%S')], 'product_name': ['A', 'B']})
delta = timedelta(minutes=5)

# Start time
start_time = datetime.strptime("08:20:00",'%H:%M:%S')
cur_time = start_time
target_hours = []
while cur_time.date() <= start_time.date():
    target_hours.append(cur_time.time())
    cur_time += delta

data_df['hour_local'] = pd.to_datetime(data_df["local_time"].astype(str)).dt.time
data_df = data_df.drop(columns=['hour_shift'], errors='ignore')
data_df.loc[data_df['hour_local'].isin(target_hours),'hour_shift'] = data_df['local_time']
data_df = data_df.sort_values(by=['local_time'])
data_df['hour_shift'] = data_df['hour_shift'].ffill()
data_df = data_df.dropna(subset=['hour_shift'])
# This prints an empty DataFrame; the apply line after it then fails
print(data_df)
data_df['id_cont'] = data_df.apply(lambda row:row['product_name']+'- '+row['hour_shift'].strftime('%Y-%m-%d %H:%M:%S'),axis=1) 

One way to avoid this error is to add a check so that the apply line runs only if the DataFrame is not empty:

if len(data_df):
    data_df['id_cont'] = data_df.apply(lambda row:row['product_name']+'- '+row['hour_shift'].strftime('%Y-%m-%d %H:%M:%S'),axis=1) 
    print(data_df)
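
Another option, as a sketch rather than a drop-in fix, is to build the column with vectorized string operations instead of apply, which should also work when the frame has no rows. The helper add_id_cont and the sample frames below are hypothetical. The sketch assumes hour_shift can be converted with pd.to_datetime and uses the '-' separator from the question.

```python
import pandas as pd


def add_id_cont(df):
    """Return a copy of df with an id_cont column built without apply()."""
    out = df.copy()
    # Make sure the column is datetime-like before using the .dt accessor.
    shift = pd.to_datetime(out["hour_shift"])
    out["id_cont"] = (
        out["product_name"].astype(str)
        + "-"
        + shift.dt.strftime("%Y-%m-%d %H:%M:%S")
    )
    return out


# Hypothetical sample data: one populated frame and one empty slice of it.
full_df = pd.DataFrame({
    "product_name": ["A", "B"],
    "hour_shift": pd.to_datetime(["2021-12-23 08:30:00", "2021-12-23 08:30:00"]),
})
empty_df = full_df.iloc[0:0]

full_result = add_id_cont(full_df)
empty_result = add_id_cont(empty_df)

print(full_result)
print(len(empty_result))
```

Because the concatenation works on whole columns, an empty input simply produces an empty id_cont column, so no emptiness check is needed.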
tax evader