I am looking for a better way to write this code, possibly with something like the context manager (or decorator) syntax that wraps a code block.
Currently, I log the shape of each new data frame or data frame view I create, so that I can catch logic errors that result in missing data. This would be useful wherever I do automated processing on data, to identify where in the script the data disappears, if it does.
import pandas as pd

def process_data(frame):
    # Record the shape of every intermediate frame, keyed by name.
    shape = {}
    shape['original'] = frame.shape
    # Rows whose SHIFT value is longer than two characters.
    errors = frame[frame['SHIFT'].str.len() > 2]
    shape['errors'] = errors.shape
    # Rows whose SHIFT value is at most two characters.
    ok = frame[frame['SHIFT'].str.len() < 3]
    shape['ok'] = ok.shape
    merge_list = [v for v in (errors, ok) if v is not None]
    healed = pd.concat(merge_list)
    shape['healed'] = healed.shape
    # Shapes are (rows, columns) tuples, so one comparison checks both.
    if shape['healed'] != shape['original']:
        raise ValueError(f"Some data loss \n{shape}")
    return healed
I would prefer to write the process with syntax similar to this:
def process_data(frame):
    with shape_info:
        frame = frame
        errors = frame[frame['SHIFT'].str.len() > 2]
        ok = frame[frame['SHIFT'].str.len() < 3]
        merge_list = [v for v in (errors, ok) if v is not None]
        healed = pd.concat(merge_list)
    if shape_info.first() != shape_info.last():
        raise ValueError(f"Some data loss \n{shape_info}")
    return healed
Is a context manager a good way to track this?
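For reference, here is a minimal sketch of how such a helper might look. It is only an illustration under stated assumptions: the class name `ShapeLog` and its `track()` method are hypothetical, and `first()` and `last()` mirror the names used in the desired syntax. A plain `with` block does not see ordinary assignments made inside it, so this sketch assumes each frame is registered explicitly through `track()`, and it moves the first-versus-last comparison into `__exit__`.

```python
import pandas as pd


class ShapeLog:
    """Hypothetical helper that records the shape of each tracked frame."""

    def __init__(self):
        # Regular dicts keep insertion order (Python 3.7+).
        self.shapes = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Compare shapes only if the block finished without an exception.
        if exc_type is None and self.shapes and self.first() != self.last():
            raise ValueError(f"Some data loss \n{self.shapes}")
        return False  # never swallow exceptions raised inside the block

    def track(self, name, frame):
        """Store frame.shape under name and return the frame unchanged."""
        self.shapes[name] = frame.shape
        return frame

    def first(self):
        return next(iter(self.shapes.values()))

    def last(self):
        return list(self.shapes.values())[-1]


def process_data(frame):
    with ShapeLog() as log:
        frame = log.track("original", frame)
        lengths = frame["SHIFT"].str.len()
        errors = log.track("errors", frame[lengths > 2])
        ok = log.track("ok", frame[lengths < 3])
        healed = log.track("healed", pd.concat([errors, ok]))
    return healed
```

With this sketch, a row whose `SHIFT` value is missing falls into neither `errors` nor `ok`, so the `healed` shape differs from the original and the `ValueError` is raised when the `with` block exits. Whether the explicit `track()` calls are an improvement over the dictionary version is a judgment call; the main gain is that the comparison and the error message live in one place.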