I'm trying to update the Screening column of the file_change dataframe with the values found in the Screening column of df_duplicated.
import pandas as pd
from pathlib import Path

path_filtered_sample = Path(r"C:\Users\Experiment")
files_for_merging = path_filtered_sample.glob('**/*.csv')  # recursive search for CSV files
# get_duplicated_df is a user-defined function that checks a df (here df_filtered_merged) for discrepancies
df_duplicated = get_duplicated_df(df_filtered_merged)
df_duplicated = df_duplicated[['nid', 'Common Screening Value']].rename(columns={"Common Screening Value": 'Screening'})
df_duplicated.set_index('nid', inplace=True)
for file in files_for_merging:
    file_change = pd.read_csv(file)
    file_change.set_index('nid', inplace=True)
    file_change.update(df_duplicated)  # the ValueError is raised here
    file_change.reset_index(inplace=True)
    file_change.to_csv(file, index=False)
The code works fine until somewhere along the loop, where it fails with ValueError: Shape of passed values is (788, 41), indices imply (787, 41) on the line file_change.update(df_duplicated). This is a little odd because there are only 787 rows in the .csv file, excluding the header. I also checked file_change.shape and it returns (787, 41), so I am not sure why the method decided I passed 788 rows, since the earlier iterations had no issues despite the files having the same format and headers. I checked the nid column of the problematic file for duplicates, since duplicate index values were said to cause this kind of issue, and none were found.
It is a little hard to provide a reproducible example here because I don't know exactly what is causing the issue.
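A minimal sketch of how the duplicate check could be run on both frames before calling update, assuming nid is the index of each. The helper name duplicated_index_labels and the toy data are hypothetical stand-ins, not my real files.

```python
import pandas as pd


def duplicated_index_labels(df: pd.DataFrame) -> list:
    """Return the index labels that occur more than once in df."""
    idx = df.index
    # keep=False flags every occurrence of a repeated label, not just the later ones
    return idx[idx.duplicated(keep=False)].unique().tolist()


# Toy stand-ins for file_change and df_duplicated (hypothetical data).
file_change = pd.DataFrame(
    {"nid": [1, 2, 3], "Screening": ["a", "b", "c"]}
).set_index("nid")
df_duplicated = pd.DataFrame(
    {"nid": [2, 2, 3], "Screening": ["x", "y", "z"]}
).set_index("nid")

dupes_in_file = duplicated_index_labels(file_change)
dupes_in_update = duplicated_index_labels(df_duplicated)
print("duplicates in file_change:", dupes_in_file)
print("duplicates in df_duplicated:", dupes_in_update)
```

If either list is non-empty, that index is not unique, which is one condition worth ruling out on both sides of the update call. An empty result does not by itself explain the ValueError.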