So basically I'm trying to update the `Screening` column of the `file_change` dataframe with the values found in the same-named `Screening` column of `df_duplicated`.

from pathlib import Path
import pandas as pd

path_filtered_sample = Path(r"C:\Users\Experiment")
files_for_merging = path_filtered_sample.glob('**/*.csv')  # recursive search for .csv files

# User-defined function that checks a dataframe (here df_filtered_merged) for discrepancies
df_duplicated = get_duplicated_df(df_filtered_merged)
df_duplicated = df_duplicated[['nid', 'Common Screening Value']].rename(columns={"Common Screening Value": 'Screening'})
df_duplicated.set_index('nid', inplace=True)

for file in files_for_merging:
    file_change = pd.read_csv(file)
    file_change.set_index('nid', inplace=True)
    file_change.update(df_duplicated)
    file_change.reset_index(inplace=True)
    file_change.to_csv(file, index=False)

The code works fine until somewhere along the loop, where it fails with `ValueError: Shape of passed values is (788, 41), indices imply (787, 41)` on the line `file_change.update(df_duplicated)`. That is a little odd, because the .csv file has only 787 rows excluding the header, and `file_change.shape` also returns `(787, 41)`. So I am not sure why the method decided I passed 788 rows, since the earlier iterations had no issues even though those files have the same format and headers. I also checked the `nid` column of the problematic file for duplicates, as duplicates were said to cause this kind of issue, and none were found.

It is a little hard to provide a reproducible example here, as I don't know what exactly is causing the issue.

Pherdindy
  • Does this answer your question? [Pandas concat: ValueError: Shape of passed values is blah, indices imply blah2](https://stackoverflow.com/questions/27719407/pandas-concat-valueerror-shape-of-passed-values-is-blah-indices-imply-blah2) – MisterMiyagi Nov 08 '21 at 18:45
  • Okay, I had checked the `nid` column of the "problematic file", but had not checked `df_duplicated`'s `nid` for duplicates. It turned out to contain 1 duplicate, which caused the issue. Apparently I have to set both dataframes to a multi-index made of `nid` together with another column. – Pherdindy Nov 08 '21 at 19:03
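
For anyone landing here with the same error, a minimal sketch of the check described in the comment above. The two small frames are hypothetical stand-ins for one CSV file and for `df_duplicated`, and dropping the repeated `nid` with `keep="first"` is only an assumption about what is acceptable for the data; the multi-index route mentioned in the comment is the alternative when the repeated rows actually differ.

```python
import pandas as pd

# Hypothetical stand-ins for one CSV file and for df_duplicated.
file_change = pd.DataFrame(
    {"nid": [1, 2, 3], "Screening": ["a", "b", "c"]}
).set_index("nid")
df_duplicated = pd.DataFrame(
    {"nid": [2, 2, 3], "Screening": ["x", "x", "y"]}
).set_index("nid")

# Index.duplicated() flags every repeat of an already-seen label,
# so this lists the nid values that occur more than once.
dup_labels = df_duplicated.index[df_duplicated.index.duplicated()].unique().tolist()
print(dup_labels)

# Assumption: keeping the first row per nid is acceptable for this data.
deduped = df_duplicated[~df_duplicated.index.duplicated(keep="first")]

# With a unique index on both sides, update() aligns row by row on nid.
file_change.update(deduped)
result = file_change.reset_index()
```

Running the same `index.duplicated()` check on both `file_change` and `df_duplicated` before calling `update` shows which side carries the repeated `nid`.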
