
I'm trying to use modin with Ray, but I can't move a file after reading it. At the line `shutil.move(f"./IMPORT/"+file,f"./IMPORTED/"+file)` the file is still open. Is there some way to close it and move it to another folder?

Here is the entire code:

    import os
    from pathlib import Path
    import shutil
    import ray
    import ray.util
    ray.init()
    import modin.pandas as pd
    
    current_directory = os.getcwd()
    import_folder_path = os.path.join(current_directory, 'IMPORT')
    folder_path: Path = Path(import_folder_path)
    file_list = []
    
    file_list = list(
        filter(lambda x: x if x.endswith('.xlsx') else None,
        os.listdir(folder_path))
    )
    df2 = []
    if len(file_list):
        excl_list=[]
        excl_merged = pd.DataFrame()
        imported_file_path = os.path.join(current_directory, 'IMPORTED\\')
        for file in file_list:
            file_path = os.path.join(folder_path,file)
            df=pd.read_excel(file_path)
            df = df[df['Delivery Status'] != 'Delivered']
            df2 = df.append(df)
            shutil.move(f"./IMPORT/"+file,f"./IMPORTED/"+file)
    
        output_file_path = os.path.join(folder_path,'output.xlsx')
        df2.to_excel(output_file_path, index=False)
    else:
        print("No excel file found")

Thank you for your help

Tim Roberts
  • Side note, you have `df2 = df.append(df)` where you meant `df2 = df2.append(df)`. Please show us the traceback you get. – Tim Roberts Feb 16 '23 at 07:35
  • What do you mean by "file is still open"? – Itération 122442 Feb 16 '23 at 07:36
  • Hi @TimRoberts, here is the traceback: 'An exception occurred: PermissionError [WinError 32] The process cannot access the file because it is being used by another process: './IMPORT/abc (1).xlsx' During handling of the above exception, another exception occurred: File "C:\Users\angelo\main.py", line 28, in shutil.move(f"./IMPORT/"+file,f"./IMPORTED/"+file) – Angelo Malfitano Feb 16 '23 at 09:35
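Following up on the side note about `df2 = df.append(df)`: besides the `df`/`df2` mix-up, `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch of the usual alternative, assuming plain pandas and the `Delivery Status` column from the question, collects the filtered frames in a list and calls `pd.concat` once. The helper name `undelivered_rows` is hypothetical.

```python
import pandas as pd


def undelivered_rows(frames):
    """Filter each frame and concatenate the filtered pieces once at the end."""
    # Keep the filtered pieces in a plain list instead of calling
    # DataFrame.append inside the loop (append no longer exists in pandas 2.x).
    parts = [df[df["Delivery Status"] != "Delivered"] for df in frames]
    if not parts:
        # Nothing to combine: return an empty frame rather than failing in concat.
        return pd.DataFrame()
    # ignore_index renumbers the rows 0..n-1 across all input frames.
    return pd.concat(parts, ignore_index=True)
```

Concatenating once also avoids copying the growing result on every iteration.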

1 Answer


There is a mention of this problem in https://github.com/pandas-dev/pandas/issues/29803. The suggested workaround is to manage the file handle lifetime yourself:

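    ...
    for file in file_list:
        file_path = os.path.join(folder_path, file)
        with open(file_path, "rb") as xlfile:
            df = pd.read_excel(xlfile)
        # leaving the with block has closed xlfile, so the move can follow
        shutil.move(f"./IMPORT/"+file, f"./IMPORTED/"+file)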

Pandas can read from a file handle, and this way the `with` statement ensures the handle is closed as soon as the block ends.
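A variation on this workaround, sketched with only the standard library: read the whole file into memory inside the `with` block, so the handle is closed before any parsing starts, then give the parser an `io.BytesIO` copy and move the original. `read_then_move` and its `parse` parameter are hypothetical names. With plain pandas, `parse` could be `pandas.read_excel`, which accepts file-like objects; whether modin on Ray can serialize a `BytesIO` argument is untested here and it may hit the same serialization problem.

```python
import io
import shutil
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def read_then_move(src: Path, dest_dir: Path, parse: Callable[[io.BytesIO], T]) -> T:
    """Load src fully into memory, close it, parse the copy, then move src."""
    # The with block closes the OS file handle as soon as the bytes are read.
    with open(src, "rb") as fh:
        data = fh.read()
    # The parser only sees an in-memory buffer, so it cannot keep src open.
    result = parse(io.BytesIO(data))
    dest_dir.mkdir(parents=True, exist_ok=True)
    # No handle opened by this function is still alive, so the move can proceed
    # (another program, e.g. Excel, could still be holding the file).
    shutil.move(str(src), str(dest_dir / src.name))
    return result
```

For the question's layout, a call might look like `df = read_then_move(Path("IMPORT") / file, Path("IMPORTED"), pandas.read_excel)`. The trade-off is that each workbook is held in memory twice for a moment (raw bytes plus parsed frame), which is usually fine for spreadsheet-sized files.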

Tim Roberts
  • Hi Tim, I changed the code as you suggested, but now I get a new error: 'An exception occurred: TypeError Could not serialize the argument <_io.BufferedReader name='C:\\IMPORT\\abc (1).xlsx'> for a task or actor modin.core.execution.ray.common.engine_wrapper._deploy_ray_func. Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information. The above exception was the direct cause of the following exception: File "C:\main.py", line 36, in df=pd.read_excel(xlfile) – Angelo Malfitano Feb 18 '23 at 06:16
  • Does it work if you import `pandas` instead of `modin.pandas`? – Tim Roberts Feb 18 '23 at 06:44
  • Hi Tim, yes, it works without any issue. Using modin, the file is still open. – Angelo Malfitano Feb 20 '23 at 10:46
  • Then that's a bug in `modin`. Perhaps you should file a bug report with that project. – Tim Roberts Feb 21 '23 at 00:35
  • Hi Tim, yes, I forwarded the issue to the developers. I'll get back to you soon; thank you for now. – Angelo Malfitano Feb 21 '23 at 10:35
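Since the comments confirm that plain `pandas` works where `modin.pandas` fails (Ray has to pickle every task argument, and an open `_io.BufferedReader` cannot be pickled, which matches the TypeError above), one stopgap while the modin bug is open is to do the file I/O with plain pandas. This is a sketch under that assumption, not the asker's final code: `process_folder`, `suffix` and `reader` are hypothetical names, and writing the output is left to the caller.

```python
import shutil
from pathlib import Path

import pandas  # plain pandas for file I/O, which the thread reports works


def process_folder(import_dir, imported_dir, suffix=".xlsx", reader=pandas.read_excel):
    """Read each matching file, move it once closed, and return the combined frame."""
    import_dir, imported_dir = Path(import_dir), Path(imported_dir)
    imported_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for path in sorted(import_dir.iterdir()):
        if path.suffix.lower() != suffix:
            continue  # skip files that are not workbooks of the requested type
        # Pass an explicit handle, as in the answer; leaving the block closes it.
        with open(path, "rb") as fh:
            df = reader(fh)
        parts.append(df[df["Delivery Status"] != "Delivered"])
        # The handle is closed by now, so moving the file should not hit WinError 32.
        shutil.move(str(path), str(imported_dir / path.name))
    if not parts:
        return None  # caller can print "No excel file found"
    return pandas.concat(parts, ignore_index=True)
```

For the question's layout this would be `result = process_folder("IMPORT", "IMPORTED")`, followed by `result.to_excel("output.xlsx", index=False)` when `result` is not `None`. Writing `output.xlsx` outside `IMPORT` keeps a later run from picking it up as input. If later steps need modin, converting the combined frame with `modin.pandas.DataFrame(result)` should be possible, but check that against your modin version.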