How to delete a file in Python after using pandas.read_csv

Question

I have code that reads a CSV file and splits it into parts of 500000 rows each. After splitting, it deletes the original file. The problem is that the file is sometimes so big that I run out of disk space before the split finishes, so I need to delete the file after reading it and before splitting it. Here's the full code:

import pandas as pd
import csv
import os

def csv_splitter(): 
    chunk_size = 500000
    batch_no = 1
    file_count = 0
    file_location = r'C:\Users\Documents'
    valid_files = []
    file_name = 'file_output'
    for file in os.listdir(file_location):
        if file_name in file: 
            if file.partition(file_name)[0] == "":
                valid_files.append(file)
    
    if len(valid_files) == 0:
        print('File not found')
        return()
        
    archive = str(file_location) + '\\' + str(valid_files[0])

    for chunk in pd.read_csv(archive, chunksize = chunk_size, encoding ='latin1', delimiter = '|', dtype = 'str'):
        chunk.to_csv(file_name + '_split_' + str(batch_no) + '.csv',  quoting = csv.QUOTE_ALL, sep = '|', index = False)
        batch_no += 1
    
    if os.path.exists(archive):
        os.remove(archive)
    
    for path in os.listdir(file_location):
        if os.path.isfile(os.path.join(file_location, path)):
            file_count += 1
    print('The file was split in ' + str(file_count) + ' parts')

    return()

I tried assigning the result of read_csv to a variable and deleting the archive right after that, but it raises an error saying the file is in use by another program. The code ended up like this:

test = pd.read_csv(archive, chunksize = chunk_size, encoding ='latin1', delimiter = '|', dtype = 'str')

if os.path.exists(archive):
    os.remove(archive)

for chunk in test:
    chunk.to_csv(archive[0:-4] + '_split_' + str(batch_no) + '.csv',  quoting = csv.QUOTE_ALL, sep = '|', index = False)
    batch_no += 1

Can someone please help me?

Does this answer your question? [How can I delete a file or folder in Python?](https://stackoverflow.com/questions/6996603/how-can-i-delete-a-file-or-folder-in-python) — JRiggles, May 26 '23 at 17:29
"an error saying the file is in use by another program" -> so stop using the file in the other program, first? — Kache, May 26 '23 at 17:55
You might be running out of memory because you didn't set `iterator=True` (if you really want to use chunksize). See https://stackoverflow.com/a/12193309/12033271 — ProblemsLoop, May 26 '23 at 18:05
@ProblemsLoop I'm literally running out of disk space. Sometimes the file is too big and I don't have enough HD space for the original file plus the splits before the original file is deleted – BufsXD, May 26 '23 at 19:22

score 0 · Answer 1 · answered May 26 '23 at 18:03

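NumPy's array_split can split the data into batches for you. It works on data that is already in memory, not on a file name, so the CSV has to be loaded first:

import numpy as np
import pandas as pd

# array_split needs in-memory data, so load the file, then split its row positions
df = pd.read_csv(archive, encoding='latin1', delimiter='|', dtype='str')
splitBatches = [df.iloc[rows] for rows in np.array_split(np.arange(len(df)), numBatches)]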

Jaden Taylor


score 0 · Answer 2 · answered May 26 '23 at 18:06

I stand to be corrected on this one please...

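When you use read_csv with chunksize, pandas keeps the file open until the reader is exhausted or closed, and on Windows os.remove fails while the file is still in use. So it might help to open the file yourself, hand it to pandas, and close it before deleting it. Note that the file must be opened for reading; opening it with "w" would truncate it.

f = open(archive, "r", encoding="latin1")
...
f.close()
os.remove(archive)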
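
For the chunked case, the same idea can be sketched with the reader's own context manager. This assumes pandas 1.2 or later, where the object returned by read_csv with chunksize can be used in a with-statement. The function name `split_then_delete` and the `out_prefix` parameter are made up for illustration. It only addresses the "file in use" error: the original still sits on disk until every split has been written.

```python
import csv
import os

import pandas as pd


def split_then_delete(archive, out_prefix, chunk_size=500_000):
    """Write one CSV per chunk, then delete the source once the reader is closed."""
    batch_no = 0
    # With chunksize set, read_csv returns a lazy reader that holds the file open.
    # Using it as a context manager (pandas >= 1.2) closes the file on exit.
    with pd.read_csv(archive, chunksize=chunk_size, encoding="latin1",
                     delimiter="|", dtype="str") as reader:
        for chunk in reader:
            batch_no += 1
            chunk.to_csv(f"{out_prefix}_split_{batch_no}.csv",
                         quoting=csv.QUOTE_ALL, sep="|", index=False)
    # The handle has been released here, so the delete should no longer be blocked.
    os.remove(archive)
    return batch_no
```

Calling `split_then_delete(archive, 'file_output')` would return the number of parts written.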

All the best...

Nkosikhona Carlos


I don't know exactly how to pass the data to pandas after I use the open function, can you help? – BufsXD May 26 '23 at 19:23
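
In reply to the comment above, here is a rough sketch of handing an open file to pandas. read_csv accepts a file object in place of a path, so the handle can be closed and the original deleted before any split is written. The big assumption is that the whole file fits in memory, which trades disk space for RAM. The names `load_delete_split` and `out_prefix` are hypothetical.

```python
import csv
import os

import pandas as pd


def load_delete_split(archive, out_prefix, chunk_size=500_000):
    """Read the whole file through an explicit handle, delete it, then write the parts."""
    # read_csv accepts an open file object instead of a path.
    with open(archive, "r", encoding="latin1", newline="") as f:
        df = pd.read_csv(f, delimiter="|", dtype="str")
    # The with-block closed the handle, so the original can be removed
    # before the first split file takes up any disk space.
    os.remove(archive)
    parts = 0
    for start in range(0, len(df), chunk_size):
        parts += 1
        df.iloc[start:start + chunk_size].to_csv(
            f"{out_prefix}_split_{parts}.csv",
            quoting=csv.QUOTE_ALL, sep="|", index=False)
    return parts
```

If the process dies between the delete and the last write, the data is gone, so this is probably only reasonable when the source can be regenerated.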
