I wrote a DataFrame to CSV in PySpark and got the following output files in the directory:

._SUCCESS.crc

.part-00000-6cbfdcfd-afff-4ded-802c-6ccd67f3804a-c000.csv.crc

part-00000-6cbfdcfd-afff-4ded-802c-6ccd67f3804a-c000.csv

How do I keep only the CSV file in the directory and delete the rest of the files, using Python?

linux

5 Answers

import os
directory = "/path/to/directory/with/files"
files_in_directory = os.listdir(directory)
filtered_files = [file for file in files_in_directory if not file.endswith(".csv")]
for file in filtered_files:
    path_to_file = os.path.join(directory, file)
    os.remove(path_to_file)

First, list all files in the directory. Then keep only the names that do not end with .csv. Finally, remove each of the files left in that list.
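
The same steps can be wrapped in a small function. This is only a sketch: the name keep_only_csv is made up for illustration, and the os.path.isfile check is an added assumption so that subdirectories are skipped, since os.remove raises an error on a directory.

```python
import os

def keep_only_csv(directory):
    """Delete every regular file in `directory` whose name does not end with .csv.

    Hypothetical helper, not part of the original answer.
    Returns the names of the files that were removed.
    """
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories: os.remove() raises an error on them.
        if os.path.isfile(path) and not name.endswith(".csv"):
            os.remove(path)
            removed.append(name)
    return removed
```

Note that a name like .part-00000-...-c000.csv.crc ends with .crc, not .csv, so it is removed along with ._SUCCESS.crc.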

Blomex

Try iterating over the files in the directory, and then os.remove only those files that do not end with .csv.

import os
dir_path = "path/to/the/directory/containing/files"
dir_list = os.listdir(dir_path)
for item in dir_list:
    if not item.endswith(".csv"):
        os.remove(os.path.join(dir_path, item))
theoctober19th

You can also have fun with a list comprehension for this:

import os

dir_path = 'output/'

[os.remove(os.path.join(dir_path, item)) for item in os.listdir(dir_path) if not item.endswith('.csv')]
Synthase

I would recommend using pathlib (Python >= 3.4) and the built-in set type to subtract the CSV file names from the set of all files. I would argue this is easy to read, fast to process, and a good Pythonic solution.

>>> from pathlib import Path
>>> p = Path('/path/to/directory/with/files')
>>> # Get all file names
>>> # https://stackoverflow.com/a/65025567/4865723
>>> set_all_files = set(filter(Path.is_file, p.glob('**/*')))
>>> # Get all csv filenames (BUT ONLY with lower case suffix!)
>>> set_csv_files = set(filter(Path.is_file, p.glob('**/*.csv')))
>>> # Create a file list without csv files
>>> set_files_to_delete = set_all_files - set_csv_files
>>> # Iterate over that set and delete each file
>>> for file_name in set_files_to_delete:
...     file_name.unlink()
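
As the comment above notes, the '**/*.csv' pattern may miss files with an upper-case suffix. A possible variation, sketched here under that assumption, compares Path.suffix in lower case in a single pass instead of subtracting two sets. The function name delete_non_csv is hypothetical.

```python
from pathlib import Path

def delete_non_csv(root):
    """Recursively delete regular files under `root` whose suffix is not .csv.

    Hypothetical helper: the suffix comparison ignores case,
    so data.CSV is kept as well as data.csv.
    Returns the names of the deleted files.
    """
    deleted = []
    # Build the full list first so no file is deleted while the tree is being scanned.
    for path in list(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() != ".csv":
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Like the set-based version, this descends into subdirectories, so it should be pointed only at a directory whose whole tree may be cleaned.
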
buhtz

import os

# Walk 'Test' and all of its subdirectories
for root, dirs, files in os.walk('Test', topdown=True):
    for name in files:
        fp = os.path.join(root, name)
        # Keep CSV files, delete everything else
        if not name.endswith(".csv"):
            os.remove(fp)

What is the advantage of os.walk? It also visits every subdirectory of the given directory.
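
Because os.walk descends into every subdirectory, it can be worth listing what would be deleted before deleting anything. A minimal dry-run sketch, with the hypothetical name find_non_csv, only collects the paths and removes nothing:

```python
import os

def find_non_csv(top):
    """Return the paths of all non-.csv files under `top`, including subdirectories.

    Hypothetical dry-run helper: it only collects paths and deletes nothing.
    """
    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(".csv"):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Once the returned list looks right, each path in it can be passed to os.remove.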

Faraaz Kurawle