root
--- audi    (dirs)
        --- 11 01012020.csv  (files)
        --- 01 102020.csv
--- bmw
        --- 66  10052020.csv
        --- 43  11112020.csv
--- mercedes
        --- 34  21062020.csv
        --- 23  30112020.csv

Above is the structure of my root directory, its subdirectories and their files. I'm trying to get three things: car, file_id and date.

  1. car is the name of the directory.
  2. file_id is the first split (i.e. everything before the first whitespace in the filename).
  3. date is everything after the first whitespace in the filename.

This is my code:

import os
import re

for root, dirs, files in os.walk(path_root):
    for file in files:
        if file.endswith('.csv'):
            file_id = file.split()[0]
            date = re.search(r' (\d+)\.', file).group(1)
            # car = ?  I don't know how to get this one.

As you can see, I managed to get file_id and date. Now I want to get the car name (the name of the directory) for every file. What is the easiest way to achieve that?

TangerCity

1 Answer


The root variable yielded by os.walk contains the path of the directory currently being visited; the last component of that path is the car name.

import os
import re

for root, dirs, files in os.walk(path_root):
    for file in files:
        if file.endswith('.csv'):
            file_id = file.split()[0]
            date = re.search(r' (\d+)\.', file).group(1)
            # root is the directory being walked; its last component is the car
            car = os.path.basename(root)

If your file names are representative, you don't really need a regex to pull out the date; something like file.split()[1].split('.')[0] should be enough.
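Putting the pieces together without a regex might look like the sketch below. The function name collect_car_files and the choice to return a list of tuples are illustrative assumptions, not part of the original code; it also assumes every CSV name has the form "<file_id> <date>.csv" and silently skips names that do not.

```python
import os


def collect_car_files(path_root):
    """Return (car, file_id, date) tuples for every .csv file under path_root.

    Hypothetical helper: assumes file names look like '<file_id> <date>.csv'.
    """
    records = []
    for root, dirs, files in os.walk(path_root):
        for file in files:
            if not file.endswith('.csv'):
                continue
            # Drop the extension, then split once on the first run of whitespace.
            stem = os.path.splitext(file)[0]
            parts = stem.split(None, 1)
            if len(parts) != 2:
                continue  # name does not match the assumed pattern
            file_id, date = parts
            # The last component of the directory being walked is the car name.
            car = os.path.basename(root)
            records.append((car, file_id, date.strip()))
    return records
```

Note that os.path.basename returns an empty string for a path ending in a separator, so a CSV sitting directly in path_root would get an empty car name if path_root is passed with a trailing slash.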

tripleee