I'm running a routine that walks a directory and all of its sub-directories, performs some tasks, then writes the results to a .csv using pandas. I also need to capture each sub-directory's name so it can be written to the .csv too.

For a single sub-directory, I can do this with:

import os

path = r'/users/directory/sub-directory'
# dataframe already exists; the scalar name is broadcast to every row
dataframe['sub-directory'] = os.path.basename(path)
print(dataframe)

A B C sub-directory
1 2 3 Folder-1
4 5 6 Folder-1
7 8 9 Folder-1

Here the sub-directory name is easily assigned with os.path.basename(path). However, I want to run through the whole directory. That works using glob, but I lose the sub-directory names when outputting to a .csv:

path = r'/users/directory/*/' #Using Glob
dataframe['sub-directory'] = os.path.basename(path)
print (dataframe)

#Actual Output
A B C sub-directory
1 2 3 NaN
4 5 6 NaN
7 8 9 NaN
1 2 3 NaN
4 5 6 NaN
7 8 9 NaN

#Desired Output
A B C sub-directory
1 2 3 Folder-1
4 5 6 Folder-1
7 8 9 Folder-1
1 2 3 Folder-2
4 5 6 Folder-3
7 8 9 Folder-4

I've seen the answer to "Getting a list of all subdirectories in the current directory", but I'm not sure how to integrate it into my routine.

red_sach

1 Answer

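Try this:

import glob
import os

path = glob.glob(r'/users/directory/*')
# One name per entry matched by glob; the assignment only works if the
# dataframe has exactly that many rows
dataframe['sub-directory'] = [os.path.basename(i) for i in path]
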
Suhas Mucherla
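The NaN column in the question and the character-by-character output mentioned in the comments both come from passing the wrong thing to os.path.basename: a pattern ending in a separator has an empty basename, and iterating over a string yields single characters. The following is a minimal sketch of one way to list the sub-directory names with the standard library only. The helper name subdirectory_names and the demo folder names are made up for illustration; they do not come from the thread.

```python
import glob
import os
import tempfile


def subdirectory_names(parent):
    """Return the sorted names of the immediate sub-directories of `parent`."""
    # A pattern that ends in a separator makes glob match directories only.
    # glob.escape guards against special characters in the parent path.
    pattern = os.path.join(glob.escape(parent), "*", "")
    # Matches come back with a trailing separator, and basename() of such a
    # path is '' (the same happens with the raw pattern '/users/directory/*/').
    # normpath strips the separator so basename() returns the folder name.
    return sorted(os.path.basename(os.path.normpath(p)) for p in glob.glob(pattern))


# Small demonstration in a throwaway directory (hypothetical folder names).
with tempfile.TemporaryDirectory() as parent:
    for name in ("Folder-1", "Folder-2"):
        os.mkdir(os.path.join(parent, name))
    # A plain file at the top level is not matched by the pattern.
    open(os.path.join(parent, "notes.txt"), "w").close()
    print(subdirectory_names(parent))  # prints ['Folder-1', 'Folder-2']
```

Note that `*` does not match names starting with a dot, so hidden folders are skipped by this sketch.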
  • Nope it throws "ValueError: Length of values does not match length of index" – red_sach Jan 06 '21 at 09:16
  • @SachinReddy Make sure your initialization of the dataframe is correct – Suhas Mucherla Jan 06 '21 at 09:21
  • Not sure this is the right approach. print([os.path.basename(i) for i in path]) gives: [ 'u','s','e','r','s','','d','i'... etc]. It doesn't get the individual sub-directory/folder names – red_sach Jan 06 '21 at 09:35
  • @SachinReddy I've edited my answer, check once – Suhas Mucherla Jan 06 '21 at 09:38
  • I'm still confused with the folder structure, please clarify – Suhas Mucherla Jan 06 '21 at 09:39
  • The structure is Directory-1 > Sub-1 > data files '1.dat', '2.dat', '3.dat'; Directory-1 > Sub-2 > '1.dat', '2.dat', '3.dat'; Directory-1 > Sub-3 > '1.dat', '2.dat', '3.dat'; and so on. What I need are the names Sub-1, Sub-2, Sub-3, so I can output them to the .csv. glob.glob doesn't work either: TypeError: expected str, bytes or os.PathLike object, not list. – red_sach Jan 06 '21 at 09:50
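For the folder layout described in the comments above, the "Length of values does not match length of index" error can be avoided by tagging each file's rows with a single folder name (a scalar is broadcast to every row) before combining the frames. This is only a sketch under stated assumptions: the thread never shows how the .dat files are loaded, so reading them with pd.read_csv is a guess, and the function name collect_with_folder_names is hypothetical.

```python
import glob
import os

import pandas as pd


def collect_with_folder_names(parent, pattern="*.dat", **read_kwargs):
    """Read every matching file one level below `parent` and tag its rows
    with the name of the sub-directory it came from."""
    frames = []
    # The trailing separator in the pattern restricts matches to directories.
    for folder in sorted(glob.glob(os.path.join(glob.escape(parent), "*", ""))):
        name = os.path.basename(os.path.normpath(folder))
        for data_file in sorted(glob.glob(os.path.join(glob.escape(folder), pattern))):
            # Assumption: the files are delimited text that read_csv can parse;
            # extra keyword arguments (e.g. sep) are passed straight through.
            frame = pd.read_csv(data_file, **read_kwargs)
            # A scalar is broadcast to every row, so the lengths always match.
            frame["sub-directory"] = name
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

A call such as `collect_with_folder_names(r'/users/directory').to_csv('output.csv', index=False)` would then write the combined table, where `output.csv` is a placeholder file name.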