
I am trying to write a loop in Python that opens a main folder, iterates through its subfolders, reads the JSON files in each subfolder, and writes them out as one CSV per subfolder.

My directory looks like this:

Main folder = "Exports"

Subfolder = "Folder1" , "Folder2" etc..

Files within subfolder = "file1.json" , "file2.json" etc...

Currently I am running the following code within a subfolder (for example "Folder1") to create an output file:

import pandas as pd
import os
path = os.getcwd()
frame = pd.DataFrame()
for filename in os.listdir(os.getcwd()):
    root, ext = os.path.splitext(filename)
    if ext == '.json':
        tmp_frame = pd.read_json(filename)
        frame = frame.append(tmp_frame, ignore_index=True)
        
frame.to_csv(os.path.join(path + ".csv"))

My question is: how do I run this from the main folder so that it opens each subfolder in turn, runs the loop above on it, and writes one CSV file per subfolder?

Thanks

  • Does this answer your question? [Getting a list of all subdirectories in the current directory](https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory) – VirxEC Jul 14 '20 at 16:58

1 Answer

Let's try pathlib and defaultdict from the standard library.

We can build a dictionary with the subfolders as keys and a list of the files in each subfolder as values.

from pathlib import Path
from collections import defaultdict

import pandas as pd  # used by the loop further down

your_path = 'target_directory'

file_dict = defaultdict(list)

for each_file in Path(your_path).rglob('*.csv'): # change this to '*.json'
    file_dict[each_file.parent].append(each_file)


print(file_dict)

The keys of the dictionary are the subfolders and the values are lists of pathlib Path objects. It will vaguely resemble this (I've printed only the names here):

{Notebooks : [test.csv,
             test_file.csv,
             test_file_edited.csv] ,
test_csv : [File20200610.csv,
           File20201012 - Copy.csv,
           File20201012.csv] }
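If you want a names-only view like the one above, a minimal sketch could look like this. The helper `names_only` and the example paths are hypothetical and only illustrate the shape of the dictionary; they are not read from disk.

```python
from collections import defaultdict
from pathlib import Path


def names_only(file_dict):
    """Return {subfolder name: [file names]} for easier reading."""
    return {
        folder.name: [f.name for f in files]
        for folder, files in file_dict.items()
    }


# Hypothetical paths, used only to illustrate the grouping.
example = defaultdict(list)
for p in [
    Path("Exports/Folder1/file1.json"),
    Path("Exports/Folder1/file2.json"),
    Path("Exports/Folder2/file1.json"),
]:
    example[p.parent].append(p)

print(names_only(example))
# {'Folder1': ['file1.json', 'file2.json'], 'Folder2': ['file1.json']}
```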

Then we can loop over the dictionary, combine the files of each subfolder, and write one CSV per subfolder to your target folder.

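target_path = 'output_directory'  # assumed: an existing folder for the CSV files

for each_sub_folder, files in file_dict.items():
    dfs = []
    for each_file in files:
        j = pd.read_json(each_file)  # your read method
        dfs.append(j)  # collect every frame in a list
    # concatenate once per subfolder, after all of its files are read
    df = pd.concat(dfs)
    df.to_csv(Path(target_path).joinpath(each_sub_folder.name + '.csv'), index=False)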
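Putting both steps together, here is a self-contained sketch for the layout in the question ("Exports" containing "Folder1", "Folder2", and so on). The function name `export_subfolders_to_csv` and its arguments are my own, and it assumes each JSON file holds data that `pd.read_json` can turn into a DataFrame, such as a list of records.

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd


def export_subfolders_to_csv(main_folder, target_folder):
    """Write one CSV per subfolder, combining all of its JSON files."""
    main_folder = Path(main_folder)
    target_folder = Path(target_folder)
    target_folder.mkdir(parents=True, exist_ok=True)

    # Group the JSON files by the folder that directly contains them.
    file_dict = defaultdict(list)
    for each_file in sorted(main_folder.rglob("*.json")):
        file_dict[each_file.parent].append(each_file)

    written = []
    for each_sub_folder, files in file_dict.items():
        # Read every file first, then concatenate once per subfolder.
        dfs = [pd.read_json(each_file) for each_file in files]
        df = pd.concat(dfs, ignore_index=True)
        out_path = target_folder / (each_sub_folder.name + ".csv")
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

Calling `export_subfolders_to_csv("Exports", "csv_output")` would then produce files such as `csv_output/Folder1.csv`. Note that JSON files sitting directly in the main folder would be grouped under the main folder's own name.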
Umar.H
  • Hey thanks - I managed to create the dictionary which has the list of Pathlib objects which show up like this: defaultdict(, {WindowsPath('C:/Users/ - My question is for the next part of your code - loop over the dictionary ( I am getting the following error) - ValueError: Invalid file path or buffer object type: – Haris Jawed Jul 14 '20 at 20:19
  • That worked - however it is using only the last JSON file in that folder instead of using all the JSON files in the subfolder and appending them to a csv file per subfolder. Any advice there? – Haris Jawed Jul 14 '20 at 20:35
  • @HarisJawed yep, you just need a list see edit, this is pretty basic stuff (looping and lists) you should be able to get to your answer from here ;) – Umar.H Jul 14 '20 at 20:38