Create a loop to open subfolders within a folder, read the JSON files, and output them as CSV

Question

I am trying to create a loop in Python that opens a folder, iterates through its subfolders, reads the JSON files in each subfolder, and outputs them as a CSV, repeating this for every subfolder.

My directory looks like this:

Main folder = "Exports"

Subfolders = "Folder1", "Folder2", etc.

Files within each subfolder = "file1.json", "file2.json", etc.

Currently I am running the following code within a subfolder (for example "Folder1") to create an output file:

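import os
import pandas as pd

path = os.getcwd()  # the subfolder you are in, e.g. .../Exports/Folder1
frames = []
for filename in os.listdir(path):
    root, ext = os.path.splitext(filename)
    if ext == '.json':
        frames.append(pd.read_json(os.path.join(path, filename)))

# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
frame = pd.concat(frames, ignore_index=True)
frame.to_csv(path + ".csv")  # writes Folder1.csv next to the Folder1 folder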

My question is: how do I run that loop from the main folder so that it opens each subfolder in turn, runs the loop there, and outputs one CSV file per subfolder?

Thanks
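A minimal sketch of the os-based approach the question asks about, reusing the question's own loop for each subfolder. The function name export_each_subfolder is hypothetical, and it assumes each JSON file is in a layout pd.read_json can parse directly (for example, a list of records). It keeps the question's naming, so Exports/Folder1 becomes Exports/Folder1.csv, and it skips subfolders with no JSON files because pd.concat raises on an empty list.

```python
import os

import pandas as pd


def export_each_subfolder(main_folder):
    """Hypothetical helper: write one CSV per subfolder of main_folder.

    Each CSV combines that subfolder's .json files and is named after the
    subfolder, e.g. Exports/Folder1 -> Exports/Folder1.csv.
    Returns the list of CSV paths that were written.
    """
    written = []
    # Only directories count as subfolders; loose files in main_folder are ignored.
    with os.scandir(main_folder) as entries:
        subfolders = sorted((e for e in entries if e.is_dir()), key=lambda e: e.name)
    for sub in subfolders:
        frames = []
        for filename in sorted(os.listdir(sub.path)):
            _, ext = os.path.splitext(filename)
            if ext == '.json':
                frames.append(pd.read_json(os.path.join(sub.path, filename)))
        if not frames:
            continue  # no JSON files here; pd.concat([]) would raise ValueError
        out_path = sub.path + '.csv'  # same naming as path + ".csv" in the question
        pd.concat(frames, ignore_index=True).to_csv(out_path)
        written.append(out_path)
    return written
```

Run it from the folder that contains Exports, e.g. export_each_subfolder("Exports"), instead of calling os.getcwd() from inside one subfolder.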

Does this answer your question? [Getting a list of all subdirectories in the current directory](https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory) — VirxEC, Jul 14 '20 at 16:58

Umar.H · Accepted Answer · 2020-07-14T20:38:11.613


Let's try pathlib and defaultdict from the standard library.

We can build a dictionary with the subfolders as keys and, as values, a list of all the files in each subfolder.

from pathlib import Path
from collections import defaultdict

your_path = 'target_directory'

file_dict = defaultdict(list)

for each_file in Path(your_path).rglob('*.json'):  # rglob also searches nested folders
    file_dict[each_file.parent].append(each_file)

print(file_dict)

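The dictionary maps each subfolder (a pathlib Path) to a list of Path objects for the files it contains. It will look roughly like this; the keys are the subfolders, only the names are printed here, and this sample comes from a test run on .csv files: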

{Notebooks : [test.csv,
             test_file.csv,
             test_file_edited.csv] ,
test_csv : [File20200610.csv,
           File20201012 - Copy.csv,
           File20201012.csv] }

Then we can loop over the dictionary, combine the files of each subfolder, and save one CSV per subfolder to the target folder.

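import pandas as pd
from pathlib import Path

target_path = 'output_directory'  # hypothetical: folder that receives one CSV per subfolder
Path(target_path).mkdir(parents=True, exist_ok=True)

for each_sub_folder, files in file_dict.items():
    dfs = []
    for each_file in files:
        j = pd.read_json(each_file)  # your read method
        dfs.append(j)  # append to list
    df = pd.concat(dfs, ignore_index=True)  # combine all files of this subfolder
    df.to_csv(Path(target_path).joinpath(each_sub_folder.name + '.csv'), index=False)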
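For reference, a sketch that puts both steps of this answer into one function. json_folders_to_csv and its parameters are hypothetical names. Two deliberate changes from the code above: it uses Path.glob('*/*.json') instead of rglob, so only the direct subfolders of the main folder are picked up, and it creates the target folder if it does not exist. It assumes each JSON file is in a layout pd.read_json can parse directly (for example, a list of records).

```python
from collections import defaultdict
from pathlib import Path

import pandas as pd


def json_folders_to_csv(main_folder, target_folder):
    """Hypothetical helper: one CSV per subfolder, combining its .json files."""
    file_dict = defaultdict(list)
    for each_file in Path(main_folder).glob('*/*.json'):  # direct subfolders only
        file_dict[each_file.parent].append(each_file)

    target = Path(target_folder)
    target.mkdir(parents=True, exist_ok=True)  # make sure the output folder exists
    written = []
    for each_sub_folder, files in sorted(file_dict.items()):
        # Collect every file's frame first, then concatenate once per subfolder,
        # so no file is overwritten by the next one.
        dfs = [pd.read_json(each_file) for each_file in sorted(files)]
        df = pd.concat(dfs, ignore_index=True)
        out_path = target / (each_sub_folder.name + '.csv')
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

Calling json_folders_to_csv('Exports', 'csv_out') would write csv_out/Folder1.csv, csv_out/Folder2.csv, and so on.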

answered Jul 14 '20 at 17:13 by Umar.H, edited Jul 14 '20 at 20:38

Hey, thanks. I managed to create the dictionary, which holds lists of Pathlib objects and shows up like this: defaultdict(<class 'list'>, {WindowsPath('C:/Users/ - My question is about the next part of your code, looping over the dictionary. I am getting the following error: ValueError: Invalid file path or buffer object type: – Haris Jawed Jul 14 '20 at 20:19
That worked. However, it only uses the last JSON file in each subfolder instead of combining all the JSON files in the subfolder into one CSV file per subfolder. Any advice there? – Haris Jawed Jul 14 '20 at 20:35
@HarisJawed yep, you just need a list, see the edit. This is pretty basic stuff (looping and lists); you should be able to get to your answer from here ;) – Umar.H Jul 14 '20 at 20:38
