
I would like to know how to read several json files from a single folder (without specifying the files names, just that they are json files).

Also, is it possible to turn them into a pandas DataFrame?

Can you give me a basic example?

Ami Tavory
donpresente

9 Answers


One option is listing all files in a directory with os.listdir and then finding only those that end in '.json':

import os, json
import pandas as pd

path_to_json = 'somedir/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]
print(json_files)  # for me this prints ['foo.json']

Now you can use pandas DataFrame.from_dict to read the json (a Python dictionary at this point) into a pandas DataFrame:

montreal_json = pd.DataFrame.from_dict(many_jsons[0])
print(montreal_json['features'][0]['geometry'])

Prints:

{'type': 'Point', 'coordinates': [-73.6051013, 45.5115944]}

In this case I had appended several parsed JSON files to a list many_jsons. The first JSON in my list is actually a GeoJSON with some geo data on Montreal. Since I'm familiar with the content already, I print out the 'geometry', which gives me the lon/lat of Montreal.
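The answer doesn't show how many_jsons was filled. A minimal sketch, assuming every .json file in the directory parses cleanly (load_jsons is a hypothetical helper name, not part of the original answer):

```python
import json
import os

def load_jsons(path_to_json):
    """Parse every .json file in a directory and return the results as a list."""
    many_jsons = []
    for file_name in sorted(os.listdir(path_to_json)):
        if file_name.endswith('.json'):
            with open(os.path.join(path_to_json, file_name)) as f:
                many_jsons.append(json.load(f))
    return many_jsons
```

Sorting the filenames makes the list order deterministic, which matters if you later index into it as `many_jsons[0]` does above.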

The following code sums up everything above:

import os, json
import pandas as pd

# this finds our json files
path_to_json = 'json/'
json_files = [pos_json for pos_json in os.listdir(path_to_json) if pos_json.endswith('.json')]

# here I define my pandas DataFrame with the columns I want to get from the json
jsons_data = pd.DataFrame(columns=['country', 'city', 'long/lat'])

# we need both the json and an index number so use enumerate()
for index, js in enumerate(json_files):
    with open(os.path.join(path_to_json, js)) as json_file:
        json_text = json.load(json_file)

        # here you need to know the layout of your json and each json has to have
        # the same structure (obviously not the structure I have here)
        country = json_text['features'][0]['properties']['country']
        city = json_text['features'][0]['properties']['name']
        lonlat = json_text['features'][0]['geometry']['coordinates']
        # here I push a list of data into a pandas DataFrame at row given by 'index'
        jsons_data.loc[index] = [country, city, lonlat]

# now that we have the pertinent json data in our DataFrame let's look at it
print(jsons_data)

for me this prints:

  country           city                   long/lat
0  Canada  Montreal city  [-73.6051013, 45.5115944]
1  Canada        Toronto  [-79.3849008, 43.6529206]

It may be helpful to know that for this code I had two geojsons in a directory named 'json'. Each json had the following structure:

{
  "features": [
    {
      "properties": {
        "osm_key": "boundary",
        "extent": [-73.9729016, 45.7047897, -73.4734865, 45.4100756],
        "name": "Montreal city",
        "state": "Quebec",
        "osm_id": 1634158,
        "osm_type": "R",
        "osm_value": "administrative",
        "country": "Canada"
      },
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [-73.6051013, 45.5115944]
      }
    }
  ],
  "type": "FeatureCollection"
}
Scott
  • Really helpful! Instead of printing, my idea was to save all of them into one pandas DataFrame. What would be the correct code? Create an empty DataFrame and begin adding rows to it? Thanks @Scott for this detailed answer! – donpresente May 30 '15 at 08:43
  • @donpresente Good question. I'll post an edit to address how to get some desired data from a json and then push this data into a pandas DataFrame, row by row. – Scott May 30 '15 at 16:41
  • `import os, json` does not work on Colab. – Julien Aug 05 '22 at 13:41
  • How to read it into a list instead of a DataFrame? – Julien Aug 05 '22 at 13:49

Iterating over a (flat) directory is easy with the glob module:

from glob import glob

for f_name in glob('foo/*.json'):
    ...

As for reading JSON directly into pandas, see here.
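Combining the two ideas, a minimal sketch that globs a directory and concatenates each file into one DataFrame (assuming every file is something `pd.read_json` accepts directly, e.g. a list of records; `read_json_dir` is a hypothetical helper name):

```python
import pandas as pd
from glob import glob

def read_json_dir(pattern):
    """Read every JSON file matching the glob pattern into one DataFrame."""
    frames = [pd.read_json(f_name) for f_name in sorted(glob(pattern))]
    return pd.concat(frames, ignore_index=True)
```

`ignore_index=True` renumbers the rows 0..n-1 instead of repeating each file's own index.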

Ami Tavory

Loads all files that end with *.json from a specific directory into a dict:

import os, json

path_to_json = '/lala/'

for file_name in [file for file in os.listdir(path_to_json) if file.endswith('.json')]:
    with open(os.path.join(path_to_json, file_name)) as json_file:
        data = json.load(json_file)
        print(data)

Try it yourself: https://repl.it/@SmaMa/loadjsonfilesfromfolderintodict
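Note that the loop above rebinds `data` on every iteration, so after the loop only the last file's contents remain. If you want one dict covering the whole folder, one option is keying by filename (an arbitrary choice for this sketch; `load_json_dir_as_dict` is a hypothetical helper name):

```python
import json
import os

def load_json_dir_as_dict(path_to_json):
    """Map each .json filename in a directory to its parsed contents."""
    all_data = {}
    for file_name in os.listdir(path_to_json):
        if file_name.endswith('.json'):
            with open(os.path.join(path_to_json, file_name)) as json_file:
                all_data[file_name] = json.load(json_file)
    return all_data
```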

Sma Ma

To read the json files,

import os
import glob
import json

contents = []
json_dir_name = '/path/to/json/dir'

json_pattern = os.path.join(json_dir_name, '*.json')
file_list = glob.glob(json_pattern)
for file in file_list:
    with open(file) as json_file:
        contents.append(json.load(json_file))
Max Naude
Saravana Kumar

I am using glob with pandas. Check out the code below. Note that lines=True expects JSON Lines files (one JSON object per line); drop it when each file holds a single regular JSON document.

import pandas as pd
from glob import glob

df = pd.concat([pd.read_json(f_name, lines=True) for f_name in glob('foo/*.json')])
Anand Tripathi

If you are turning them into a pandas DataFrame, use the pandas API.

More generally, you can use a generator. Note this assumes each file is in JSON Lines format, i.e. one JSON object per line:

import glob
import json

def data_generator(my_path_regex):
    for filename in glob.glob(my_path_regex):
        for json_line in open(filename, 'r'):
            yield json.loads(json_line)


my_arr = [_json for _json in data_generator('foo/*.json')]
Union find

I feel a solution using pathlib is missing :)

from pathlib import Path

file_list = list(Path("/path/to/json/dir").glob("*.json"))
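To continue the pathlib route into pandas, a sketch assuming each file holds a single flat JSON object (`json_dir_to_df` is a hypothetical helper name):

```python
import json
from pathlib import Path

import pandas as pd

def json_dir_to_df(dir_path):
    """Parse each .json file found by Path.glob and stack the records into a DataFrame."""
    records = [json.loads(p.read_text()) for p in sorted(Path(dir_path).glob("*.json"))]
    return pd.DataFrame(records)
```

Path.read_text saves the explicit open/close, and sorting keeps the row order deterministic across platforms.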
Original BBQ Sauce

A simple and very easy-to-understand answer.

import os
import glob
import pandas as pd

path_to_json = r'\path\here'
# import all files from the folder which end with .json
json_files = glob.glob(os.path.join(path_to_json, '*.json'))

# convert all files to a dataframe
df = pd.concat((pd.read_json(f) for f in json_files))
print(df.head())

One more option is to read it as a PySpark DataFrame and then convert it to a pandas DataFrame (if really necessary; depending on the operation, I'd suggest keeping it as a PySpark DF). Spark natively handles a directory of JSON files as the input path, without the need for libraries to read or iterate over each file:

# pip install pyspark
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark_df = spark.read.json('/some_dir_with_json/*.json')

Next, in order to convert it into a pandas DataFrame, you can do:

df = spark_df.toPandas()
Fernando Wittmann