
I have a list/dictionary which I'm trying to get the keys from.

I'm trying to print the keys in the get_keys() function, which is the part that needs fixing.

import json, io
business_json = "business.json"

def read_json(file):
    lines = [line for line in open(file)]
    js = [json.loads(line) for line in lines]
    for item in js:
        name = item.get("name")
    return js

def get_keys(data):
    for key in data.keys():
        print(key)

def get_values_for_category(data, category):
    values = []
    for item in data:
        values.append(item.get(category))
    return values

def main():
    json_data = read_json(business_json) #works
    names = get_values_for_category(json_data, "name") #works 
    get_keys(json_data)


if __name__ == "__main__":
    main()

The error I get from get_keys(data) above is:

AttributeError: 'list' object has no attribute 'keys'

If I instead try:

for key, val in data.items():
    print(key, val)

or

for key in list(data).keys() or for key in list(data.keys()), I get the same issue.

So, I have a list and need the keys. However, every way I've found to get a list's keys returns an error.

Is it an issue with how I'm getting js in the read_json() function? I'm confused as to why I can use a key ("name") to get values, but can't return the various keys I can look up...

EDIT: Full traceback is:

Traceback (most recent call last):
    File "D:\Batman\Documents\- Datasets\yelp_dataset\dataset\Yelp_analysis.py", line 29, in <module>
        main()
    File "D:\Batman\Documents\- Datasets\yelp_dataset\dataset\Yelp_analysis.py", line 25, in main
        get_keys(json_data)
    File "D:\Batman\Documents\- Datasets\yelp_dataset\dataset\Yelp_analysis.py", line 13, in get_keys
        for key in data.keys():
AttributeError: 'list' object has no attribute 'keys'

(FWIW a summary of the .json file is here and a snippet of the data is here on PasteBin).

Bryan Zeng
BruceWayne

1 Answer

This is how you would read the file.

def read_json(file):
    # Each line of the file is its own JSON document, so parse line by line
    with open(file) as f:
        return [json.loads(line) for line in f]

I've seen this dataset asked about multiple times, and you might want to check out the ijson library for reading large JSON files. Also, I personally think those Yelp files are meant to be run through Hadoop / Spark processes.

Anyway, read_json() now returns a list of JSON objects. Each object has keys; the list itself does not.
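To make the distinction concrete, here is a minimal sketch using two made-up records (the field values are hypothetical and not taken from the Yelp file). It only illustrates that the outer container is a list while each element is a dict:

```python
import json

# Two made-up lines in the same one-JSON-object-per-line layout (assumed data)
lines = [
    '{"name": "Cafe A", "city": "Phoenix", "stars": 4.0}',
    '{"name": "Diner B", "city": "Tempe", "stars": 3.5}',
]

records = [json.loads(line) for line in lines]

print(type(records).__name__)     # list
print(type(records[0]).__name__)  # dict
print(hasattr(records, "keys"))   # False: the list has no .keys()
print(list(records[0].keys()))    # ['name', 'city', 'stars']
```

This is also why item.get("name") works inside a loop over the list: .get() is being called on each dict, not on the list.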

If you're going to do this

json_data = read_json(business_json)
get_keys(json_data)

Then the get_keys function should look like this

def get_keys(data):
    for obj in data:
        print(obj.keys())
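If printing the keys of every object is too noisy, one option is to collect the distinct keys across all records instead. A minimal sketch, assuming data is a list of dicts like the one read_json() returns; unique_keys is a hypothetical helper name, not part of the original code:

```python
def unique_keys(data):
    """Return the distinct keys found across a list of dicts, in first-seen order."""
    seen = {}
    for obj in data:
        for key in obj:
            # A dict used as an ordered set (insertion order is kept in Python 3.7+)
            seen.setdefault(key, None)
    return list(seen)


# Made-up sample; real records may not all share the same keys
sample = [
    {"name": "Cafe A", "city": "Phoenix"},
    {"name": "Diner B", "stars": 3.5},
]
print(unique_keys(sample))  # ['name', 'city', 'stars']
```

Unlike looking only at the first record, this also picks up keys that appear in some objects but not others, at the cost of one pass over the whole list.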
OneCricketeer
  • Ooh! That `read_json()` works well! When I do the `get_keys()` you show, it's correctly showing the keys (yay!) but it's doing it for *every object*, which is thousands of rows. ...how can I just show it once? I tried `for obj in data[0]: print(obj.keys())` but get `AttributeError: 'str' object has no attribute 'keys'`. – BruceWayne Oct 28 '17 at 00:13
  • `print(data[0].keys())`...no looping – OneCricketeer Oct 28 '17 at 00:16
  • Ah, duh! Thanks!! I'll look into ijson too :D – BruceWayne Oct 28 '17 at 00:18