python: retrieve a section of JSON data, without knowing the structure

Question

I have this JSON:

 {u'spreadsheetId': u'19CugmHB1Ds6n1jBy4Zo4hk_k4sQsTmOFfccxRc2qo', 
    u'properties': {u'locale': u'en_US', u'timeZone': u'Asia/Hong_Kong',
    u'autoRecalc': u'ON_CHANGE', u'defaultFormat': {u'padding': {u'top': 2, u'right': 3, u'left': 3, u'bottom': 2}, u'textFormat': {u'foregroundColor': {}, u'bold': False, u'strikethrough': False, u'fontFamily': u'arial,sans,sans-serif', u'fontSize': 10, u'italic': False, u'underline': False}, u'verticalAlignment': u'BOTTOM', u'backgroundColor': {u'blue': 1, u'green': 1, u'red': 1}, u'wrapStrategy': u'OVERFLOW_CELL'}, u'title': u'test pygsheets API V4'}, u'sheets': [{u'properties': {u'sheetType': u'GRID', u'index': 0, u'sheetId': 0, u'gridProperties': {u'columnCount': 26, u'rowCount': 1000}, u'title': u'IO'}}, {u'basicFilter': {u'range': {u'endRowIndex': 978, u'startRowIndex': 2, u'sheetId': 1704577069, u'startColumnIndex': 1, u'endColumnIndex': 9}, u'sortSpecs': [{u'sortOrder': u'ASCENDING', u'dimensionIndex': 1}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 4}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 5}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 8}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 3}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 7}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 2}]}, u'properties': {u'sheetType': u'GRID', u'index': 1, u'title': u'books', u'gridProperties': {u'columnCount': 22, u'rowCount': 978, u'frozenColumnCount': 3, u'hideGridlines': True, u'frozenRowCount': 3}, u'tabColor': {u'blue': 1}, u'sheetId': 1704577069}}], u'spreadsheetUrl': u'https://docs.google.com/spreadsheets/d/1CugmHB1Ds6n1jBy4Zo4hk_k4sQsTmOFfccxRc2qo/edit'}

How do I get the titles out of the JSON for sheets only? I want something like

input: results.get('title')

output: ['IO','books']

I'm not sure how to even approach it because of the nested structure. This reminds of a html node type structure. So i need some type of a search function?

Is there a way to get to the title nodes without looking at the structure? Kind of like a xpath search type function? I have used beautifulsoup before, and you can just not know the structure and take out parts of the data through a search.

What have you tried so far? What problems are you experiencing with your code attempt? — idjaw, Apr 14 '17 at 13:36
You should include the code that you have tried. This is no different than a nested dictionary — roganjosh, Apr 14 '17 at 13:36
@ Rafael Martins. Crap, you're right... I gotta change the question. I want only the titles for `sheets`, `test pygsheets API V4` is the title for the entire spreadsheet. — jason, Apr 14 '17 at 14:04
@jason what is wrong with the answer given by jordanm? If it solves your problem, you should accept it. Also, I have edited the title - I haven't found myself using "rat's arse" when searching for solutions to my programming problems. In the interest of keeping it general and useful for all on Google, it's perhaps best to avoid that kind of title. — roganjosh, Apr 14 '17 at 15:08

score 3 · Accepted Answer · answered Apr 14 '17 at 13:45

3

This will give your desired output:

print [x['properties'].get('title') for x in results['sheets']]

This returns: [u'IO', u'books']

answered Apr 14 '17 at 13:45

jordanm

33,009
7
61
76

I would have to know `properties` is part of the structure.... can you do it without knowing `properties`? – jason Apr 14 '17 at 13:50
1

@jason It's possible, yes, but generally with JSON you will want to know an object's structure ahead of time. JSON parsers do not come with an xpath equivalent. You would need something like this (I have not used it before): https://pypi.python.org/pypi/jsonpath-rw – jordanm Apr 14 '17 at 13:53

score 1 · Answer 2 · edited May 23 '17 at 12:09

1

This should work:

a = {your json/dict?}
print(a['properties']['title']) # prints 'test pygsheets API V4'
print(a['sheets'][0]['properties']['title']) #prints 'IO'
print(a['sheets'][1]['properties']['title']) # prints 'books'

Edit: for unknown structure:

def find_in_obj(obj, condition, path=None):

    if path is None:
        path = []

    # In case this is a list
    if isinstance(obj, list):
        for index, value in enumerate(obj):
            new_path = list(path)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

    # In case this is a dictionary
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_path = list(path)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

            if condition == key:
                new_path = list(path)
                new_path.append(value)
                yield new_path

results = []
for item in find_in_obj(a, 'title'):
    results.append(item)
print(results) #prints [['test pygsheets API V4'], ['IO'], ['books']]

modified from: hexerei software's solution at Find all occurrences of a key in nested python dictionaries and lists

edited May 23 '17 at 12:09

Community

1
1

answered Apr 14 '17 at 13:40

Rafael

3,096
1
23
61

There are multiple `title` keys in there. – roganjosh Apr 14 '17 at 13:41
That's if I know the structure... what if i don't want to look at the structure and just want to search through `titles`? i'm looking for a generic solution. – jason Apr 14 '17 at 13:44
Your edit has hard-coded a solution. You need some kind of `for` loop or a whole lot of patience with typing if the dictionary is large. – roganjosh Apr 14 '17 at 13:49
1

@roganjosh you're saying your editor doesn't have macro support? :P – jordanm Apr 14 '17 at 13:54
I added a recursive solution – Rafael Apr 14 '17 at 13:58

python: retrieve a section of JSON data, without knowing the structure

2 Answers2