0

I have this JSON:

 {u'spreadsheetId': u'19CugmHB1Ds6n1jBy4Zo4hk_k4sQsTmOFfccxRc2qo', 
    u'properties': {u'locale': u'en_US', u'timeZone': u'Asia/Hong_Kong',
    u'autoRecalc': u'ON_CHANGE', u'defaultFormat': {u'padding': {u'top': 2, u'right': 3, u'left': 3, u'bottom': 2}, u'textFormat': {u'foregroundColor': {}, u'bold': False, u'strikethrough': False, u'fontFamily': u'arial,sans,sans-serif', u'fontSize': 10, u'italic': False, u'underline': False}, u'verticalAlignment': u'BOTTOM', u'backgroundColor': {u'blue': 1, u'green': 1, u'red': 1}, u'wrapStrategy': u'OVERFLOW_CELL'}, u'title': u'test pygsheets API V4'}, u'sheets': [{u'properties': {u'sheetType': u'GRID', u'index': 0, u'sheetId': 0, u'gridProperties': {u'columnCount': 26, u'rowCount': 1000}, u'title': u'IO'}}, {u'basicFilter': {u'range': {u'endRowIndex': 978, u'startRowIndex': 2, u'sheetId': 1704577069, u'startColumnIndex': 1, u'endColumnIndex': 9}, u'sortSpecs': [{u'sortOrder': u'ASCENDING', u'dimensionIndex': 1}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 4}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 5}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 8}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 3}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 7}, {u'sortOrder': u'ASCENDING', u'dimensionIndex': 2}]}, u'properties': {u'sheetType': u'GRID', u'index': 1, u'title': u'books', u'gridProperties': {u'columnCount': 22, u'rowCount': 978, u'frozenColumnCount': 3, u'hideGridlines': True, u'frozenRowCount': 3}, u'tabColor': {u'blue': 1}, u'sheetId': 1704577069}}], u'spreadsheetUrl': u'https://docs.google.com/spreadsheets/d/1CugmHB1Ds6n1jBy4Zo4hk_k4sQsTmOFfccxRc2qo/edit'}

How do I get the titles out of the JSON for sheets only? I want something like

input: results.get('title')

output: ['IO','books']

I'm not sure how to even approach it because of the nested structure. This reminds of a html node type structure. So i need some type of a search function?

Is there a way to get to the title nodes without looking at the structure? Kind of like a xpath search type function? I have used beautifulsoup before, and you can just not know the structure and take out parts of the data through a search.

roganjosh
  • 12,594
  • 4
  • 29
  • 46
jason
  • 3,811
  • 18
  • 92
  • 147
  • What have you tried so far? What problems are you experiencing with your code attempt? – idjaw Apr 14 '17 at 13:36
  • You should include the code that you have tried. This is no different than a nested dictionary – roganjosh Apr 14 '17 at 13:36
  • 3
    In fact, it _is_ a dictionary, not JSON. – roganjosh Apr 14 '17 at 13:37
  • 1
    Why are you excluding `IO` in your sample output? – jordanm Apr 14 '17 at 13:44
  • sorry... mistake, edited – jason Apr 14 '17 at 13:45
  • What about `test pygsheets API V4`? – Rafael Apr 14 '17 at 14:01
  • @ Rafael Martins. Crap, you're right... I gotta change the question. I want only the titles for `sheets`, `test pygsheets API V4` is the title for the entire spreadsheet. – jason Apr 14 '17 at 14:04
  • @jason what is wrong with the answer given by jordanm? If it solves your problem, you should accept it. Also, I have edited the title - I haven't found myself using "rat's arse" when searching for solutions to my programming problems. In the interest of keeping it general and useful for all on Google, it's perhaps best to avoid that kind of title. – roganjosh Apr 14 '17 at 15:08

2 Answers2

3

This will give your desired output:

print [x['properties'].get('title') for x in results['sheets']]

This returns: [u'IO', u'books']

jordanm
  • 33,009
  • 7
  • 61
  • 76
  • I would have to know `properties` is part of the structure.... can you do it without knowing `properties`? – jason Apr 14 '17 at 13:50
  • 1
    @jason It's possible, yes, but generally with JSON you will want to know an object's structure ahead of time. JSON parsers do not come with an xpath equivalent. You would need something like this (I have not used it before): https://pypi.python.org/pypi/jsonpath-rw – jordanm Apr 14 '17 at 13:53
1

This should work:

a = {your json/dict?}
print(a['properties']['title']) # prints 'test pygsheets API V4'
print(a['sheets'][0]['properties']['title']) #prints 'IO'
print(a['sheets'][1]['properties']['title']) # prints 'books'

Edit: for unknown structure:

def find_in_obj(obj, condition, path=None):

    if path is None:
        path = []

    # In case this is a list
    if isinstance(obj, list):
        for index, value in enumerate(obj):
            new_path = list(path)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

    # In case this is a dictionary
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_path = list(path)
            for result in find_in_obj(value, condition, path=new_path):
                yield result

            if condition == key:
                new_path = list(path)
                new_path.append(value)
                yield new_path

results = []
for item in find_in_obj(a, 'title'):
    results.append(item)
print(results) #prints [['test pygsheets API V4'], ['IO'], ['books']]

modified from: hexerei software's solution at Find all occurrences of a key in nested python dictionaries and lists

Community
  • 1
  • 1
Rafael
  • 3,096
  • 1
  • 23
  • 61