I think there are two main problems here:
- The selector query seems wrong. I tried '$.users.name.text' and that worked for me (using Python 3 and objectpath).
- The function isn't building up the list of names correctly.
Try something like this instead:

    import json
    import objectpath


    def get_names_tree(data):
        tree = objectpath.Tree(data)
        return tuple(tree.execute('$.users.name.text'))


    def load_data(file_name):
        names = []
        with open(file_name) as fh:
            for line in fh:
                data = json.loads(line)
                names.extend(get_names_tree(data))
        return names

The loop above builds up a single list of names across all lines, rather than a list of decoded entities. In your version, the text_result variable is reassigned on every iteration, so only the result for the last line is returned.
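
To make the difference concrete, here is a minimal sketch. I don't have your exact code, so `load_last_only` is a hypothetical reconstruction of the bug, `extract` is a plain-Python stand-in for the objectpath query, and the sample data shape is assumed from the query above.

```python
import io
import json


def extract(data):
    # Stand-in for the objectpath query.
    # Assumes records shaped like {"users": {"name": [{"text": ...}, ...]}}
    return [name["text"] for name in data["users"]["name"]]


def load_last_only(fh):
    # Hypothetical reconstruction of the bug:
    # text_result is overwritten for every line, so earlier lines are lost
    text_result = []
    for line in fh:
        text_result = extract(json.loads(line))
    return text_result


def load_all(fh):
    # Accumulate the names from every line into one list
    names = []
    for line in fh:
        names.extend(extract(json.loads(line)))
    return names


# Two JSON documents, one per line (assumed input format)
SAMPLE = (
    '{"users": {"name": [{"text": "alice"}, {"text": "bob"}]}}\n'
    '{"users": {"name": [{"text": "carol"}]}}\n'
)

print(load_last_only(io.StringIO(SAMPLE)))  # ['carol']
print(load_all(io.StringIO(SAMPLE)))        # ['alice', 'bob', 'carol']
```
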
You might also be able to speed things up by extracting the data with plain Python instead of objectpath:

    def get_names_careful(data):
        return tuple(
            name['text'] for name in
            data.get('users', {}).get('name', [])
            if 'text' in name
        )


    def get_names(data):
        return tuple(name['text'] for name in data['users']['name'])

The first is careful about not raising errors with missing data, but if you know your data is always the right shape, you could try the second.
In my testing, the careful version is 15x faster and the careless version 20x faster than the objectpath version.
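
If you want to check this on your own data, something like the sketch below shows how the two plain-Python functions behave on incomplete records and how you might time them with `timeit`. The sample records are made up for illustration, and the timings will depend on your machine and on the real shape of your data, so don't expect to reproduce my numbers exactly.

```python
import timeit


def get_names_careful(data):
    return tuple(
        name['text'] for name in
        data.get('users', {}).get('name', [])
        if 'text' in name
    )


def get_names(data):
    return tuple(name['text'] for name in data['users']['name'])


# Made-up records; the shape is assumed from the query '$.users.name.text'
complete = {'users': {'name': [{'text': 'alice'}, {'text': 'bob'}]}}
partial = {'users': {'name': [{'text': 'alice'}, {'id': 2}]}}
empty = {}

print(get_names_careful(complete))  # ('alice', 'bob')
print(get_names(complete))          # ('alice', 'bob')
print(get_names_careful(partial))   # ('alice',)
print(get_names_careful(empty))     # ()

# The careless version raises as soon as a key is missing
try:
    get_names(empty)
except KeyError as exc:
    print('KeyError:', exc)         # KeyError: 'users'

# Rough timing on a larger made-up record
data = {'users': {'name': [{'text': 'user%d' % i} for i in range(100)]}}
candidates = {'careful': get_names_careful, 'careless': get_names}

timings = {}
for label, func in candidates.items():
    # Best of 5 repeats, 1,000 calls per repeat
    timings[label] = min(timeit.repeat(lambda: func(data), number=1000, repeat=5))

for label, seconds in timings.items():
    print('%-10s %.4f s per 1,000 calls' % (label, seconds))
```

To compare against objectpath, you could add `get_names_tree` from earlier in this answer to the `candidates` dict.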