
I have read the answers to this question and the next one. I have two huge JSON files: the first is players_data.JSON (8 GB) and the second is match_data.JSON (80 GB). The structure of the data is complex: the json package returns a dictionary whose values can be lists or dictionaries, and some of those values are themselves dictionaries of dictionaries or lists of dictionaries, so there are multiple levels of nesting.

My questions are as below:

  1. What is the best Python package for parsing JSON files of this size?
  2. What is the best data structure for processing the data? I will need to compute statistics for each player and each match (6 players per match). For instance, a dictionary with compound keys (playerId, matchId) could be an option; a minimal sketch of what I have in mind follows this list.
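
Something like the sketch below is what I have in mind. The `ijson` prefix `'matches.item'` and the field names `playerId`, `matchId`, and `score` are placeholders, since they depend on the real structure of the files:

```python
import ijson
from collections import defaultdict

# Accumulate statistics per (playerId, matchId) without loading the whole
# file into memory. The prefix 'matches.item' and the field names
# playerId / matchId / score are placeholders for the real structure.
stats = defaultdict(lambda: {"count": 0, "score_total": 0})

with open("match_data.json", "rb") as f:
    for record in ijson.items(f, "matches.item"):
        key = (record["playerId"], record["matchId"])  # compound dict key
        stats[key]["count"] += 1
        stats[key]["score_total"] += record.get("score", 0)

# A single (player, match) pair can then be looked up directly:
# stats[("player42", "match7")]
```

The idea is that the `defaultdict` only keeps one small dictionary of running totals per (player, match) pair, so the aggregated statistics stay in memory rather than the raw records.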
    I don't have experience in this so I don't know if it's the best, but did you try ijson? – Alex Hall Dec 09 '16 at 16:03
  • ijson sounds like a good bet. Also look into using pandas for post-processing after you have done some parsing. Here is a tutorial https://www.dataquest.io/blog/python-json-tutorial/ – Alex G Rice Dec 09 '16 at 16:10
  • @AlexGRice I tried the following script: `with open(dataPath1) as playerData: objects = ijson.items(playerData, 'meta.view.columns.item') columns = list(objects)` but I get this error: `ijson.common.JSONError: Additional data` (a debugging sketch for this follows the comments) – YNR Dec 09 '16 at 19:00
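
One way to find the right prefix to pass to `ijson.items` is to walk the event stream with `ijson.parse` and print the prefixes it reports. This is only a debugging sketch, assuming the file is a single JSON document; the `Additional data` error usually means ijson found more content after the first complete JSON value (for example, one JSON document per line), in which case a prefix copied from the tutorial would not match anyway.

```python
import ijson

# Print the first few (prefix, event, value) triples that ijson emits;
# the prefixes show what to pass to ijson.items(). Stops early so the
# 8 GB file is not read all the way through.
with open("players_data.json", "rb") as f:
    for i, (prefix, event, value) in enumerate(ijson.parse(f)):
        print(prefix, event, value)
        if i >= 50:
            break
```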

0 Answers