How to Read Just the First 10-20 Lines of a Rather Big JSON File

Question

I'd like to take part in the Visual Dialog Challenge introduced at the following link:

https://visualdialog.org/challenge/2018

The link provides a publicly available training dataset in JSON format, around 300 MB in size.

I just want to read a few lines of this file so that I can check what kind of dialogue rounds took place between the human and the AI.

Is there a good way to read just a few lines of a JSON file, using a viewer or an interactive Python shell such as Jupyter? Any tool is fine.

--

I tried mpu, but it returns the following error:

--------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-17-dfebad8ac65a> in <module>()
----> 1 data = mpu.io.read("C:/Users/syyun/Downloads/visdial_0.9_train/visdial_0.9_train.json")

c:\users\syyun\appdata\local\programs\python\python36-32\lib\site-packages\mpu\io.py in read(filepath, **kwargs)
     83     elif filepath.lower().endswith('.json'):
     84         with open(filepath) as data_file:
---> 85             data = json.load(data_file, **kwargs)
     86         return data
     87     elif filepath.lower().endswith('.pickle'):

c:\users\syyun\appdata\local\programs\python\python36-32\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294 
    295     """
--> 296     return loads(fp.read(),
    297         cls=cls, object_hook=object_hook,
    298         parse_float=parse_float, parse_int=parse_int,

c:\users\syyun\appdata\local\programs\python\python36-32\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

What is the problem in reading 300mb of json? Did you try it? (`pip install mpu` and then `import mpu.io; data = mpu.io.read("your path.json")`). I guess reading it takes less time than writing this question. — Martin Thoma, Jun 08 '18 at 06:35
I followed your link and the training set is actually a zip-file: https://s3.amazonaws.com/visual-dialog/v0.9/visdial_0.9_train.zip. How about you download it, load it to memory and just pick the subset you are interested in? 300mb is not that much. But hey, this is my opinion. — Anton vBR, Jun 08 '18 at 06:37
@MartinThoma Thanks for your comment, but it returns a memory error. — DonataBersick, Jun 08 '18 at 06:53
How much memory do you have? Did you try https://stackoverflow.com/a/10382359/562769 ? (I'll have a look at it when I'm at home... remind me, if I don't post something in ~15h) — Martin Thoma, Jun 08 '18 at 06:57
Reading the `visdial_0.9_val.json` with `mpu` takes ~5GB peak on my machine. And it works — Martin Thoma, Jun 10 '18 at 20:11
Reading `visdial_0.9_train.json`, my system consumed ~6GB at peak. Please note that I'm speaking of everything, not only the Python script (e.g. I have a game and the browser running as well) — Martin Thoma, Jun 11 '18 at 04:49
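
For only peeking at the start of the file, parsing the whole thing may not be necessary. Below is a minimal sketch that reads a fixed number of characters instead of lines, on the assumption that a large JSON dump may be stored as one very long line, so a line-based read could still pull in the entire file. The helper name `peek_chars` and the `utf-8` encoding are assumptions for illustration, not something taken from the dataset's documentation.

```python
def peek_chars(path, n_chars=2000, encoding="utf-8"):
    """Return the first n_chars characters of a text file.

    Only the beginning of the file is read, so memory use stays small
    regardless of the file size. The encoding is an assumption; passing
    it explicitly avoids the platform default (cp1252 in the traceback).
    """
    with open(path, encoding=encoding) as f:
        return f.read(n_chars)
```

Calling `print(peek_chars("C:/Users/syyun/Downloads/visdial_0.9_train/visdial_0.9_train.json"))` should show the opening keys and the first few entries as raw text. The result is a truncated fragment, not valid JSON, so it is meant for reading by eye rather than for `json.loads`.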
