
Use case: I need to check whether JSON data from a URL has been updated by looking at its `created_date` field, which appears in the first few lines. The page's full JSON payload is huge, and I don't want to download all of it just to check the first few lines.

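Currently, with both of the following, the entire response from the URL is retrieved and then parsed:

```python
import feedparser
import requests

x = feedparser.parse(url)   # downloads and parses the whole feed
y = requests.get(url).text  # downloads the whole body before returning
# y.split("\n") etc.
```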

I want something like calling `next()` on the response, or reading only the first 10 lines (chunks), so that the request doesn't fetch the entire page's data; i.e. just scan far enough to check the `created_date` field and then stop.

What can I use to solve this? Thanks in advance, and apologies for the beginner question.

Example URL: https://www.w3schools.com/xml/plant_catalog.xml

I want to stop reading the URL's data if the first `PLANT` element's `LIGHT` tag hasn't changed from 'Mostly Shady' (without needing to download the data below it).

  • Maybe check `stream=True` when you call `requests.get()`: https://stackoverflow.com/questions/57497833/python-requests-stream-data-from-api – Andrej Kesely Jul 04 '22 at 18:43
  • Can you share the URL? Maybe we can alter the get request to clean up the returned JSON? – Rivered Jul 04 '22 at 18:45
  • Is it possible there's a `Last-Modified` header in the response such that you could do a `HEAD` on the same resource instead of streaming any of the body data at all? – esqew Jul 04 '22 at 18:49
  • "*Is there a library or algo that can help?*" Questions seeking library recommendations are explicitly off-topic on Stack Overflow per the scope of the site defined in the [help/on-topic]. – esqew Jul 04 '22 at 18:50
  • @Rivered Example URL https://www.w3schools.com/xml/plant_catalog.xml added to the description. I want to limit what is read to only a specific tag, OR to a number of characters in the chunk of JSON string I get. – Shivam Anand Jul 04 '22 at 18:51
  • @esqew I will explore this avenue, thanks. Editing it out. – Shivam Anand Jul 04 '22 at 18:52
  • Does this answer your question? [How I can I lazily read multiple JSON values from a file/stream in Python?](https://stackoverflow.com/questions/6886283/how-i-can-i-lazily-read-multiple-json-values-from-a-file-stream-in-python) – buran Jul 04 '22 at 18:53
  • The webpage example returns xml content, not json... – Rivered Jul 04 '22 at 19:00
  • @Rivered I'm unsure whether I can share the official link, so I found an equivalent example. JSON and XML are both data formats I run into for this problem (across multiple links). – Shivam Anand Jul 04 '22 at 19:04
  • You could try to modify the GET request into a HEAD request to return only relevant metadata – Rivered Jul 04 '22 at 19:06
  • @Rivered Thanks a lot. It had a `Last-Modified` date. You could add an answer so I can accept it. – Shivam Anand Jul 05 '22 at 06:03
  • @AndrejKesely This was perfect and very informative. Even after I used a regex to select a specific tag's value, it was faster than `requests.head`. – Shivam Anand Jul 06 '22 at 05:21
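A minimal sketch of the streaming approach suggested in the comments, which the OP reported as the fastest option: read the body a chunk at a time, stop as soon as a regex finds the value, and close the connection without reading the rest. It uses the standard library's `urllib.request` so it has no third-party dependency; with `requests` the equivalent is `requests.get(url, stream=True)` and iterating `response.iter_content(chunk_size=...)`. The helper names (`first_match`, `iter_response_chunks`, `first_plant_light`), the regex, the chunk size, and the 64 KiB cap are assumptions for illustration, not part of any library.

```python
import re
import urllib.request
from typing import Generator, Iterable, Optional


def first_match(chunks: Iterable[bytes], pattern: bytes,
                max_bytes: int = 64 * 1024) -> Optional[bytes]:
    """Return group(1) of the first match of `pattern`, consuming chunks only
    until it matches. Gives up (returns None) once max_bytes have been buffered."""
    regex = re.compile(pattern, re.DOTALL)
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        match = regex.search(buffer)
        if match:
            return match.group(1)  # stop here: later chunks are never read
        if len(buffer) >= max_bytes:
            break
    return None


def iter_response_chunks(url: str,
                         chunk_size: int = 1024) -> Generator[bytes, None, None]:
    """Yield the response body piece by piece instead of downloading it all."""
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            yield chunk


def first_plant_light(url: str) -> Optional[str]:
    """Hypothetical example: read the first LIGHT value, then hang up."""
    chunks = iter_response_chunks(url)
    try:
        value = first_match(chunks, rb"<LIGHT>(.*?)</LIGHT>")
    finally:
        chunks.close()  # exits the `with` above, closing the connection early
    return value.decode("utf-8") if value is not None else None
```

For the JSON case the same helper should work with a pattern such as `rb'"created_date"\s*:\s*"([^"]*)"'`, assuming the value is a plain string near the top of the document. Because both patterns require the closing tag or quote, a value split across two chunks is not matched until it is complete.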

1 Answer


The original poster confirmed that the solution below worked:

Instead of a GET request, you can try a HEAD request:

"The GET method requests a representation of the specified resource. Requests using GET should only retrieve data. The HEAD method asks for a response identical to a GET request, but without the response body."

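This way you don't download the JSON body at all, only the response headers. If the server sends a `Last-Modified` header, comparing it with the value you saw last time tells you whether the resource has changed. That is faster on your side and friendlier to the hosting server.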
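A minimal sketch of the HEAD approach, assuming the server sends a `Last-Modified` header (the OP confirmed theirs did, but servers are not required to). It uses the standard library's `urllib.request`; with `requests` the header would come from `requests.head(url).headers.get("Last-Modified")`. The function names and the "treat a missing header as changed" policy are assumptions for illustration.

```python
import urllib.request
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def last_modified(url: str) -> Optional[datetime]:
    """Send a HEAD request (headers only, no body) and parse Last-Modified."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        header = response.headers.get("Last-Modified")
    # Servers are not required to send this header; None means "unknown".
    return parsedate_to_datetime(header) if header else None


def has_changed(current: Optional[datetime], previous: Optional[datetime]) -> bool:
    """Assumed policy: if either timestamp is unknown, report a change so the
    caller falls back to a full GET instead of silently missing an update."""
    if current is None or previous is None:
        return True
    return current > previous
```

Store the value from one run and pass it as `previous` on the next; only when `has_changed` returns `True` is the full `requests.get(url)` needed.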

Rivered