
I'm trying to read a large website response, but I'm hitting a MemoryError exception:

import requests

requests.urllib3.disable_warnings()

# ps=99999 asks the server to return every row in a single page
search_page = "http://www.yachtworld.co.uk/core/listing/cache/searchResults.jsp?ps=99999"
y = requests.get(search_page, timeout=999999, stream=True)
result = y.text  # MemoryError is raised here

The MemoryError is raised when I read the response into the result variable, i.e. when y.text decodes the whole page body into a single string.

Is there any way to read the whole data without hitting this exception?

Thanks.

hackerman
  • Does this API let you ask for data in smaller bits (e.g., what is that `ps=99999` thing)? Do you want to write the content to disk or process it right away? If right away, is it something you could do line by line (see the sketch after these comments)? Sometimes the answer is "buy more memory". – tdelaney Apr 29 '18 at 19:12
  • The GET parameter stands for how many rows to show per page, but I want to get them all at once instead of fetching page by page. – hackerman Apr 29 '18 at 19:14
  • That's the "buy more memory" option. You may have luck using `lxml.html.parse("http://www.yachtworld.co.uk/core/listing/cache/searchResults.jsp?ps=99999")` or even `lxml.html.iterparse`, which you can use to limit memory usage. But why not just grab and filter smaller bits? – tdelaney Apr 29 '18 at 19:49
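To illustrate the "line by line" route from the comments: requests can stream the response and decode it incrementally, so the full page never has to sit in memory as one string. A minimal sketch, assuming the goal is to filter the HTML for lines of interest (the "listing" filter is hypothetical):

import requests

search_page = "http://www.yachtworld.co.uk/core/listing/cache/searchResults.jsp?ps=99999"

with requests.get(search_page, timeout=60, stream=True) as y:
    y.raise_for_status()
    # iter_lines yields the body one decoded line at a time
    for line in y.iter_lines(decode_unicode=True):
        if line and "listing" in line:  # hypothetical filter; adjust as needed
            print(line)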

1 Answer


As far as I know nothing has changed here, meaning there is no way to read the whole body into memory at once, but you can load the data in chunks, as presented well in https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py

The accepted answer from the link I provided gives a good piece of code for chunking the response:

import requests

def download_file(url):
    # Name the local file after the last segment of the URL
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter: the body is fetched lazily, not all at once
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return local_filename
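For example, applied to the URL from the question (a hypothetical usage sketch; note that local_filename will include the query string, 'searchResults.jsp?ps=99999', which some filesystems may not accept):

search_page = "http://www.yachtworld.co.uk/core/listing/cache/searchResults.jsp?ps=99999"
path = download_file(search_page)

# Process the saved page line by line instead of loading it all at once
with open(path, encoding="utf-8", errors="replace") as f:
    for line in f:
        pass  # e.g. feed each line to a parser or filter of your choice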
trust512
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – From Review (/review/low-quality-posts/19586366) – Bentaye Apr 29 '18 at 19:33
  • You copy-pasted the answer from here: https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py – hackerman Apr 29 '18 at 19:58
  • And I linked to the answer directly, noting it's copied from the accepted answer. @Bentaye said it would be better if I provided the essential part, so I did. – trust512 Apr 29 '18 at 20:08