Python download large xml file from a url to get first 10 entries

Question

I have an XML file from a client, shared via a URL, and I want to download only the first 10 entries from the file instead of the whole file.

I know how to download an XML file in Python, but instead of downloading the whole file, I just want to download the first 10 entries of the XML.

import requests
URL = "http://clientfeed.com/feed/feed.xml"
response = requests.get(URL, stream=True)

But from here I don't know how to proceed to download only 10 entries from the file.

Sample tree nodes of the XML:

 /products  (root element)
 /products/product  (repeating element)
 /products/product/id
 /products/product/name
 /products/product/producturl
 /products/product/bigimage
 /products/product/price
 /products/product/instock
 /products/product/category

Here, only the first 10 /products/product entries need to be downloaded, not the whole file.

Sample XML file with the first four entries:

<?xml version="1.0"?>
<products>
<product>
<id>1212</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>35345</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>7656756</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>575686786</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
</products>

Can someone guide me on how to achieve this?

Thanks in advance

Does your client's API not support a GET request with an ID as a URL parameter? — K.Maj, Feb 26 '19 at 12:36
Can you show the actual XML file instead of describing its format? — Alderven, Feb 26 '19 at 12:37
@K.Maj No, the client doesn't have that option; the client just shares the whole file. — chethi, Feb 26 '19 at 12:39
Possible duplicate of [Only download a part of the document using python requests](https://stackoverflow.com/questions/23602412/only-download-a-part-of-the-document-using-python-requests) — Jonah Bishop, Feb 26 '19 at 12:52
Can you also show what data you need from that sample XML? — Alderven, Feb 26 '19 at 12:53
@Alderven, the XML above has only 4 entries, i.e. 4 <product>s. I have a big file which contains 10000 entries, and I want only the first 10 <product>s, so that I end up reading only the first ten entries instead of all of them. — chethi, Feb 26 '19 at 12:58
@Chethi: Use the same as in your last question: [download-large-csv-file-from-a-url-line-by-line-for-only-10-entries](https://stackoverflow.com/questions/53815346/python-download-large-csv-file-from-a-url-line-by-line-for-only-10-entries) instead of counting rows, count closing tags. — stovfl, Feb 26 '19 at 13:00
@stovfl, how can I do that? I want the first ten <product>s; which tag do I need to look for to find the entries and their corresponding closing tags? — chethi, Feb 26 '19 at 13:06
@Chethi: `lxml` => `.iterparse()` => count `event == 'end'` => `break`. Read [event-driven-parsing](https://lxml.de/tutorial.html#event-driven-parsing) — stovfl, Feb 26 '19 at 13:13
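The event-driven approach from the comment above can be sketched as follows. It uses the standard library's `xml.etree.ElementTree.iterparse`, which offers the same kind of `("end",)` event stream as `lxml.etree.iterparse`; it is swapped in here only so the sketch needs no third-party package. The helper name `first_products` and the dict-per-record output are assumptions for illustration, not the asker's or the commenter's code.

```python
import io
import xml.etree.ElementTree as ET


def first_products(source, limit=10):
    """Parse <product> records from a binary file-like `source` one at a
    time and stop as soon as `limit` of them have been collected."""
    products = []
    # Only "end" events are needed: when </product> is reached, all of the
    # element's children have already been parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "product":
            products.append({child.tag: child.text for child in elem})
            elem.clear()  # drop the finished element's children to save memory
            if len(products) >= limit:
                break  # stop iterating; no further blocks are read from source
    return products


# Offline demo; with requests you would pass the raw response stream instead.
demo = io.BytesIO(
    b'<?xml version="1.0"?><products>'
    b'<product><id>1212</id><name>first</name></product>'
    b'<product><id>35345</id><name>second</name></product>'
    b'</products>'
)
print(first_products(demo, limit=1))  # prints [{'id': '1212', 'name': 'first'}]
```

With `requests`, something like `with requests.get(URL, stream=True) as r:` followed by `r.raw.decode_content = True` (so a gzip-encoded response is decompressed) and `first_products(r.raw)` should work. Because the parser pulls its input in blocks, slightly more than ten entries' worth of data may be transferred before it stops, but not the whole file.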

score 1 · Accepted Answer · answered Feb 26 '19 at 13:19

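Here is a code snippet that does what you are looking for.
Note that it downloads different data, and the word it looks for is 'name'.
In your case you should count the closing tag '</product>' rather than 'product', because 'product' also matches '<products>' and '<producturl>'.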

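import codecs
import requests

URL = "http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/aawiki/20190101/dumpruninfo.txt"

MAX_COUNT = 5
file_data = ''
# A 50-byte chunk can end in the middle of a multi-byte UTF-8 character,
# so decode incrementally instead of calling chunk.decode() on each chunk.
decoder = codecs.getincrementaldecoder('utf-8')()
# Using the response in a "with" block requires requests 2.18 or later.
with requests.get(URL, stream=True) as r:
    r.raise_for_status()
    for chunk in r.iter_content(chunk_size=50):
        file_data += decoder.decode(chunk)
        # Stop downloading once the search word has appeared MAX_COUNT times.
        if file_data.count('name') >= MAX_COUNT:
            print(file_data)
            break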
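Applied to the XML feed, printing everything read so far usually ends partway through a `<product>` element, so the result cannot be parsed as XML. A minimal sketch of the "count closing tags" idea that avoids this, assuming the asker's `<products>`/`<product>` layout and UTF-8 data: it counts `</product>`, cuts the text just after the n-th one and appends `</products>` so the fragment is well-formed. The helper name `first_n_products_xml` is hypothetical; it accepts any iterable of byte chunks, so `r.iter_content(chunk_size=1024)` from a streamed `requests` response can be passed in.

```python
import codecs
import xml.etree.ElementTree as ET

CLOSE_TAG = "</product>"
ROOT_CLOSE = "</products>"


def first_n_products_xml(chunks, n=10):
    """Consume byte chunks until n closing </product> tags have arrived, then
    return the text up to the end of the n-th one with the root closed."""
    # A chunk may end inside a multi-byte UTF-8 character: decode incrementally.
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = ""
    for chunk in chunks:
        text += decoder.decode(chunk)
        if text.count(CLOSE_TAG) >= n:
            break  # stop pulling chunks from the stream
    # Find the end of the n-th closing tag (or the last one, if there are fewer).
    end = 0
    for _ in range(n):
        idx = text.find(CLOSE_TAG, end)
        if idx == -1:
            break
        end = idx + len(CLOSE_TAG)
    if end == 0:
        raise ValueError("no </product> tag found in the data read")
    return text[:end] + ROOT_CLOSE


# Offline demo: simulate a stream that delivers 5-byte chunks.
sample = ('<?xml version="1.0"?>\n<products>\n'
          '<product><id>1212</id><name>café</name></product>\n'
          '<product><id>35345</id><name>second</name></product>\n'
          '</products>\n').encode("utf-8")
chunks = (sample[i:i + 5] for i in range(0, len(sample), 5))
root = ET.fromstring(first_n_products_xml(chunks, n=1))
print([p.findtext("id") for p in root.findall("product")])  # prints ['1212']
```

Plain string counting assumes `</product>` never appears inside text or CDATA content; the event-driven `iterparse` approach mentioned in the comments does not rely on that assumption.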

Use a newer version of the requests library. The code I shared worked under Python 3.7 and requests 2.21. — balderman, Feb 26 '19 at 14:15
