
I have an XML file from a client, shared via a URL, and I want to download only the first 10 entries instead of the whole file.

I know how to download an XML file in Python, but I don't want the whole file, only its first 10 entries.

import requests
URL = "http://clientfeed.com/feed/feed.xml"
response = requests.get(URL, stream=True)  # stream=True: the body is not downloaded until it is read

But I don't know how to proceed from here to download only the first 10 entries.

Sample tree structure of the XML:

 /products  -- root element
 /products/product  -- repeating element
 /products/product/id
 /products/product/name
 /products/product/producturl
 /products/product/bigimage
 /products/product/price
 /products/product/instock
 /products/product/category

Only the first 10 /products/product entries need to be downloaded, not the whole file.

Sample XML file with the first four entries:

<?xml version="1.0"?>
<products>
<product>
<id>1212</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>35345</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>7656756</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
<product>
<id>575686786</id>
<name>product name</name>
<producturl>product url</producturl>
<bigimage>image url</bigimage>
<price>11323</price>
<instock>yes</instock>
<category>cate</category>
</product>
</products>

Can someone guide me on how to achieve this?

Thanks in advance

chethi
  • Does your client's API not support a GET request with the ID as a URL parameter? – K.Maj Feb 26 '19 at 12:36
  • Can you show the actual XML file instead of describing its format? – Alderven Feb 26 '19 at 12:37
  • @K.Maj No, the client doesn't have that option; the client just shares the whole file. – chethi Feb 26 '19 at 12:39
  • @Alderven Added a sample XML file with 4 entries. – chethi Feb 26 '19 at 12:51
  • Possible duplicate of [Only download a part of the document using python requests](https://stackoverflow.com/questions/23602412/only-download-a-part-of-the-document-using-python-requests) – Jonah Bishop Feb 26 '19 at 12:52
  • Can you also show what data you need from that sample XML? – Alderven Feb 26 '19 at 12:53
  • @Alderven, the XML above has only 4 entries, i.e. 4 `<product>`s. My real file contains 10000 entries, and I want only the first 10 `<product>`s, so that I read only the first ten entries instead of all of them. – chethi Feb 26 '19 at 12:58
  • @JonahBishop I'm not sure how it will be helpful in this case. – chethi Feb 26 '19 at 12:59
  • @Chethi: Use the same approach as in your last question, [download-large-csv-file-from-a-url-line-by-line-for-only-10-entries](https://stackoverflow.com/questions/53815346/python-download-large-csv-file-from-a-url-line-by-line-for-only-10-entries), but instead of counting rows, count closing tags. – stovfl Feb 26 '19 at 13:00
  • @stovfl, how can I do that? I want the first ten `<product>`s; which tag do I need to look for, and how do I find its corresponding closing tag? – chethi Feb 26 '19 at 13:06
  • @Chethi: `lxml` => `.iterparse()` => count `event == 'end'` => `break`. Read [event-driven-parsing](https://lxml.de/tutorial.html#event-driven-parsing) – stovfl Feb 26 '19 at 13:13
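The `iterparse` suggestion in the comments can be sketched roughly like this. This version swaps in the standard library's `xml.etree.ElementTree.iterparse`, which offers the same event-driven interface as lxml's `iterparse`; the helper names `first_products` and `fetch_first_products` are hypothetical, and the tag name `product` is taken from the sample file. Because `iterparse` pulls its source in blocks, somewhat more than the first 10 entries gets transferred before parsing stops, but not the whole file.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List


def first_products(stream: BinaryIO, limit: int = 10) -> List[Dict[str, str]]:
    """Parse <product> elements incrementally, stopping after `limit` of them."""
    products: List[Dict[str, str]] = []
    # An 'end' event fires when an element's closing tag has been parsed,
    # so each <product> is complete (all children present) at that point.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "product":
            products.append({child.tag: child.text or "" for child in elem})
            elem.clear()  # release the finished element's children
            if len(products) >= limit:
                break  # stop parsing: no further blocks are read from the stream
    return products


def fetch_first_products(url: str, limit: int = 10) -> List[Dict[str, str]]:
    # urlopen returns a file-like response whose body is read on demand
    with urllib.request.urlopen(url) as resp:
        return first_products(resp, limit)
```

With requests instead of urllib, passing `r.raw` from `requests.get(URL, stream=True)` to `first_products` should behave the same way; setting `r.raw.decode_content = True` first makes urllib3 undo any gzip `Content-Encoding`.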

1 Answer


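Here is a code snippet that does what you are looking for.
Note that it downloads different data and counts occurrences of the word 'name'.
In your case, count the closing tag '</product>' instead: counting plain 'product' would also match '<products>', '<producturl>' and '</producturl>'.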

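import requests

URL = "http://ftp.acc.umu.se/mirror/wikimedia.org/dumps/aawiki/20190101/dumpruninfo.txt"

MAX_COUNT = 5
file_data = b''
# stream=True defers fetching the body until iter_content() is consumed
with requests.get(URL, stream=True) as r:
    for chunk in r.iter_content(chunk_size=50):
        # Accumulate raw bytes: decoding each 50-byte chunk on its own can
        # split a multi-byte UTF-8 character and raise UnicodeDecodeError
        file_data += chunk
        count = file_data.count(b'name')
        if count >= MAX_COUNT:
            # Enough matches: print what we have and stop reading.
            # Leaving the with block closes the connection.
            print(file_data.decode('utf-8', errors='replace'))
            break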
balderman
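The counting idea from the answer can be adapted to the product feed by counting the closing tag `</product>` on raw bytes, then trimming the buffer right after the 10th match so the result still parses as XML. This is a sketch assuming, as in the sample, that the root element is `<products>` and that the text `</product>` never occurs inside element content; `first_n_products_xml` and `parse_products` are hypothetical helper names. Taking an iterable of byte chunks keeps it independent of the HTTP library.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

CLOSE_TAG = b"</product>"    # assumed repeating element, from the sample file
ROOT_CLOSE = b"</products>"  # assumed root element, from the sample file


def first_n_products_xml(chunks: Iterable[bytes], n: int = 10) -> bytes:
    """Read chunks until n closing </product> tags have arrived.

    Returns a well-formed document containing only the first n products.
    """
    buf = b""
    for chunk in chunks:
        buf += chunk  # stay in bytes, so no chunk is decoded mid-character
        if buf.count(CLOSE_TAG) >= n:
            break  # stop consuming the stream
    else:
        # The stream ended first: the document was complete and
        # holds fewer than n products, so return it unchanged.
        return buf
    # Find where the n-th </product> ends and cut everything after it.
    end = 0
    for _ in range(n):
        end = buf.index(CLOSE_TAG, end) + len(CLOSE_TAG)
    # Re-close the root element so the fragment parses as XML.
    return buf[:end] + ROOT_CLOSE


def parse_products(xml_bytes: bytes) -> List[Dict[str, str]]:
    """Turn the trimmed document into a list of {child tag: text} dicts."""
    root = ET.fromstring(xml_bytes)
    return [{c.tag: c.text or "" for c in p} for p in root.iter("product")]
```

With requests it could be called as `first_n_products_xml(r.iter_content(chunk_size=1024))` inside `with requests.get(URL, stream=True) as r:`. Counting on the accumulated buffer rather than on each chunk also catches a closing tag that happens to be split across two chunks; re-counting the whole buffer each time is quadratic, which is harmless for 10 entries.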