Pulling data from xml page into .txt

Question

I'm trying to pull just the keywords from XML output like the one shown at:

http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a

I put together the code below, but I don't get any errors or any output. Any ideas?

import urllib2 as ur
import re

f = ur.urlopen(u'http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a')
res = f.readlines()
for d in res:
  data = re.findall('<CompleteSuggestion><\/CompleteSuggestion>',d)
  for i in data:
    print i
    file = open("keywords.txt", "a")
    file.write(i + '\n')
    file.close()

I am trying to:

Fetch the XML from the given URL
Store the list of keywords from the XML, parsed using a regex

Thanks,

Did you check that the regex in findall works correctly (by assigning some constant content to 'd')?
Also, try adding r before the regex string, e.g. r'<\/CompleteSuggestion>' — Baruch Oxman, Jun 03 '15 at 14:17
Hey Baruch, I'm not that great at regex. I'm guessing I did something wrong within the regex itself. — BubblewrapBeast, Jun 03 '15 at 14:22
You should use one of the numerous XML libraries included in the Python standard library. — Iguananaut, Jun 03 '15 at 14:30
possible duplicate of [How do I parse XML in Python?](http://stackoverflow.com/questions/1912434/how-do-i-parse-xml-in-python) — Iguananaut, Jun 03 '15 at 14:31
(As an aside, you don't need to open the file and close it on every loop. Just open it once before your loops, write to it in the loop, and close it after all writing is finished) — Iguananaut, Jun 03 '15 at 14:33
It is not clear what you are trying to extract here. Can you post the expected output for a sample input XML? — Gurupad Hegde, Jun 03 '15 at 14:36
So I am looking to have Python go to http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a and give me the text, such as: test anxiety, test america — BubblewrapBeast, Jun 03 '15 at 14:44

Gurupad Hegde · Answer 1 · 2015-06-03T15:03:36.463

from urllib2 import urlopen
import re

xml_url = 'http://clients1.google.com/complete/search?hl=en&output=toolbar&q=test+a'
xml_file_contents = urlopen(xml_url).readlines()

# Open the output file once; "with" closes it even if an error occurs.
with open("keywords.txt", "a") as keywords_file:
    for entry in xml_file_contents:
        # Capture the value of every data="..." attribute on this line.
        keywords = re.findall(r'data="([^"]*)"', entry)
        for keyword in keywords:
            print keyword
            keywords_file.write(keyword + '\n')

Output:

test anxiety
test america
test adobe flash
test automation
test act
test alternator
test and set
test adblock
test adobe shockwave
test automation tools

Let me know if anything is unclear.
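
As suggested in the comments on the question, an XML parser from the standard library is an alternative to the regex. The following is only a rough sketch, written in Python 3 syntax and run against a hypothetical inline sample instead of the live URL. The element names `toplevel` and `suggestion` in the sample are assumptions; the function itself relies only on the `data` attribute, and `extract_keywords` is a name made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample response; "toplevel" and "suggestion" are assumed
# element names, used here only so the sketch runs without network access.
SAMPLE_XML = (
    '<toplevel>'
    '<CompleteSuggestion><suggestion data="test anxiety"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="test america"/></CompleteSuggestion>'
    '</toplevel>'
)


def extract_keywords(xml_text):
    """Return the value of every data attribute, in document order."""
    root = ET.fromstring(xml_text)
    # Walk all elements and keep those carrying a data attribute, so the
    # code does not depend on the exact element name.
    return [el.attrib['data'] for el in root.iter() if 'data' in el.attrib]


keywords = extract_keywords(SAMPLE_XML)
for keyword in keywords:
    print(keyword)
```

To store the result, the returned list can be written to keywords.txt one keyword per line, opening the file once before the loop as in the code above.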
