
I found this great answer on how to check whether any string from a list occurs in a line: "How to check if a line has one of the strings in a list?"

But trying to do a similar thing with keys in a dict does not seem to do the job for me:

import urllib2

url_info = urllib2.urlopen('http://rss.timegenie.com/forex.xml')
currencies = {"DKK": [], "SEK": []}
print currencies.keys()
testCounter = 0

for line in url_info:
    if any(countryCode in line for countryCode in currencies.keys()):
        testCounter += 1
    if "DKK" in line or "SEK" in line:
        print line
print "testCounter is %i and should be 2 - if not debug the code" % (testCounter)

The output:

['SEK', 'DKK']
<code>DKK</code>
<code>SEK</code>
testCounter is 377 and should be 2 - if not debug the code

I think my problem may be that .keys() gives me an array rather than a list, but I haven't figured out how to convert it.

Norfeldt
  • I just ran this and testCounter was 2, not 377. If you're getting unexpected matches, I'd suggest printing the current line whenever a match is counted. – alexp Dec 28 '12 at 15:00
  • `countryCode in currencies` is functionally equivalent to `countryCode in currencies.keys()` here. – Francis Avila Dec 28 '12 at 15:05

2 Answers


Change:

any(countryCode in line for countryCode in currencies.keys())

to:

any([countryCode in line for countryCode in currencies.keys()])

Your original code uses a generator expression, whereas (I think) your intention is a list comprehension. See: Generator Expressions vs. List Comprehension

UPDATE: Using an IPython interpreter with pylab imported, I got the same result as you did (377 matches instead of the expected 2). The issue was that 'any' came from the numpy package, which is meant to work on arrays. It apparently treats the generator as a single object, which is always truthy, so every line was counted. When I loaded an IPython interpreter without pylab, so that 'any' was the builtin, your original code worked. So if you're using an IPython interpreter, type:

help(any)

and make sure it is from the builtin module. If so, your original code should work fine.
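The check can also be done programmatically. The following is only a sketch, written for Python 3 (where the module is called `builtins`; on Python 2 it is `__builtin__`). The sample lines and the `array_style_any` function are hypothetical stand-ins: the latter merely imitates an array-oriented 'any' that treats its argument as one object, so the example does not need numpy installed.

```python
import builtins  # named __builtin__ on Python 2

currencies = {"DKK": [], "SEK": []}

# Hypothetical sample lines standing in for the downloaded feed.
sample_lines = ["<code>DKK</code>", "<code>USD</code>", "<code>SEK</code>"]


def array_style_any(obj):
    # Hypothetical stand-in for a shadowing 'any': it looks at the
    # argument as a single object instead of iterating over it.
    # A generator object is always truthy.
    return bool(obj)


def count_matches(any_func):
    # Count lines for which any_func reports that a key occurs in the line.
    return sum(
        1
        for line in sample_lines
        if any_func(code in line for code in currencies)
    )


builtin_count = count_matches(builtins.any)      # iterates the generator
shadowed_count = count_matches(array_style_any)  # counts every line

# True when the name 'any' in this namespace is still the builtin.
is_builtin = any is builtins.any
```

If `is_builtin` is False in your session, calling `builtins.any(...)` explicitly sidesteps whatever has shadowed the name.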

ljk07
  • There is absolutely no difference between these two except that the second one uses eager-evaluation to find all matches unnecessarily. The first one will stop after the first match since no other matches are needed to get `any` to return true. – Francis Avila Dec 28 '12 at 15:27
  • OK, you're right. Now I need to understand why the second one gives the anticipated results and the first one does not. Thanks. – ljk07 Dec 28 '12 at 15:30
  • I was using IPython ;-) Thank you so much! – Norfeldt Dec 28 '12 at 20:38

This is not a very good way to examine an XML file.

  1. It's slow. You are making potentially N*M substring searches where N is the number of lines and M is the number of keys.
  2. XML is not a line-oriented text format. Your substring searches could find attribute names or element names too, which is probably not what you want. And if the XML file happens to put all its elements on one line with no whitespace (common for machine-generated and -processed XML) you will get fewer matches than you expect.
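Both failure modes in the second point can be reproduced with a small sketch. The two XML snippets below are made-up samples, not the real feed; `count_matching_lines` is a hypothetical helper that repeats the line-by-line substring test from the question.

```python
currencies = {"DKK": [], "SEK": []}

# Hypothetical sample: two currency records on one physical line.
one_line_xml = (
    "<forex><data><code>DKK</code></data>"
    "<data><code>SEK</code></data></forex>"
)

# Hypothetical sample: no DKK record, but "DKK" occurs in a description.
noisy_xml = (
    "<forex>\n"
    "<data><code>USD</code><description>pegged to DKK</description></data>\n"
    "</forex>"
)


def count_matching_lines(text):
    # Same test as in the question: does any key occur in the line?
    return sum(
        1
        for line in text.splitlines()
        if any(code in line for code in currencies)
    )


one_line_count = count_matching_lines(one_line_xml)  # two codes, one line
noisy_count = count_matching_lines(noisy_xml)        # a false positive
```

The first sample contains two matching codes but yields a count of 1, and the second contains neither code as a record yet still yields 1.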

If you have line-oriented text input, I suggest you construct a regex from your list of keys:

import re
linetester = re.compile('|'.join(re.escape(key) for key in currencies))

for match in linetester.finditer(entire_text):
    print match.group(0)

# Or, if the text is too long and you want to consume it line by line
# (url_info is the open file-like object from the question):

for line in url_info:
    for match in linetester.finditer(line):
        print match.group(0)
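For a version that runs without a network connection, here is a minimal sketch of the same regex approach on a few made-up lines. The `sample_lines` list is an assumption for illustration, and the keys are sorted only so that the compiled pattern is deterministic.

```python
import re

currencies = {"DKK": [], "SEK": []}

# One alternation pattern covering every key; re.escape guards against
# keys that contain regex metacharacters.
linetester = re.compile("|".join(re.escape(key) for key in sorted(currencies)))

# Hypothetical sample lines standing in for line-oriented input.
sample_lines = ["<code>DKK</code>", "<code>USD</code>", "<code>SEK</code>"]

# Each line is scanned once by the compiled pattern instead of once per key.
found = [
    match.group(0)
    for line in sample_lines
    for match in linetester.finditer(line)
]
```

Here `found` ends up holding the matched keys in the order they occur in the input.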

However, since you have XML, you should use an actual XML processor:

import xml.etree.cElementTree as ET

# Assumes url_info is the open file-like object from the question.
forex = ET.parse(url_info)

for elem in forex.findall('data/code'):
    if elem.text in currencies:
        print elem.text

If you are only interested in what codes are present and don't care about the particular entry you can use set intersection:

codes = frozenset(e.text for e in forex.findall('data/code'))

print codes & frozenset(currencies)
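The set-intersection idea can be tried without downloading anything. This sketch parses a made-up XML string whose structure (`forex` root, `data/code` elements) is assumed from the path used above, and it uses `xml.etree.ElementTree`, which is available on both Python 2 and 3.

```python
import xml.etree.ElementTree as ET

currencies = {"DKK": [], "SEK": []}

# Hypothetical sample; the real feed's structure is assumed to match
# the 'data/code' path.
sample_xml = (
    "<forex>"
    "<data><code>DKK</code></data>"
    "<data><code>USD</code></data>"
    "<data><code>SEK</code></data>"
    "</forex>"
)

# fromstring returns the root element, so the path is relative to <forex>.
forex = ET.fromstring(sample_xml)

# Every code present in the document, compared on exact element text.
codes = frozenset(e.text for e in forex.findall("data/code"))

# Codes that are both in the document and keys of the dict.
present = codes & frozenset(currencies)
print(sorted(present))
```

Because the comparison is on exact element text, a key that merely appears inside some other element or attribute is not counted.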
Francis Avila