
I'm doing some scraping, but I'm looking to automate the construction of an extensive list of keywords. One approach that I've devised, which is neither convenient nor inconvenient, would be the following:

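def build_search_terms():
    words1 = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    words2 = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    # search_for() is not shown here; it runs the actual search for one term
    for word in words2:
        result = words1[0] + word  # words1[0] is always 'a'
        words2.pop(0)  # removing items from words2 while looping over it skips every other letter
        search_for(result)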

What I'm trying to do is create a function that spits out aa to az, then ba to bz, then ca to cz, so on and so forth.

Has anybody tackled this issue before?

Are there more efficient ways to do this?

oldboy
  • All this does is return `aa`. I think you have a typo in your post? – Red Cricket Oct 22 '18 at 02:42
  • @RedCricket every time you call it, it'll return something different. The first time it'll return `aa`, the second time `ab`, and so on. I'm wondering if there is a better way to construct keywords from scratch – oldboy Oct 22 '18 at 02:43
  • Sounds like you would want a generator. https://wiki.python.org/moin/Generators – Red Cricket Oct 22 '18 at 02:45
  • @RedCricket generators are indeed what I'm looking for, but I'm also curious how to build search terms of actual words from scratch – oldboy Oct 22 '18 at 02:47
  • @RedCricket in other words, what I'm trying to do is create a function that spits out 'aa', 'ab', ... 'az', then 'ba', 'bb', ... 'bz', then 'ca', 'cb', ... 'cz', and so on – oldboy Oct 22 '18 at 02:52
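The comments above describe a function that returns a different term each time it is called. A minimal sketch of that behaviour, assuming a single module-level generator is acceptable (`next_search_term` is a hypothetical name, not from the original), keeps one generator alive and advances it on every call:

```python
import string

# One shared generator object: it remembers its position between calls,
# so each call to next_search_term() resumes where the previous one stopped.
_terms = (first + second
          for first in string.ascii_lowercase
          for second in string.ascii_lowercase)


def next_search_term():
    """Hypothetical helper: return the next term ('aa', 'ab', ...), or None after 'zz'."""
    return next(_terms, None)


print(next_search_term())  # aa
print(next_search_term())  # ab
```

Passing `None` as the default to `next()` means the helper returns `None` instead of raising `StopIteration` once all 676 terms have been used.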

1 Answer


You can get the desired output with a generator:

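def build_search_terms():
    words = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    # The outer letter changes only after the inner loop has run through a..z
    for i in words:
        for j in words:
            yield i + j  # yield makes this a generator: it produces one term per step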

and use it as follows:

for word in build_search_terms():
    print(word)
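The nested loops visit 26 × 26 = 676 combinations, and the first letter advances only after the second has run through the whole alphabet. A sketch that checks this, assuming the standard library's `string.ascii_lowercase` in place of the hand-typed list:

```python
import string


def build_search_terms():
    """Yield 'aa', 'ab', ..., 'az', 'ba', ..., 'zz' one at a time."""
    letters = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'
    for first in letters:             # outer loop: changes slowest
        for second in letters:        # inner loop: cycles a..z for each first letter
            yield first + second


terms = list(build_search_terms())  # materialised here only to inspect the result
print(len(terms))                   # 676
print(terms[25], terms[26])         # az ba
print(terms[-1])                    # zz
```

In a scraper you would normally iterate over the generator directly rather than building the full list, so only one term exists in memory at a time.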

or

def build_search_terms():
    words = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    # Generator expression: the same terms as the nested loops, produced lazily
    return (i + j for i in words for j in words)

and use it as follows:

words = build_search_terms()
print(next(words))  # 'aa'
print(next(words))  # 'ab'
print(next(words))  # 'ac'
# ... and so on, up to 'zz'
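A comment on this answer asks whether the terms can come from a single list rather than two identical ones. Both snippets above already loop over one list twice; a further option, swapping in `itertools.product` from the standard library (an alternative, not what the answer uses), also generalises to longer terms. The `length` and `alphabet` parameters are illustrative additions:

```python
import itertools
import string


def build_search_terms(length=2, alphabet=string.ascii_lowercase):
    """Lazily yield every `length`-letter string over `alphabet`.

    The order is the one itertools.product produces, which is lexicographic
    when `alphabet` is sorted (as string.ascii_lowercase is).
    """
    # product(alphabet, repeat=length) builds every combination from a single
    # sequence; the rightmost position advances fastest, like the inner loop.
    for combo in itertools.product(alphabet, repeat=length):
        yield ''.join(combo)


words = build_search_terms()
print(next(words))  # aa
print(next(words))  # ab

triples = build_search_terms(length=3)
print(next(triples))  # aaa
```

With `length=3` the generator covers 26³ = 17,576 terms, so the lazy approach matters more as the term length grows.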
Andersson
  • Looks pretty useful. More effective than what I'm doing now, which is using a nested loop to append the terms to a list, then looping through that list – oldboy Oct 22 '18 at 18:23
  • Is there not a better way to construct the terms? Instead of having two identical arrays, isn't it possible to build the list from `aa` to `zz` out of a single list? – oldboy Oct 22 '18 at 18:29
  • I'll likely mark this as the answer once I test it. Thanks for your help, buddy <3 – oldboy Oct 22 '18 at 18:34
  • Any idea how to test whether a string contains non-Unicode characters? – oldboy Oct 22 '18 at 18:48
  • @Anthony, check [this ticket](https://stackoverflow.com/questions/35889505/check-that-a-string-contains-only-ascii-characters/35890514) – Andersson Oct 22 '18 at 18:50
  • Yeah, I'd come across that, but when I tested `isinstance('whatever', unicode)` it said `unicode` is not defined. However, I did just find [this](https://stackoverflow.com/a/4987414/7543162), which basically answers my question. So basically all strings are converted to Unicode in Python 3? The thing is, I want to filter out/ignore hashtags that might not be Unicode instead of encoding them as Unicode. – oldboy Oct 22 '18 at 18:58
  • @Anthony, yes. In Python 3.x the separate `unicode` data type was removed: every `str` is a Unicode string, and raw binary data is `bytes`. – Andersson Oct 22 '18 at 19:08
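Since every Python 3 `str` is already Unicode, the hashtag filter discussed in these comments most likely needs an ASCII check, as in the linked question. A minimal sketch, assuming hashtags arrive as a list of strings and that "not Unicode" here means "contains non-ASCII characters" (`ascii_only`, `ascii_only_legacy`, and the sample tags are hypothetical):

```python
def ascii_only(hashtags):
    """Return only the hashtags made entirely of ASCII characters (needs Python 3.7+)."""
    return [tag for tag in hashtags if tag.isascii()]


def ascii_only_legacy(hashtags):
    """Same filter for Python < 3.7: encoding to ASCII fails on any non-ASCII character."""
    kept = []
    for tag in hashtags:
        try:
            tag.encode('ascii')
        except UnicodeEncodeError:
            continue  # skip tags containing non-ASCII characters
        kept.append(tag)
    return kept


tags = ['#python', '#café', '#日本', '#scraping']
print(ascii_only(tags))         # ['#python', '#scraping']
print(ascii_only_legacy(tags))  # ['#python', '#scraping']
```

`str.isascii()` is the simpler choice on Python 3.7 and later; the `encode('ascii')` version gives the same result on older interpreters.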