I want to use the twint tool in Python to search for tweets containing all possible spellings of the word Ethiopia, including exaggerations such as ETHIOOOPIAAAA and ethiopiaaaa. So far I have tried to build the search term as the string f"e{eth}a", where eth is a string of 0 to 18 characters drawn in random order from e, t, h, i, o, p, a, so that the whole term starts with e and ends with a.

I have tried to use this:

import random

eth_chars = "ethiopa"
eth = ""

for i in range(0,18):
    eth += random.choice(eth_chars)

search_term = f"e{eth}a"

This does not work because it generates a single string, assigns it to search_term, and searches only for that one term. I want to search for all possible strings, with a middle part of 0 to 18 characters, that follow this rule:

e-(random order of e,t,h,i,o,p,a)-a

Also, I need to make the queries case-insensitive. I tried adding the .casefold() string method to the search query when configuring twint, like this: 'config.Search = search_term.casefold()', assuming this would make the search ignore case. I am not sure this will work.

Any assistance will be appreciated.

Jinx

1 Answer

You can use itertools.product to compute the Cartesian product of lists of values, i.e. to get every possible combination of one item from each list.
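
As a small illustration of what itertools.product returns, here is a sketch with two made-up lists of variants (the names first and second are only for this example and are not part of the search problem):

```python
import itertools

# Two hypothetical lists of variants, one list per letter position.
first = ["e", "E"]
second = ["t", "T", "tt"]

# product() yields one tuple per way of picking one item from each list;
# the last list varies fastest.
pairs = ["".join(p) for p in itertools.product(first, second)]
print(pairs)  # ['et', 'eT', 'ett', 'Et', 'ET', 'Ett']
```

The result has 2 * 3 = 6 entries, one for every pairing of an item from the first list with an item from the second.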

If you want to keep the letter order, you need to generate a list of variants for each letter, like this :

[['e', 'E', 'ee', 'eE', 'Ee', 'EE'], ['t', 'T', 'tt', 'tT', 'Tt', 'TT'], ['h', 'H', 'hh', 'hH', 'Hh', 'HH'], ['i', 'I', 'ii', 'iI', 'Ii', 'II'], ['o', 'O', 'oo', 'oO', 'Oo', 'OO'], ['p', 'P', 'pp', 'pP', 'Pp', 'PP'], ['i', 'I', 'ii', 'iI', 'Ii', 'II'], ['a', 'A', 'aa', 'aA', 'Aa', 'AA']]

Then process all possible combinations using itertools.product :

import itertools

eth_chars = list("ethiopia")
max_length = 2  # maximum number of times each letter may be repeated

combinations = []
for character in eth_chars:
    # for 'e' and max_length=2: ['e', 'E', 'ee', 'eE', 'Ee', 'EE']
    char_arr = []
    # every run length from 1 to max_length, in every mix of lower and upper case
    for length in range(1, max_length + 1):
        for item in itertools.product(character + character.upper(), repeat=length):
            char_arr.append("".join(item))
    combinations.append(char_arr)

print(combinations)
# pick one variant per letter and join them in the original letter order
for element in itertools.product(*combinations):
    print("".join(element))
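
Since the output grows quickly, it may be worth counting the results before printing them. A minimal sketch, assuming the same per-letter variants as above; the helper name letter_variants is made up for this example:

```python
import itertools
import math

def letter_variants(ch, max_length):
    """All runs of `ch` of length 1..max_length, in every mix of cases."""
    pool = ch.lower() + ch.upper()
    return [
        "".join(p)
        for n in range(1, max_length + 1)
        for p in itertools.product(pool, repeat=n)
    ]

variants = [letter_variants(ch, 2) for ch in "ethiopia"]

# The size of a Cartesian product is the product of the list sizes,
# so nothing has to be enumerated to get the total.
total = math.prod(len(v) for v in variants)
print(total)  # 1679616
```

math.prod requires Python 3.8 or later; on older versions a plain loop that multiplies the lengths does the same job.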

Try this on repl.it

Note that :

  • the method above assumes you want the letters in the right order; it does not deal with permutations
  • even with max_length=2, it returns 1,679,616 combinations (6 variants for each of the 8 letters, 6^8); if you remove the mixed-case variants (eE and Ee), you get 65,536 combinations (4^8), like this :
import itertools

eth_chars = list("ethiopia")
max_length = 2  # maximum number of times each letter may be repeated

combinations = []
for character in eth_chars:
    # for 'e' and max_length=2: ['e', 'E', 'ee', 'EE']
    char_arr = []
    for count in range(1, max_length + 1):
        char_arr.append(character * count)          # all-lowercase run
        char_arr.append(character.upper() * count)  # all-uppercase run
    combinations.append(char_arr)

for element in itertools.product(*combinations):
    print("".join(element))
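
Because the number of candidate search terms grows so fast, a different technique may be more practical for checking results: fetch tweets with a broader query, then keep only the texts that match a case-insensitive regular expression. The sketch below uses Python's standard re module and is not part of twint; the sample texts and the name ETHIOPIA_PATTERN are made up for the example.

```python
import re

# Each letter of "ethiopia" repeated one or more times, in order, any case.
# The \b anchors keep longer words such as "ethiopian" from matching.
ETHIOPIA_PATTERN = re.compile(r"\be+t+h+i+o+p+i+a+\b", re.IGNORECASE)

# Hypothetical tweet texts, for illustration only.
texts = [
    "I love ETHIOOOPIAAAA",
    "ethiopiaaaa forever",
    "ethiopian coffee",
    "utopia",
]

matches = [t for t in texts if ETHIOPIA_PATTERN.search(t)]
print(matches)  # ['I love ETHIOOOPIAAAA', 'ethiopiaaaa forever']
```

Like the product-based method, this pattern keeps the letters in order and does not handle permutations or dropped letters.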
Bertrand Martel
  • Tried the first method again with `max length = 4` and that's exactly what I was looking for in terms of all possible outputs. Thank you. Now lemme try the search method and see if it works on [the same repl.it](https://replit.com/@bertrandmartel/CartesianProductInOrder). – Jinx Apr 04 '21 at 15:26
  • UPDATE: I made a [fork of your repl.it](https://replit.com/@wayneotweezy/CartesianProductInOrder#main.py) with comments. Check it out and see if I've got the right idea. – Jinx Apr 04 '21 at 15:49