
I'm trying to write some code using regex and my text file. The file contains these words, one per line:

each
expressions
flags
in
from
given
line
of
once
lines
no

My goal is to display the words that can be created by removing letters from a given substring (the remaining letters keep their order).

For example, if my substring is "flamingoes", my output should be:

flags
in
line
lines
no

That is because they can be created from my substring by removing letters, and they are also in my text file.
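In other words, a word qualifies when it is a subsequence of the given string: its letters appear in the string in the same order, possibly with other letters in between. That is why "of" and "once" are excluded even though all their letters occur in "flamingoes". As a plain-Python reference point (not a regex, and not part of the original question), a minimal sketch of that rule; `is_subsequence` is a hypothetical helper name:

```python
def is_subsequence(word, source):
    """Return True if `word` can be made from `source` by deleting letters."""
    it = iter(source)
    # `ch in it` consumes the iterator up to and including the first `ch`,
    # so each later letter must be found further along in `source`
    return all(ch in it for ch in word)

words = ['each', 'expressions', 'flags', 'in', 'from',
         'given', 'line', 'of', 'once', 'lines', 'no']
matches = [w for w in words if is_subsequence(w, 'flamingoes')]
print(matches)  # ['flags', 'in', 'line', 'lines', 'no']
```

This gives a handy way to cross-check whatever regex solution you end up with.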

I have done a lot of work with regex, but this challenge interests me. Is there a regex solution for this?

Tomerikoo
Wicaledon

2 Answers


You should create a regex for each word you are looking for, with `.*?` between consecutive letters. `.*?` is a non-greedy (lazy) pattern, which can reduce backtracking (at least some of it) and make the search faster.

For example, the regex for the word "given" would be `g.*?i.*?v.*?e.*?n`.
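A small sketch of how such a pattern can be built and tested for a single word. `subsequence_regex` is a hypothetical helper name; escaping each letter with `re.escape` is an added precaution, so a word containing characters like `.` or `+` would still be matched literally:

```python
import re

def subsequence_regex(word):
    # Escape each letter, then put a lazy ".*?" (any characters, as few as
    # possible) between consecutive letters.
    return '.*?'.join(re.escape(ch) for ch in word)

print(subsequence_regex('given'))                                  # g.*?i.*?v.*?e.*?n
print(re.search(subsequence_regex('given'), 'flamingoes') is not None)  # False
print(re.search(subsequence_regex('line'), 'flamingoes') is not None)   # True
```

"given" fails because no "i" follows the only "g" in "flamingoes", while l, i, n, e all appear in that order.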

import re

def hidden_words(needles, haystack):
    for needle in needles:
        # escape each letter so regex metacharacters are matched literally
        regex = re.compile('.*?'.join(map(re.escape, needle)))
        if regex.search(haystack):
            yield needle

needles = ['each', 'expressions', 'flags', 'in', 'from', 
           'given', 'line', 'of', 'once', 'lines', 'no']

print(*hidden_words(needles, 'flamingoes'), sep='\n')
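The question reads the words from a text file rather than a hard-coded list. A minimal sketch of that, assuming one word per line; `read_words` is a hypothetical helper, and the demo writes the question's words to a temporary file so it runs on its own:

```python
import os
import re
import tempfile

def hidden_words(needles, haystack):
    for needle in needles:
        regex = re.compile('.*?'.join(map(re.escape, needle)))
        if regex.search(haystack):
            yield needle

def read_words(path):
    # one word per line: strip the trailing newline and skip blank lines
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

# Demo only: put the question's word list into a temporary file.
words = ['each', 'expressions', 'flags', 'in', 'from',
         'given', 'line', 'of', 'once', 'lines', 'no']
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('\n'.join(words) + '\n')

found = list(hidden_words(read_words(tmp.name), 'flamingoes'))
os.remove(tmp.name)
print(*found, sep='\n')
```

In real use you would pass your own file path to `read_words` instead of the temporary file.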
Håken Lid
  • It looks beautiful and works perfectly, but I have a question about your answer. What do the `*` and `sep` parts of the `print` call mean? – Wicaledon Jun 02 '19 at 11:42
  • `sep='\n'` makes `print` use a newline as the separator instead of the default single space. `*` is the unpacking operator in Python: it evaluates the generator and unpacks its output (the yielded values) into multiple function arguments. – Håken Lid Jun 02 '19 at 11:53
  • That's the short explanation. If you are unfamiliar with generators in Python, here's a longer explanation of what generator functions are, and why they can be very useful: https://stackoverflow.com/a/1756156/1977847. – Håken Lid Jun 02 '19 at 11:59
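To make the `*` / `sep` explanation concrete, a small sketch showing that `print(*gen, sep='\n')` produces the same text as joining the generator's values with newlines. `letters` is a hypothetical stand-in for `hidden_words(...)`:

```python
import io
from contextlib import redirect_stdout

def letters():
    # stand-in generator; hidden_words(...) yields words the same way
    yield 'a'
    yield 'b'
    yield 'c'

# print(*letters(), sep='\n') is the same call as print('a', 'b', 'c', sep='\n')
buf = io.StringIO()
with redirect_stdout(buf):
    print(*letters(), sep='\n')
unpacked = buf.getvalue()

# print adds a final newline (its default end='\n'), so add one to the join
joined = '\n'.join(letters()) + '\n'
print(unpacked == joined)  # True
```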

Essentially, each character is optional. A simple pattern such as

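import re

word = 'flamingoes'
pattern = ''.join(re.escape(c) + '?' for c in word)  # '?' marks each letter as optional

with open('file') as f:
    for line in f:
        line = line.strip()
        # fullmatch anchors both ends (same as '^' + pattern + '$'); plain re.match
        # would succeed on every line, because every letter is optional
        m = re.fullmatch(pattern, line)

        if line and m:
            print(line)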

should suffice.
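A quick in-memory check of this approach, without a file, using the word list from the question. The use of `re.fullmatch` (Python 3.4+) so that the whole word must be consumed is an addition to the original answer:

```python
import re

word = 'flamingoes'
# each letter followed by '?' makes that letter optional
pattern = ''.join(re.escape(c) + '?' for c in word)
print(pattern)  # f?l?a?m?i?n?g?o?e?s?

words = ['each', 'expressions', 'flags', 'in', 'from',
         'given', 'line', 'of', 'once', 'lines', 'no']
# fullmatch requires the pattern to consume the entire word
found = [w for w in words if re.fullmatch(pattern, w)]
print(found)  # ['flags', 'in', 'line', 'lines', 'no']
```

Because the optional letters must still appear in the order of "flamingoes", "of" and "once" are rejected, just as in the expected output.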

Ranga
  • Since, for example, "each" and "flamingoes" both contain an "e", there will be a partial match. You should add a filtering clause to keep only full matches: `m = re.match(pattern, line); if m and m[0] == line: print(line)`. – Håken Lid Jun 02 '19 at 11:36
  • @Wicaledon: I don't want to add another answer, so I'll take the liberty of editing this answer to add my suggested filtering. – Håken Lid Jun 02 '19 at 11:45
  • @HåkenLid I also tried this answer, but it doesn't print anything – Wicaledon Jun 02 '19 at 11:49
  • I forgot that you have to strip off the trailing newlines from each line when you read the words from a file. – Håken Lid Jun 02 '19 at 12:07
  • @HåkenLid now it displays all the words in my file – Wicaledon Jun 03 '19 at 16:51
  • @HåkenLid If we write `m = re.match('^'+pattern+'$', line)`, it works perfectly. Thank you so much – Wicaledon Jun 03 '19 at 20:18
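The comments above describe why the unanchored version misbehaves. A small sketch comparing what plain `re.match` consumes with the anchored `'^' + pattern + '$'` form and with `re.fullmatch` (the last is an alternative added here, not from the thread):

```python
import re

pattern = ''.join(c + '?' for c in 'flamingoes')

results = {}
for line in ['each', 'from', 'flags']:
    m = re.match(pattern, line)                     # anchored at the start only
    partial = m.group(0) if m else None             # what re.match actually consumed
    anchored = re.match('^' + pattern + '$', line) is not None
    full = re.fullmatch(pattern, line) is not None  # Python 3.4+
    results[line] = (partial, anchored, full)
    print(line, repr(partial), anchored, full)
# each 'e' False False
# from 'f' False False
# flags 'flags' True True
```

Since the pattern can match the empty string, `re.match` never returns `None` here, which is why every line was printed until the match was anchored at both ends.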