Find the Words That Can Be Created by Removing Letters from a Given String

Question

I'm trying to write some code using regex and a text file. My file contains these words, one per line:

each
expressions
flags
in
from
given
line
of
once
lines
no

My goal is to display the words that can be created by removing letters from a given string.

For example, if my string is "flamingoes", my output should be:

flags
in
line
lines
no

Because they can be created from my string by removing letters, and they are also in my text file.

I have done a lot of work with regex, and this challenge interests me. Is there a regex solution for this?

@TimBiegeleisen I wrote "I have done a lot of work with regex" in my question so as not to look lazy to answers like yours. This is just something I was wondering about. – Wicaledon, Jun 02 '19 at 11:03
@The_fourth_bird "of" is not a match because the order of the letters has to be preserved – Ranga, Jun 02 '19 at 11:24

Håken Lid · Accepted Answer · 2019-06-02T11:27:42.190

1

You should create a regex for each word you are looking for, with the expression .*? inserted between each pair of letters. .*? is a non-greedy (lazy) pattern, which will avoid backtracking (at least some of it) and make the search faster.

For example, the regex for the word "given" would be g.*?i.*?v.*?e.*?n

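import re

def hidden_words(needles, haystack):
    """Yield each needle whose letters appear, in order, in haystack."""
    for needle in needles:
        # Join the (escaped) letters with a lazy wildcard, e.g. g.*?i.*?v.*?e.*?n
        regex = re.compile('.*?'.join(map(re.escape, needle)))
        if regex.search(haystack):
            yield needle

needles = ['each', 'expressions', 'flags', 'in', 'from',
           'given', 'line', 'of', 'once', 'lines', 'no']

# Unpack the generator into print() and put each match on its own line.
print(*hidden_words(needles, 'flamingoes'), sep='\n')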
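
Since the question reads its words from a text file, the same idea could be sketched with the needles loaded from disk. This is only an illustration: the file name `words.txt` and the helper `words_from_file` are assumptions, not part of the original answer, and the demo writes its own temporary copy of the word list so that it runs anywhere.

```python
import re
import tempfile
from pathlib import Path


def hidden_words(needles, haystack):
    """Yield each needle whose letters appear, in order, in haystack."""
    for needle in needles:
        regex = re.compile('.*?'.join(map(re.escape, needle)))
        if regex.search(haystack):
            yield needle


def words_from_file(path):
    """Hypothetical helper: return the non-empty, stripped lines of a word file."""
    text = Path(path).read_text(encoding='utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]


# Demo: write the question's word list to a temporary file, then search it.
with tempfile.TemporaryDirectory() as tmp:
    word_file = Path(tmp) / 'words.txt'  # assumed file name
    word_file.write_text(
        'each\nexpressions\nflags\nin\nfrom\ngiven\nline\nof\nonce\nlines\nno\n',
        encoding='utf-8',
    )
    found = list(hidden_words(words_from_file(word_file), 'flamingoes'))

print(*found, sep='\n')
```

With a real file, only the path passed to `words_from_file` would change.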

edited Jun 02 '19 at 11:27

answered Jun 02 '19 at 11:21

Håken Lid


It looks beautiful and it works perfectly, but I have a question about your answer. What do the `*` and `sep` parts of the print call mean? – Wicaledon Jun 02 '19 at 11:42

`sep='\n'` makes the `print` function use a newline as the separator instead of the default single space. `*` is the unpacking operator in Python; it evaluates the generator and unpacks the values it yields into separate function arguments. – Håken Lid Jun 02 '19 at 11:53

That's the short explanation. If you are unfamiliar with generators in Python, here's a longer explanation of what generator functions are, and why they can be very useful: https://stackoverflow.com/a/1756156/1977847. – Håken Lid Jun 02 '19 at 11:59
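
To make the `*` and `sep` explanation concrete, here is a small sketch. The generator `letters` is made up purely for illustration.

```python
def letters():
    """Toy generator used only for this illustration."""
    yield 'a'
    yield 'b'
    yield 'c'


# Unpacking the generator passes each yielded value as its own argument...
print(*letters(), sep='\n')

# ...so the call above is equivalent to writing the arguments out by hand.
print('a', 'b', 'c', sep='\n')

# Without sep, print() separates its arguments with a single space.
print(*letters())

# The same newline-separated text can be built explicitly with str.join.
joined = '\n'.join(letters())
```

The first two `print` calls produce the same three lines; the third puts all three letters on one line.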

Ranga · Answer 2 · 2019-06-02T22:29:01.850

1

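Essentially, each character is optional. Something as simple as this should suffice:

import re

word = 'flamingoes'
# A trailing ? marks each (escaped) letter as optional: f?l?a?m?i?n?g?o?e?s?
pattern = ''.join(re.escape(c) + '?' for c in word)

# 'file' is the text file with one word per line
with open('file') as f:
    for line in f:
        line = line.strip()  # drop the trailing newline
        # fullmatch requires the whole line to match, like anchoring with ^ and $;
        # a plain re.match would accept every line via an empty match
        if line and re.fullmatch(pattern, line):
            print(line)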

edited Jun 02 '19 at 22:29

answered Jun 02 '19 at 11:30

Ranga


Since, for example, "each" and "flamingoes" both contain an "e", there will be a partial match. You should add a filtering clause to keep only full matches: `m = re.match(pattern, line); if m and m[0] == line: print(line)`. – Håken Lid Jun 02 '19 at 11:36
@Wicaledon: I don't want to add another answer, so I'll take the liberty of editing this answer with my suggested filtering. – Håken Lid Jun 02 '19 at 11:45
@HåkenLid I also tried this answer but it doesn't print anything – Wicaledon Jun 02 '19 at 11:49
I forgot that you have to strip off the trailing newlines from each line when you read the words from a file. – Håken Lid Jun 02 '19 at 12:07
@HåkenLid now it is displaying all the words in my file – Wicaledon Jun 03 '19 at 16:51
@HåkenLid If we write `m = re.match('^'+pattern+'$',line)`, it works perfectly. Thank you so much – Wicaledon Jun 03 '19 at 20:18
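
The comments above come down to anchoring. The following rough sketch shows why an unanchored `re.match` accepts every word, while the anchored version keeps only the true matches. `re.fullmatch` is shown as an equivalent alternative that the thread does not mention; the word list is copied from the question and the variable names are illustrative.

```python
import re

word = 'flamingoes'
pattern = ''.join(re.escape(c) + '?' for c in word)  # f?l?a?m?i?n?g?o?e?s?

words = ['each', 'expressions', 'flags', 'in', 'from',
         'given', 'line', 'of', 'once', 'lines', 'no']

# Every letter is optional, so re.match always succeeds, possibly on an empty prefix.
unanchored = [w for w in words if re.match(pattern, w)]

# Anchoring with ^ and $ forces the whole word to be consumed by the pattern.
anchored = [w for w in words if re.match('^' + pattern + '$', w)]

# re.fullmatch expresses the same requirement without explicit anchors.
full = [w for w in words if re.fullmatch(pattern, w)]

print(anchored)
```

Here `unanchored` ends up containing the whole word list, which matches the "displaying all the words" symptom reported above.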
