find all words in a certain alphabet with multi character letters

Question

I want to find out what words can be formed using the names of musical notes.

This question is very similar: "Python code that will find words made out of specific letters. Any subset of the letters could be used". But my alphabet also contains "fis", "cis" and so on.

letters = ["c","d","e","f","g","a","h","fis","cis","dis"]

I have a really long word list with one word per line and want to use

with open(...) as f:
    for line in f:
        if ...:

to check if each word is part of that "language" and then save it to another file.
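A minimal sketch of that read-check-save loop, assuming the membership check is passed in as a function. The names `filter_words` and `is_word` are hypothetical, and in-memory file objects stand in for the real paths, which the question leaves out. The check used here is simply the question's own `[abilrstu]` regex, so the example runs on its own.

```python
import io
import re
from typing import Callable, TextIO


def filter_words(src: TextIO, dst: TextIO, is_word: Callable[[str], bool]) -> int:
    """Copy every word from src that passes is_word into dst, one per line.

    Returns the number of words written.
    """
    written = 0
    for line in src:
        word = line.rstrip("\n")  # drop the newline but keep the word intact
        if word and is_word(word):
            dst.write(word + "\n")
            written += 1
    return written


# In-memory files for the example; real code would use open(path) instead.
src = io.StringIO("australia\ndummy\naustralian\n")
dst = io.StringIO()
m = re.compile('^[abilrstu]+$')
count = filter_words(src, dst, lambda w: m.match(w) is not None)
print(count, dst.getvalue().split())  # 1 ['australia']
```

Once a check for the note alphabet exists, only the function passed as `is_word` needs to change.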

My problem is how to alter

>>> import re
>>> m = re.compile('^[abilrstu]+$')
>>> m.match('australia') is not None
True
>>> m.match('dummy') is not None
False
>>> m.match('australian') is not None
False

so that it also matches "fis", "cis" and so on.

For example, "fish" is a match but "ifsh" is not.
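To pin down what "is a match" means here: a word matches if it can be cut into consecutive pieces, each of which is an entry of `letters`. The following is a minimal sketch of that definition as a memoized recursive check. It is not one of the posted answers, the function name `is_spelled_from` is hypothetical, and it assumes the alphabet contains no empty strings.

```python
from functools import lru_cache
from typing import Sequence

letters = ["c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"]


def is_spelled_from(word: str, alphabet: Sequence[str]) -> bool:
    """Return True if word can be cut into consecutive pieces from alphabet.

    Assumes alphabet has no empty strings (they would recurse forever).
    """
    pieces = tuple(set(alphabet))  # duplicate entries don't matter

    @lru_cache(maxsize=None)
    def ok(start: int) -> bool:
        # Reaching the end means the whole word has been consumed.
        if start == len(word):
            return True
        # Try every piece that fits at this position, then check the rest.
        return any(
            word.startswith(p, start) and ok(start + len(p)) for p in pieces
        )

    return ok(0)


print(is_spelled_from("fish", letters))  # True  (fis + h)
print(is_spelled_from("ifsh", letters))  # False (no note starts with "i")
```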

Check if this is what you want to do: `(^|\b)([cdefgahc]fis|cis|dis)+(\b|$)` https://regex101.com/r/RzYRIs/1/ — Code Maniac, Jan 27 '19 at 17:28
@ParitoshSingh Not necessarily: the German system (which is obviously the one in use here) calls H what English-speaking countries call B, and uses B for what English-speaking countries call B-flat. — BoarGules, Jan 27 '19 at 18:04

score 3 · Answer 1 · edited Jun 20 '20 at 09:12

This function works and doesn't use any external libraries:

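def func(word, letters):
    # Remove longer note names first, so that e.g. "fis" is stripped
    # before the single letter "f" can break it apart.
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    # An empty remainder means the word was fully decomposed into notes.
    return not word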

It works because if word is "" at the end, it has been fully decomposed into your letters.
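A runnable sketch of the same replace-based check, applied to the question's two example words. The `letters` list is the question's, minus the repeated "c"; `len` is the sort key.

```python
letters = ["c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"]


def func(word, letters):
    # Strip longer note names first so "fis" is removed before "f" can split it.
    for l in sorted(letters, key=len, reverse=True):
        word = word.replace(l, "")
    # Nothing left over means every character belonged to some note name.
    return not word


print(func("fish", letters))  # True:  "fish" -> "h" -> ""
print(func("ifsh", letters))  # False: "ifsh" -> "ish" -> "is"
```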

Update:

It seems that my explanation wasn't clear. WORD.replace(LETTER, "") replaces every occurrence of the note LETTER in WORD with nothing. Here is an example:

func("banana", {'na'})

It replaces every 'na' in "banana" with nothing ('').

The result is "ba", which is not a note.

not "" is True and not "ba" is False, because an empty string is falsy and any non-empty string is truthy.

Here is another example:

func("banana", {'na', 'chicken', 'b', 'ba'})

It replaces every 'chicken' in "banana" with nothing ('').

The result is still "banana".

It replaces every 'ba' in "banana" with nothing ('').

The result is "nana".

It replaces every 'na' in "nana" with nothing ('').

The result is "".

It replaces every 'b' in "" with nothing ('').

The result is still "".

not "" is True ==> HURRAY, IT IS A MELODY!

Note: the letters are sorted by length because otherwise the second example would not work. If "b" were removed first, "banana" would become "anana", and removing "na" would then leave "a", which can't be decomposed into notes.
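The walkthrough above can be reproduced with a small tracing variant. The `trace` helper is hypothetical. Sorting by `(-len(x), x)` instead of length alone is an assumption added here: equal-length strings in a set come out in an arbitrary order that can change between runs, and that order changes the intermediate results. Breaking ties alphabetically happens to give the order used in the walkthrough ('chicken', 'ba', 'na', 'b').

```python
def trace(word, letters):
    """Return (removed, remaining) after each replacement step.

    Longest names go first; ties are broken alphabetically (an assumption,
    made so the trace is reproducible).
    """
    steps = []
    for l in sorted(letters, key=lambda x: (-len(x), x)):
        word = word.replace(l, "")
        steps.append((l, word))
    return steps


for removed, left in trace("banana", {"na", "chicken", "b", "ba"}):
    print(f"remove {removed!r:10} -> {left!r}")

# Caveat: a removal can splice the leftovers into a new note name.
notes = {"c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"}
print(trace("fidiss", notes)[:3])
```

In the second example, removing "dis" from "fidiss" leaves "fis", which the next step removes too, so the word ends up empty. Yet "fidiss" cannot be spelled from the notes, because after the leading "f" no note starts with "i". Whether this happens depends on the order in which equal-length names are tried.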

I don't quite understand; can you please elaborate on how this code works? — Nivatius, Jan 27 '19 at 19:23
Even though Stack Overflow doesn't want me to use the comments for this: thank you for taking the time to explain how it works and for providing a non-regex solution. I accepted the other answer because I asked in the question how to alter the regex, so your answer isn't strictly an answer to my question. Still upvoted, because it is an interesting alternative. — Nivatius, Jan 29 '19 at 16:21
And thank you for asking when you didn't understand; the objective of SO is to provide answers and to learn. I am glad I accomplished those objectives. — Benoît P, Jan 29 '19 at 18:13

Slam · Accepted Answer · score 3 · 2019-01-27T20:23:26.380

I believe ^(fis|cis|dis|[abcfhg])+$ will do the job.

Here is a breakdown of what's going on:

| works like an OR
[...] denotes "any single character from what's inside the brackets"
^ and $ stand for the beginning and end of the line, respectively
+ stands for "one or more times"
( ... ) denotes grouping, which is needed to apply the +/*/{} modifiers to a whole expression. Without grouping, such modifiers apply only to the closest expression on their left

Altogether, this "reads" as "the whole string is one or more repetitions of fis/cis/dis or one of abcfhg".
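Instead of writing the alternation by hand, the pattern can be generated from the question's `letters` list. This is a sketch, not the answer's code, and the helper name `in_language` is hypothetical. `re.escape` and `Pattern.fullmatch` are standard-library calls. Because the pattern is built from the list, every entry ends up in the alternation, including the single letters d and e.

```python
import re

letters = ["c", "d", "e", "f", "g", "a", "h", "c", "fis", "cis", "dis"]

# Deduplicate, put longer names first, and escape each name in case an
# alphabet ever contains regex metacharacters.
names = sorted(set(letters), key=lambda s: (-len(s), s))
pattern = re.compile("(?:" + "|".join(map(re.escape, names)) + ")+")


def in_language(word):
    # fullmatch anchors at both ends, like ^...$ in the answer above.
    return pattern.fullmatch(word) is not None


for w in ["fish", "ifsh", "disc", "decade"]:
    print(w, in_language(w))
```

The regex engine backtracks, so a word like "fish" still matches as "fis" + "h" even though "f" alone is also an option.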

edited Jan 27 '19 at 20:23

answered Jan 27 '19 at 17:40


You probably want + rather than * – Mad Physicist Jan 27 '19 at 17:43
Well, I think an empty string may be considered a line made of any characters ;) But in a real case, matching against a corpus, this shouldn't make any difference – Slam Jan 27 '19 at 17:45
It worked for me with a + at the end. Please do elaborate on the code a bit, as @DebanjanB pointed out. You are welcome to use the code that I put in the question – Nivatius Jan 27 '19 at 19:26
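A quick check of the * versus + difference discussed in these comments, using the accepted answer's alternation. One detail makes it matter for a word list: lines read from a file keep their "\n", and `$` also matches just before a trailing newline. So with `*`, a blank line would be reported as a match unless it is stripped first.

```python
import re

star = re.compile(r"^(fis|cis|dis|[abcfhg])*$")
plus = re.compile(r"^(fis|cis|dis|[abcfhg])+$")

# With *, zero repetitions are allowed, so the empty string matches.
print(star.match("") is not None)    # True
print(plus.match("") is not None)    # False
# A blank line read from a file still carries its newline.
print(star.match("\n") is not None)  # True
print(plus.match("\n") is not None)  # False
# On non-empty words both behave the same.
print(star.match("fish") is not None, plus.match("fish") is not None)  # True True
```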

Mykola Zotko · Answer 3 · 2019-01-28T08:35:10.057

You can count how many letters of the word are covered by the units (names of musical notes) that occur in it, and compare this number to the length of the word.

from collections import Counter

units = {"c","d","e","f","g","a","h", "fis","cis","dis"}

def func(word, units=units):
    letters_count = Counter()
    for unit in units:
        num_of_units = word.count(unit)
        letters_count[unit] += num_of_units * len(unit) 
        if len(unit) == 1:
            continue
        # If the unit consists of more than one letter (e.g. "dis"),
        # check whether its letters are also one-letter units;
        # if so, subtract them, since they were already counted on their own.
        for letter in unit:
            if letter in units:
                letters_count[letter] -= num_of_units
    return len(word) == sum(letters_count.values())

print(func('disc'))
print(func('disco'))    
# True
# False
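The per-unit counts show why the subtraction step is needed. In "disc", the "d" is counted once as the unit "d" and again inside "dis", so one count is taken back. The following sketch reuses the answer's counting logic but returns the non-zero tallies instead of comparing the total. The `tally` helper is hypothetical.

```python
from collections import Counter

units = {"c", "d", "e", "f", "g", "a", "h", "fis", "cis", "dis"}


def tally(word, units=units):
    """Same counting as func above, but return the non-zero per-unit counts."""
    letters_count = Counter()
    for unit in units:
        n = word.count(unit)
        letters_count[unit] += n * len(unit)
        if len(unit) > 1:
            # Letters of a multi-letter unit that are also one-letter units
            # were already counted on their own; undo that double count.
            for letter in unit:
                if letter in units:
                    letters_count[letter] -= n
    return {u: v for u, v in letters_count.items() if v}


print(sorted(tally("disc").items()))  # [('c', 1), ('dis', 3)]
print(sorted(tally("fish").items()))  # [('fis', 3), ('h', 1)]
```

In both cases the counts add up to the word's length (4), which is the comparison `func` makes.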

score 0 · Answer 4 · answered Dec 27 '22 at 12:53

A solution that opens a tkinter dialog to choose the file:

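import re
from tkinter import filedialog as fd

# German note names: sharps end in -is, flats in -es/-s, plus single letters
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+$')
matches = list()
filename = fd.askopenfilename()

with open(filename) as f:
    for line in f:
        # Strip only the newline; line[:-1] would cut a real character
        # from a last line that has no trailing newline.
        word = line.rstrip("\n")
        if m.match(word.lower()) is not None:
            matches.append(word)

print(matches)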

This answer was posted as an edit to the question "find all words in a certain alphabet with multi character letters" by the OP Nivatius under CC BY-SA 4.0.
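Separating the matching loop from the tkinter dialog makes it testable on an in-memory file. This is a sketch with the same pattern as above. The `find_matches` name is hypothetical, and `io.StringIO` stands in for the file chosen in the dialog.

```python
import io
import re

# Same pattern as above: German note names with -is sharps and -es/-s flats.
m = re.compile('^(fis|ges|gis|as|ais|cis|des|es|dis|[abcfhg])+$')


def find_matches(lines):
    """Return the words (one per line) that the pattern accepts."""
    matches = []
    for line in lines:
        word = line.rstrip("\n")
        if m.match(word.lower()) is not None:
            matches.append(word)  # keep the original capitalization
    return matches


sample = io.StringIO("Fisch\nifsh\nGases\n")
print(find_matches(sample))  # ['Fisch', 'Gases']
```

"Gases" matches as g + as + es. `rstrip("\n")` removes only a trailing newline, so the last word survives even when the file doesn't end with one.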
