I need help extracting the headwords (ZYGOMA, ZOMA, ZYGMA) once a match for n. m. (noun, masculine) or n. f. (noun, feminine) is found. I've tried different flags such as multiline and dotall, but I still can't get the headwords. Any help will be greatly appreciated.

import re


def main():
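    # Escape the dots: an unescaped "." matches any character, not a literal period.
    mypattern = re.compile(r'n\. (m\.|f\.)')
    # Use a context manager so the file is closed when the loop finishes.
    with open("m.txt") as mytext:
        for line in mytext:
            match = mypattern.search(line)
            if match:
                print(match.group())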

if __name__ == "__main__":
    main()

The text I'm using as a sample is:

ZYGOMA

n. m. T. d'Anatomie . Os de la pommette de la joue.

ZOMA

n. m. T. d'Anatomie . Os de la pommette de la joue.

ZYGMA

n. m. T. d'Anatomie . Os de la pommette de la joue.

This is what the main file I'll parse looks like:

[image: the main file to be parsed]

Andrej Kesely

1 Answer

Assuming the words you are searching for are written in uppercase:

import re

text = """
    ZYGOMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    ZOMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    ZYGMA

    n. m. T. d'Anatomie . Os de la pommette de la joue.

    A B C

    n. m. T. d'Anatomie . Os de la pommette de la joue.
"""

# Group m|f inside the lookahead; a bare `n\. m|f` would mean "n. m" OR "f".
g = re.findall(r'([A-Z][A-Z ]*)\s+(?=n\. (?:m|f)\.)', text)
print(g)

Will print:

['ZYGOMA', 'ZOMA', 'ZYGMA', 'A B C']
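
To apply the same idea to the file from the question, one option is to read the whole file into a single string first. The headword and its n. m. / n. f. marker sit on different lines, so a line-by-line `re.search` never sees both at once. A minimal sketch, assuming the file fits in memory and is UTF-8 encoded; `find_headwords` and `headwords_from_file` are hypothetical helper names, and the sample string is made up for illustration:

```python
import re

# Uppercase headword, then whitespace (including newlines), then "n. m." or "n. f."
HEADWORD_RE = re.compile(r'([A-Z][A-Z ]*)\s+(?=n\. (?:m|f)\.)')


def find_headwords(text):
    """Return every uppercase headword directly followed by an n. m. / n. f. entry."""
    return HEADWORD_RE.findall(text)


def headwords_from_file(path, encoding="utf-8"):
    """Read the whole file at once so the pattern can span line breaks."""
    # The encoding is an assumption; adjust it to match the real file.
    with open(path, encoding=encoding) as fh:
        return find_headwords(fh.read())


# Illustrative sample only; the second definition is a placeholder.
sample = (
    "ZYGOMA\n\nn. m. T. d'Anatomie . Os de la pommette de la joue.\n\n"
    "ZOMA\n\nn. f. Exemple.\n"
)
print(find_headwords(sample))  # ['ZYGOMA', 'ZOMA']
```

With the question's file, the call would be `headwords_from_file("m.txt")`.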

For Unicode uppercase words, the solution is here: Python regex for unicode capitalized words
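
If accented headwords are a concern, another route (not the one from the linked answer) is to skip the character class entirely and walk the file line by line, remembering the last non-blank line and checking it with `str.isupper()`, which is Unicode-aware. A rough sketch under that assumption; `headwords_by_line` is a hypothetical helper name and the sample lines are made up for illustration:

```python
import re

# A definition line is assumed to start with "n. m." or "n. f." after optional indentation.
MARKER_RE = re.compile(r'\s*n\. (?:m|f)\.')


def headwords_by_line(lines):
    """Yield the last non-blank line before each n. m. / n. f. line, if it is uppercase."""
    previous = ""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip the blank lines between headword and definition
        if MARKER_RE.match(line) and previous.isupper():
            yield previous
        previous = stripped


# Illustrative sample only; the first definition is a placeholder.
sample = ["ÉCU\n", "\n", "n. m. Exemple.\n", "\n",
          "ZYGOMA\n", "\n", "n. m. T. d'Anatomie .\n"]
print(list(headwords_by_line(sample)))  # ['ÉCU', 'ZYGOMA']
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.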

Andrej Kesely