re.findall() function python

Question

Can you please help me understand the following line of code:

import re
a = re.findall(r'[А-Яа-я-\s]+', string)

I am a bit confused about the pattern that is being searched for in the string. As I read it, a match should start with А and end with any character between А and я, with the parts separated by - and a space. But what does the second term, Яа, stand for?

`Яа` doesn't stand for anything. It's two ranges, `А-Я` and `а-я`. The first is the uppercase Cyrillic letters, the second is lowercase letters. — Barmar, Jan 06 '23 at 20:34
Why do you think it should start with `А`? That's inside a character set, so it's the start of a character range, not the start of the pattern. — Barmar, Jan 06 '23 at 20:35

score 2 · Accepted Answer · answered Jan 06 '23 at 20:35

[         ]      any of the characters in here
 А-Я             any character between А and Я, inclusive
    а-я          any character between а and я, inclusive
       -         the character -   (this is ambiguous; it should only be at the very start or end of the class)
        \s       any whitespace character
           +     at least one of the preceding class of characters

[А-Яа-я-\s]+     one or more characters, each of which is a Cyrillic letter (А-Я or а-я), a dash, or whitespace
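
To see the breakdown above in action, here is a small sketch. The sample strings are made up for illustration and are not from the question; the pattern is the one being asked about, written as a raw string so that `\s` reaches the regex engine unchanged.

```python
import re

# Pattern from the question, written as a raw string.
pattern = r'[А-Яа-я-\s]+'

# Hypothetical sample text, chosen only for illustration.
sample = 'Привет, мир! Hello'

# Each match is a maximal run of Cyrillic letters, dashes, and whitespace.
# The comma, the exclamation mark, and the Latin letters end a run.
matches = re.findall(pattern, sample)
print(matches)  # ['Привет', ' мир', ' ']

# The dash inside the class lets a hyphenated word match as a single run.
hyphenated = re.findall(pattern, 'Санкт-Петербург 2023')
print(hyphenated)  # ['Санкт-Петербург ']
```

Note that whitespace is part of the class, so leading and trailing spaces end up inside the matches.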

The [] is called a "character class" in regex, and it basically says "any one of the characters listed inside here is valid". The + then means "one or more occurrences of the preceding character or class". Python has a Regular Expressions HowTo that you might find useful to read through.
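
A short sketch of the difference the + makes, and of moving the dash to the end of the class as suggested above. The sample text is an assumption made up for this example.

```python
import re

text = 'да-да нет'  # hypothetical sample text

# Without +, the class matches exactly one character at a time.
single = re.findall(r'[А-Яа-я-\s]', text)

# With +, consecutive matching characters are grouped into one match.
runs = re.findall(r'[А-Яа-я-\s]+', text)

# Putting the dash last in the class removes the ambiguity
# without changing what is matched.
runs_dash_last = re.findall(r'[А-Яа-я\s-]+', text)

print(single)                  # one list entry per character of text
print(runs)                    # ['да-да нет']
print(runs_dash_last == runs)  # True
```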
