How to find all usernames in a large text in Python, knowing the usernames come right after or before specific phrases?

Question

I have a large text file that looks like this:

""" Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

How do I get all the usernames with Python, knowing that each one comes right after or right before one of those specific phrases?

I tried:

import re
text =""" ....""" #text is here
before_j = re.findall(r'\bjust showed up\S*', text)
print(before_j)

score 3 · Answer 1 · answered Apr 01 '22 at 21:37

Use

(?<=Yay you made it, )\S+|\S+(?= joined the party)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    Yay you made it,         'Yay you made it, '
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
     joined the party        ' joined the party'
--------------------------------------------------------------------------------
  )                        end of look-ahead
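
For completeness, here is a minimal sketch of how the pattern might be applied in Python with re.findall. The sample text is a shortened version of the one in the question (dates dropped), and the variable names are only illustrative:

```python
import re

# Shortened sample from the question (dates dropped for brevity).
text = (
    "Yay you made it, User1 ! --------------- User2 joined the party. "
    "--------------- Yay you made it, User3 ! --------------- "
    "User4 joined the party."
)

# The lookbehind anchors names that follow the greeting; the lookahead
# anchors names that precede " joined the party". Neither phrase is
# included in the matched text.
pattern = r"(?<=Yay you made it, )\S+|\S+(?= joined the party)"

usernames = re.findall(pattern, text)
print(usernames)
```

Because the pattern contains no capturing groups, re.findall returns the matched names directly as a flat list of strings.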

The commenting / formatting is really neat here. Did you write that all yourself or is there a program that does the annotation for you? Nice answer! — David542, Apr 01 '22 at 22:26

cards · Answer 2 · 2022-04-01T23:05:42.923

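I use two matching rules for the names:

it, (name_pattern) !      "it," then the name, followed by " !"
-{3,} (name_pattern)\s    at least three "-" characters and a space, then the name, followed by a whitespace character

Here name_pattern is any sequence of alphabetic characters ending with one or more digits, ([a-zA-Z]+\d+). The first rule in the code below uses the slightly looser [a-zA-Z\d]+.

Both rules are joined into a single alternation and matched in one pass. Each match yields a pair with an empty string for the rule that did not match, so the empty entry has to be dropped in the loop.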

import re

text = """ Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

# list of rules
rules = (r'it, ([a-zA-Z\d]+) !', r'-{3,} ([a-zA-Z]+\d+)\s')

# combine the rules into a single alternation
regex = '|'.join(rules)

matches = [g1 if g2 == '' else g2 for g1, g2 in re.findall(regex, text)]

print(matches)

Output

['User1', 'User2', 'User3', 'User4']
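
To see why the filtering step is needed, it may help to look at what re.findall returns before the list comprehension runs. A small sketch on a shortened sample with the dates dropped (raw and flat are illustrative names, not part of the answer's code):

```python
import re

# Shortened sample from the question (dates dropped for brevity).
text = (
    "Yay you made it, User1 ! --------------- User2 joined the party. "
    "--------------- Yay you made it, User3 ! --------------- "
    "User4 joined the party."
)

rules = (r'it, ([a-zA-Z\d]+) !', r'-{3,} ([a-zA-Z]+\d+)\s')
regex = '|'.join(rules)

# With two capturing groups, findall returns one tuple per match;
# the group belonging to the rule that did not match is an empty string.
raw = re.findall(regex, text)
print(raw)

# Keep whichever group is non-empty in each tuple.
flat = [g1 or g2 for g1, g2 in raw]
print(flat)
```

Each tuple has exactly one non-empty entry here, so `g1 or g2` picks the same value as the conditional expression in the answer.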

EDIT: To avoid filtering out the empty strings, one can use named groups and read only the group that actually matched:

# named groups
rules = (r'it, (?P<g1>[a-zA-Z\d]+) !', r'-{3,} (?P<g2>[a-zA-Z]+\d+)\s')

regex = '|'.join(rules)

# lastgroup is the name of the group that matched; use it to look up the text
matches = [m.group(m.lastgroup) for m in re.finditer(regex, text)]

score 1 · Answer 3 · answered Apr 01 '22 at 22:07

If we start with your input text:

Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022

We can simplify the regex to (User\d+) if the username is always of the form User[one or more digits].

However, I would assume that usernames can be more complex, so let's treat a username as one or more non-space characters, \S+. (Note that this is not always right: if punctuation is attached to the name, as in User1!, then \S+ captures the punctuation too, and \w+ would be a better specifier.) We then want to match a username preceded by the words "you made it, " or followed by the words " joined the party", which gives:

(?<=you made it, )(\S+)|(\S+)(?= joined the party)

import re
s = "Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"
usernames = [item[0] or item[1] for item in re.findall(r'(?<=you made it, )(\S+)|(\S+)(?= joined the party)', s)]
print(usernames)
# ['User1', 'User2', 'User3', 'User4']
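
The caveat about \S+ versus \w+ can be illustrated with a small sketch. The sample string below is hypothetical (it is not the question's text): the exclamation mark is attached directly to the name, which is exactly the case where the two specifiers differ.

```python
import re

# Hypothetical variant of the input: "!" directly follows the name.
sample = "Yay you made it, User1! --------------- User2 joined the party."

# \S+ takes every non-whitespace character, including the "!".
with_non_space = re.findall(r'(?<=you made it, )\S+', sample)

# \w+ stops at the first character that is not a letter, digit or underscore.
with_word_chars = re.findall(r'(?<=you made it, )\w+', sample)

print(with_non_space)
print(with_word_chars)
```

The trade-off is that \w+ would also cut a username short at a hyphen or a dot, so the right choice depends on which characters usernames may actually contain.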

gremur · Answer 4 · 2022-04-02T16:55:23.270

A possible solution is the following:

PROS: the username may contain any characters except whitespace.

import re

string = """ Yay you made it, User1 ! — 25/03/2022 --------------- User2 joined the party. — 22/03/2022 --------------- Yay you made it, User3 ! — 29/03/2022 --------------- User4 joined the party. — 28/03/2022"""

found = re.findall(r',\s(\S+)\s!|-\s(\S+)\sj', string, re.I)

print(list(filter(None, [item for t in found for item in t])))

Prints

['User1', 'User2', 'User3', 'User4']

REGEX DEMO

Thanks to @cards, @David542 for valuable comments.

edited Apr 02 '22 at 16:55

answered Apr 01 '22 at 22:22


Hi! Just a curiosity, but are all those non-capturing groups really needed? What is their purpose? – cards Apr 01 '22 at 22:34

@cards no they're not, both `,\s(\S+)\s!|-\s(\S+)\sj` and `(?:,\s(\S+)\s!)|(?:-\s(\S+)\sj)` return the exact same four groups. Though it might help a bit with readability with the alternation character -- it actually makes it more readable to me when I'm trying to figure out what the regex is doing. – David542 Apr 01 '22 at 22:50
@David542 thanks! For readability & debugging I agree, I am just not used to it, that's all. I think they can cause a bit of confusion with the idea of nested groups (which is not supported) – cards Apr 01 '22 at 22:56
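
The equivalence mentioned in the comments can be checked directly. A quick sketch on a shortened version of the sample text (dates dropped; plain and grouped are illustrative names):

```python
import re

# Shortened sample from the question (dates dropped for brevity).
text = (
    "Yay you made it, User1 ! --------------- User2 joined the party. "
    "--------------- Yay you made it, User3 ! --------------- "
    "User4 joined the party."
)

# Same alternation, without and with non-capturing wrappers.
plain = re.findall(r',\s(\S+)\s!|-\s(\S+)\sj', text)
grouped = re.findall(r'(?:,\s(\S+)\s!)|(?:-\s(\S+)\sj)', text)

# Non-capturing groups add no entries to the result tuples.
print(plain == grouped)
print(plain)
```

Since alternation already has the lowest precedence in the pattern, the (?:...) wrappers only make the two branches easier to spot.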
