how can I remove all characters after the second occurrence of a ' ' (space)

Question

My name regex has proven faulty on a couple of entries:

find_name = re.search(r'^[^\d]*', clean_content)

The above would output something like this on a few entries:

TERRI BROWSING APT A # current output

So I need a way to trim that off; it's tripping up the rest of my program. The only marker I can think of is the second space: if I can detect it, I can remove all characters after it.

I only need the first and last name; i.e.

TERRI BROWSING # desired

After I remove those characters I could just .strip() the trailing space. I just need a way to remove everything after the second space, or maybe a way to capture only the first two words and nothing more.

Maybe you also need to validate that the first two words consist of uppercase ASCII letters? `re.match("[A-Z]+\s+[A-Z]+", s)`? Otherwise, a `\S`-based regex does not seem necessary; you may as well use `split`. – Wiktor Stribiżew, Jul 29 '19 at 20:33
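
The validation suggested in this comment could look roughly like the sketch below. The helper name `first_two_upper_words` is made up for illustration, and the pattern assumes names consist only of uppercase ASCII letters, so hyphenated or accented names would not match.

```python
import re

# Hypothetical helper, not from the original thread.
def first_two_upper_words(s):
    """Return the first two uppercase words of s, or None if s does not start with them."""
    # re.match only matches at the start of the string.
    m = re.match(r"[A-Z]+\s+[A-Z]+", s)
    return m.group() if m else None

print(first_two_upper_words("TERRI BROWSING APT A"))  # TERRI BROWSING
print(first_two_upper_words("123 MAIN ST"))           # None
```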

Carsten · Answer 1 · 2022-08-14T10:36:17.343

You don't even need regex since you can use simple splits and joins:

text = 'TERRI BROWSING APT A'
' '.join(text.split(' ')[0:2])

# 'TERRI BROWSING'

edited Aug 14 '22 at 10:36

answered Jul 29 '19 at 20:33

I'm getting 'AttributeError: 're.Match' object has no attribute 'join'' with that. – Dr Upvote Jul 29 '19 at 20:38
It looks like you are still using `re`. Try `find_name = ' '.join(text.split(' ')[0:2])` – Carsten Jul 29 '19 at 20:40
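
The error in this exchange arises because `re.search` returns a match object rather than a string, so string methods have to be applied to the matched text obtained from `.group()`. Below is a minimal sketch of combining the two steps. The sample value of `clean_content` is invented for illustration, and the code uses `split()` without an argument rather than the answer's `split(' ')`, which also tolerates repeated spaces.

```python
import re

# Sample input invented for illustration; the real clean_content comes from the asker's data.
clean_content = "TERRI BROWSING APT A 123 MAIN ST"

match = re.search(r"^[^\d]*", clean_content)  # the asker's original pattern
raw_name = match.group()  # the matched text as a str, here 'TERRI BROWSING APT A '

# split() with no arguments splits on any run of whitespace and ignores
# leading/trailing whitespace, so no separate strip() is needed.
find_name = " ".join(raw_name.split()[:2])
print(find_name)  # TERRI BROWSING
```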

heemayl · Answer 2 · 2019-07-29T20:41:10.353

You can match the first two words with this regex:

^\S+\s+\S+

^ matches the start of the string
\S+ matches one or more non-whitespace characters
\s+ matches one or more whitespace characters

Also, assuming the whitespace is actually a space character, you can find the index of the second space using str.find and slice the string up to that point:

text[:text.find(' ', text.find(' ') + 1)]

Example:

In [326]: text = 'TERRI BROWSING APT A'

In [327]: re.search(r'^\S+\s+\S+', text).group()
Out[327]: 'TERRI BROWSING'

In [338]: text[:text.find(' ', text.find(' ') + 1)]
Out[338]: 'TERRI BROWSING'
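
One caveat with the `str.find` approach: `find` returns -1 when there is no second space, and the slice then silently drops the last character of the string. A guarded version might look like the sketch below, where the helper name `up_to_second_space` is hypothetical.

```python
def up_to_second_space(text):
    """Return text up to (not including) the second space, or text unchanged if there is none."""
    first = text.find(" ")
    if first == -1:
        return text  # no space at all
    second = text.find(" ", first + 1)
    if second == -1:
        return text  # only one space, so there is nothing to trim
    return text[:second]

print(up_to_second_space("TERRI BROWSING APT A"))  # TERRI BROWSING
print(up_to_second_space("TERRI BROWSING"))        # TERRI BROWSING

# For comparison, the unguarded one-liner loses a character here:
name = "TERRI BROWSING"
print(name[:name.find(" ", name.find(" ") + 1)])   # TERRI BROWSIN
```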

The fourth bird · Answer 3 · 2019-07-29T20:39:42.867

1

If you want to remove the rest, you could match zero or more non-whitespace characters (\S*) followed by a space, twice, and capture that in a group. Then match any character zero or more times with .* and use re.sub to replace the whole match with the first capturing group.

^(\S* \S* ).*

import re

print(re.sub(r"^(\S* \S* ).*", r"\1", "TERRI BROWSING APT A"))

Result

TERRI BROWSING
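
Note that the group in the pattern above includes the space after the second word, so the substituted result ends with a space. A possible variant, offered as a sketch rather than as the answerer's method, moves the trailing space out of the group and uses `\S+` so that empty words are not accepted. The name `first_two_words` is hypothetical.

```python
import re

# Variant pattern (an assumption, not from the original answer): capture two
# whitespace-separated words and discard everything after them.
PATTERN = r"^(\S+\s+\S+).*"

def first_two_words(s):
    """Keep only the first two whitespace-separated words of s."""
    return re.sub(PATTERN, r"\1", s)

print(repr(first_two_words("TERRI BROWSING APT A")))  # 'TERRI BROWSING'

# The original pattern keeps the trailing space inside the group:
print(repr(re.sub(r"^(\S* \S* ).*", r"\1", "TERRI BROWSING APT A")))  # 'TERRI BROWSING '
```

Inputs that already contain only two words pass through unchanged with this variant, whereas the original pattern needs a space after the second word in order to match at all.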

edited Jul 29 '19 at 20:39

answered Jul 29 '19 at 20:33
