python regex to replace all single word characters in string

Question

I am trying to remove all the single characters in a string

input: "This is a big car and it has a spacious seats"

my output should be:

output: "This is big car and it has spacious seats"

Here I am using the expression

import re
re.compile('\b(?<=)[a-z](?=)\b')

This matches with first single character in the string ...

Any help would be appreciated. Thanks in advance.

The lookarounds look superfluous here since they match an empty pattern and always return true. Are you using the pattern with `re.sub`? `re.sub(r'\b[a-zA-Z]\b', '', s)` should work to some extent to remove *all* the single letter words — Wiktor Stribiżew, Feb 06 '17 at 11:13
Without regex: `' '.join(word for word in strng.split(' ') if len(word) > 1)` — Chris_Rands, Feb 06 '17 at 11:14
@Chris_Rands: This will remove also single digit numbers, for example. — Wiktor Stribiżew, Feb 06 '17 at 11:15
@WiktorStribiżew Isn't that what OP wants? to remove "all the single characters" — Chris_Rands, Feb 06 '17 at 11:16
@Chris_Rands: Judging by the pattern used, only single letter whole words must be removed. — Wiktor Stribiżew, Feb 06 '17 at 11:17
@WiktorStribiżew It's unclear from the OP's question; of course `isalpha()` or `isdigit()` etc. checks are possible — Chris_Rands, Feb 06 '17 at 11:20
@Chris_Rands: I think the current expression is "more or less" OK, just OP should provide the rest of the relevant code / context. — Wiktor Stribiżew, Feb 06 '17 at 11:21
Why are you trying to do this? I can understand for the "it has spacious seats", but "This is big car" isn't grammatically correct. — Sayse, Feb 06 '17 at 11:21
He mentions "single word characters" in the title, by that removal should be restricted to letters only. — shad0w_wa1k3r, Feb 06 '17 at 11:22
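
To illustrate the comment about the empty lookarounds, here is a minimal sketch. It also checks one detail the comments do not spell out: if the pattern is typed exactly as in the question, without the `r` prefix, Python reads `\b` as a backspace character rather than a word boundary. The variable names are illustrative only.

```python
import re

s = "This is a big car and it has a spacious seats"

# Without the r prefix, "\b" is the backspace character (\x08),
# not the regex word-boundary assertion.
non_raw = re.compile('\b(?<=)[a-z](?=)\b')
raw = re.compile(r'\b(?<=)[a-z](?=)\b')

# Empty lookarounds always succeed, so dropping them changes nothing.
simplified = re.compile(r'\b[a-z]\b')

print(non_raw.findall(s))     # nothing found: the string contains no backspaces
print(raw.findall(s))         # the two standalone "a" words
print(simplified.findall(s))  # same result as the raw pattern
```

Note that `re.compile` on its own only builds the pattern; a call such as `sub` or `findall` is still needed to act on the string.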

score 10 · Accepted Answer · edited May 23 '17 at 12:09

Edit: I have just seen that this was suggested in the comments first by Wiktor Stribiżew. Credit to him - I had not seen when this was posted.

You can also use re.sub() to automatically remove single characters (assuming you only want to remove alphabetical characters). The following will replace any occurrences of a single alphabetical character:

import re
input =  "This is a big car and it has a spacious seats"

output =  re.sub(r"\b[a-zA-Z]\b", "", input)

>>>
output = "This is  big car and it has  spacious seats"

You can learn more about using a regular expression when replacing strings here: How to input a regex in string.replace?
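
The substitution above leaves a double space wherever a letter was removed. If the single-spaced output from the question is wanted, one possible follow-up is to collapse the whitespace afterwards. This is a sketch, assuming that normalizing all whitespace runs is acceptable; the names `text`, `without_singles` and `cleaned` are illustrative.

```python
import re

text = "This is a big car and it has a spacious seats"

# Step 1: drop standalone ASCII letters (same pattern as in the answer).
without_singles = re.sub(r"\b[a-zA-Z]\b", "", text)

# Step 2: collapse the whitespace runs left behind and trim both ends.
cleaned = " ".join(without_singles.split())

print(cleaned)  # This is big car and it has spacious seats
```

One caveat: `\b` treats an apostrophe as a boundary, so a word like "it's" would lose its trailing "s" with this pattern.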

@Ravi Just to repeat I did not see Wiktor Stribizew's comment when I wrote this. Glad we were able to help. — Chuck, Feb 06 '17 at 11:42

Moses Koledoye · Answer 2 · 2017-02-06T11:34:59.300

2

Here's one way to do it by splitting the string and filtering out single length letters using len and str.isalpha:

>>> s = "1 . This is a big car and it has a spacious seats"
>>> ' '.join(i for i in s.split() if not (i.isalpha() and len(i)==1))
'1 . This is big car and it has spacious seats'
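
The same filter can be wrapped in a small helper, which makes its behaviour easier to test. This is only a sketch; the function name `drop_single_letters` is made up for illustration.

```python
def drop_single_letters(text):
    """Remove whitespace-delimited tokens that consist of exactly one letter."""
    return " ".join(
        token for token in text.split()
        if not (len(token) == 1 and token.isalpha())
    )

# Digits and punctuation survive because isalpha() is False for them.
print(drop_single_letters("1 . This is a big car"))

# A token like "cat," is longer than one character, so it is kept as is.
print(drop_single_letters("I saw a cat, a dog"))
```

Two differences from the regex approach are worth knowing: `str.isalpha()` is Unicode-aware, so single non-ASCII letters are dropped too, and `split()` with no argument collapses every run of whitespace (including newlines) into a single space.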

edited Feb 06 '17 at 11:34

answered Feb 06 '17 at 11:24


score 1 · Answer 3 · answered May 07 '19 at 16:42

re.sub(r' \w{1} |^\w{1} | \w{1}$', ' ', input)
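
A caveat with this pattern: each match consumes the spaces around the character, so when several single characters sit next to each other, every second one is skipped. The sketch below shows that, along with one possible alternative that uses lookarounds so the surrounding spaces are not consumed. The helper name `remove_single_word_chars` is hypothetical, and like the pattern above it relies on `\w`, which also matches digits and underscores.

```python
import re

def remove_single_word_chars(text):
    # (?<!\S) and (?!\S) assert "no non-space neighbour" without consuming
    # the surrounding spaces, so back-to-back single characters all match.
    stripped = re.sub(r"(?<!\S)\w(?!\S)", "", text)
    # Collapse the leftover whitespace runs.
    return " ".join(stripped.split())

sample = "plan a b c done"

# Pattern from the answer: " a " and " c " match, but "b" is left behind.
print(re.sub(r" \w{1} |^\w{1} | \w{1}$", " ", sample))  # plan b done

# Lookaround version: all three single characters are removed.
print(remove_single_word_chars(sample))                 # plan done
```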

Gang


Tshilidzi Mudau · Answer 4 · 2017-02-06T11:30:25.677

0

EDIT:

You can use:

import re
input_string = "This is a big car and it has a spacious seats"
str_without_single_chars = re.sub(r'(?:^| )\w(?:$| )', ' ', input_string).strip()

or (which, as was brought to my attention, doesn't meet the specifications):

input_string = "This is a big car and it has a spacious seats"
' '.join(w for w in input_string.split() if len(w)>3)

edited Feb 06 '17 at 11:30

answered Feb 06 '17 at 11:17


That's not fair to copy paste a comment as an answer. – Toto Feb 06 '17 at 11:18
... and does not meet the current specifications. – Wiktor Stribiżew Feb 06 '17 at 11:18
Read comments below the question, you will see. – Wiktor Stribiżew Feb 06 '17 at 11:19

rodeone2 · Answer 5 · 2017-04-23T11:41:06.183

The fastest way to remove words, characters, strings, or anything else between two known tags or two known characters in a string is a direct, native-C-style approach: use re to turn the delimiters into an HTML comment, then strip the comment, as shown below.

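import re
# Turn the opening and closing tags into HTML comment delimiters
var = re.sub('<script>', '<!--', var)
var = re.sub('</script>', '-->', var)
# And finally, remove each comment together with its contents
var = re.sub('<!--.*?-->', '', var)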

It removes everything and works faster, better, and cleaner than Beautiful Soup. Batch files are where the "" got their beginnings; they were only borrowed from native C for use with batch and HTML. When using Pythonic methods with regular expressions, keep in mind that Python has changed little from the regular expressions used at the machine level, so why iterate many times when a single pass can find it all as one chunk? The same works with individual characters as delimiters:

var = re.sub(r'\[', '<!--', var)
var = re.sub(r'\]', '-->', var)
# And finally
var = re.sub('<!--.*?-->', '', var)  # wipes out the delimiters along with everything between them

And you do not need Beautiful Soup. You can also scrape data this way if you understand how it works.
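
The rewrite-to-comment step is not strictly required: a single non-greedy pattern can remove the delimiters and everything between them in one call. The following is a minimal sketch for simple, well-formed input. The sample strings are invented, and a literal `<script>` pattern like this would not match tags that carry attributes.

```python
import re

html = "<p>keep</p><script>\nalert('x');\n</script><p>also keep</p>"

# One non-greedy pattern removes each <script>...</script> block directly.
# re.DOTALL lets "." match newlines, so multi-line blocks are covered too.
cleaned = re.sub(r"<script>.*?</script>", "", html, flags=re.DOTALL)
print(cleaned)  # <p>keep</p><p>also keep</p>

# The same idea with single-character delimiters; the brackets are escaped
# because they are special characters in a regular expression.
bracketed = "keep [drop this] keep [and this] end"
print(re.sub(r"\[.*?\]", "", bracketed))
```

Without `re.DOTALL`, `.*?` stops at the first newline, so a block that spans several lines would be left untouched.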
