
I am trying to remove all the single characters from a string.

input: "This is a big car and it has a spacious seats"

My output should be:

output: "This is big car and it has spacious seats"

Here is the expression I am using:

import re
re.compile('\b(?<=)[a-z](?=)\b')

This matches only the first single character in the string.

Any help would be appreciated. Thanks in advance.

Moses Koledoye
Ravi

5 Answers


Edit: I have just seen that this was first suggested in the comments by Wiktor Stribiżew. Credit to him; I had not seen his comment when I posted this.

You can also use re.sub() to remove single characters automatically (assuming you only want to remove alphabetical characters). The following replaces every standalone alphabetical character with an empty string:

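import re
text = "This is a big car and it has a spacious seats"

# \b is a word boundary, so this matches only a letter standing on its own
output = re.sub(r"\b[a-zA-Z]\b", "", text)
print(output)
# This is  big car and it has  spacious seats   (two spaces where each "a" was)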

You can learn more about using regular expressions when replacing strings here: How to input a regex in string.replace?
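As the output shows, removing a letter leaves two spaces behind. A minimal sketch, assuming you also want the result normalised to single spaces, is to run the same substitution and then collapse the whitespace with str.split() and str.join(). The function name remove_single_letters is just for illustration. Note that collapsing also turns tabs and newlines into single spaces, and that \b treats an apostrophe as a boundary, so the "t" in "don't" is removed too.

```python
import re

def remove_single_letters(text: str) -> str:
    # Drop every letter that has a word boundary on both sides...
    without_letters = re.sub(r"\b[a-zA-Z]\b", "", text)
    # ...then collapse the runs of whitespace left behind and trim the ends.
    return " ".join(without_letters.split())

print(remove_single_letters("This is a big car and it has a spacious seats"))
# This is big car and it has spacious seats
```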

Chuck
  • @Ravi Just to repeat: I did not see Wiktor Stribiżew's comment when I wrote this. Glad we were able to help. – Chuck Feb 06 '17 at 11:42

Here's one way to do it: split the string and filter out single-letter words using len and str.isalpha:

>>> s = "1 . This is a big car and it has a spacious seats"
>>> ' '.join(i for i in s.split() if not (i.isalpha() and len(i)==1))
'1 . This is big car and it has spacious seats'
Moses Koledoye
re.sub(r' \w{1} |^\w{1} | \w{1}$', ' ', input)
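That pattern consumes the spaces around each match, so two single characters in a row compete for the same space: on "x a b y" it returns " a y". A minimal sketch of an alternative (a swapped-in technique, not this answer's own) uses zero-width lookarounds, which check for whitespace or the ends of the string without consuming anything, and then tidies the spacing. The names SINGLE_CHAR and drop_single_chars are hypothetical. Like the original, \w also matches digits and underscores, so a lone "1" is removed as well.

```python
import re

# (?<!\S) means "not preceded by a non-space" (start of string or whitespace);
# (?!\S) means "not followed by a non-space" (end of string or whitespace).
# Both are zero-width, so adjacent single characters don't fight over a space.
SINGLE_CHAR = re.compile(r"(?<!\S)\w(?!\S)")

def drop_single_chars(text: str) -> str:
    # Remove the lone characters, then collapse the leftover spaces.
    return " ".join(SINGLE_CHAR.sub("", text).split())

print(drop_single_chars("This is a big car and it has a spacious seats"))
# This is big car and it has spacious seats
print(repr(drop_single_chars("x a b y")))
# ''
```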
Gang

EDIT:

You can use:

import re
input_string = "This is a big car and it has a spacious seats"
str_without_single_chars = re.sub(r'(?:^| )\w(?:$| )', ' ', input_string).strip()

Or (as was brought to my attention, this doesn't meet the specification, because len(w)>3 also drops two- and three-letter words such as "is" and "big"):

input_string = "This is a big car and it has a spacious seats"
' '.join(w for w in input_string.split() if len(w)>3)
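Lowering the threshold to 1 keeps every word of two or more characters, which does meet the question's requirement. Unlike the str.isalpha() check in the earlier answer, this also drops lone digits and punctuation such as "1" or ".", so pick whichever behaviour you need. A quick sketch:

```python
input_string = "This is a big car and it has a spacious seats"

# Keep only tokens longer than one character; split() also discards extra spaces.
result = " ".join(w for w in input_string.split() if len(w) > 1)
print(result)
# This is big car and it has spacious seats
```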
Tshilidzi Mudau

The fastest way to remove words, characters, strings or anything else between two known tags or two known characters in a string is a direct, native-C approach: use re (which is implemented in C) to turn the delimiters into comment markers, then delete the comment, as shown below.

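import re
var = re.sub('<script>', '<!--', var)
var = re.sub('</script>', '-->', var)
# And finally, delete everything from "<!--" to the nearest "-->".
# re.DOTALL lets "." match newlines, so multi-line scripts are removed too.
var = re.sub('<!--.*?-->', '', var, flags=re.DOTALL)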

This removes everything between the tags and works faster, better and more cleanly than Beautiful Soup. Batch files are where the "" got their beginnings, and they were only borrowed for use with batch and HTML from native C. When using Pythonic methods with regular expressions, realize that Python has not altered or changed much from the regular expressions used by machine language, so why iterate many times when a single pass can find it all as one chunk? Do the same with individual characters.

var = re.sub(r'\[', '<!--', var)
var = re.sub(r'\]', '-->', var)
# And finally, wipe out everything between the markers, markers included.
var = re.sub('<!--.*?-->', '', var, flags=re.DOTALL)

And you do not need Beautiful Soup. You can also scrape data this way if you understand how it works.
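The two-step approach above (convert the delimiters to comment markers, then delete the comment) can also be done with one re.sub per pair of delimiters, which avoids accidentally deleting comments that were already in the text. A minimal sketch, assuming literal start and end delimiters; strip_between is a hypothetical helper name. Regular expressions are not an HTML parser: a tag with attributes, such as <script type="text/javascript">, will not match a literal "<script>" delimiter.

```python
import re

def strip_between(text: str, start: str, end: str) -> str:
    # re.escape makes delimiters such as "[" literal; the non-greedy .*?
    # stops at the first matching end delimiter, and DOTALL lets the
    # match span newlines.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    return re.sub(pattern, "", text, flags=re.DOTALL)

html = "<p>keep</p><script>alert(1);\n</script><p>also keep</p>"
print(strip_between(html, "<script>", "</script>"))
# <p>keep</p><p>also keep</p>
print(strip_between("a [note] b", "[", "]"))
# a  b
```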

rodeone2