
I have the following situation: I have a sentence with wrongly placed dots (.) to process. The sentence:

sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'

I am trying to normalize this sentence. As you can see, some parts of it are wrongly formatted (.how, ?can, and .@abcde). I am thinking of using regex to handle this because the sentence keeps changing. This is my code so far:

import re

character = ['.','?','@']

sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'

sentence = str(sentence)
for i in character:
    charac = str(i)
    charac_after = re.findall(r'\\'+charac+r'\S*', sentence)
    if charac_after:
        print("Exist")
        sentence = sentence.replace(charac, charac+' ')

print(sentence)

The result somehow skips the dot (.) and the at sign (@); it only processes the question mark (?). This is the output:

Exist

Hi. Long time no see .how are you ? can you follow .@abcde?

It's supposed to be "Hi. Long time no see . how are you ? can you follow . @ abcde?". I don't know if the double backslash in "r'\\'+charac+r'\S*'" is wrong or something. Did I miss something?

How can I process all of the characters? Please help.

ytomo
  • I do not know Python, but you probably need to escape correctly; see http://stackoverflow.com/questions/280435/escaping-regex-string-in-python – Fallenhero Feb 24 '17 at 09:28
  • If `r'\\'` is supposed to escape the character that follows it, you only need a single backslash. At the moment you are escaping the backslash itself. However, you should use `re.escape` instead. – Sebastian Proske Feb 24 '17 at 09:28
  • But I think your code is faulty anyway: with that, you will add a space after every `.|?|@`. – Fallenhero Feb 24 '17 at 09:29
  • @Fallenhero That is my aim: add a space after .|?|@. I have tried re.escape in r'\\'+re.escape(charac)+r'\S*' and still get the same result. Thanks, by the way. – ytomo Feb 24 '17 at 09:47
  • @ytomo Oh, I thought you only wanted to add a space after those if there is none already. – Fallenhero Feb 24 '17 at 09:48
  • @SebastianProske I have tried r'\\'+re.escape(charac)+r'\S*' and I still get the same result. Did I miss something? – ytomo Feb 24 '17 at 09:49
  • @ytomo You will not need the `\\` if you already escape it. – Fallenhero Feb 24 '17 at 09:50
  • @Fallenhero Yes, add a space after .|?|@ if there is no space after it, turning (.how, ?can, and .@abcde) into (. how, ? can, and . @ abcde). – ytomo Feb 24 '17 at 09:52
  • @Fallenhero I still need the double backslash; it's an error if it is just '\'. I am sorry if I got your idea the wrong way; I'm still new to Python. – ytomo Feb 24 '17 at 09:54
  • @ytomo Don't use any `\\` there. – Fallenhero Feb 24 '17 at 09:56
  • @ytomo `\S*` will match everywhere, because it can match the empty string. You will need `\S+` or just `\S`. – Fallenhero Feb 24 '17 at 09:57
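The comments above can be checked by printing the pattern the question's loop builds for each character and whether `re.findall` finds anything. This is a small diagnostic sketch, not code from the thread; it reuses the question's variable names and adds a hypothetical `found` dict only to record the results.

```python
import re

character = ['.', '?', '@']
sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'

found = {}
for charac in character:
    # r'\\' is two characters (backslash, backslash), which the regex engine
    # reads as ONE literal backslash; it does not escape charac at all.
    pattern = r'\\' + charac + r'\S*'
    found[charac] = bool(re.findall(pattern, sentence))
    print(repr(pattern), found[charac])
```

Only the `?` pattern reports a match. There, `?` makes the literal backslash optional and `\S*` can match the empty string, so the pattern matches at every position. The `.` and `@` patterns both require a literal backslash, which the sentence does not contain, so `sentence.replace` is never reached for them.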

1 Answer


Without any knowledge of Python, I think you need to do it like this:

(as per the suggestion from @Sebastian Proske)

import re

character = ['.', '?', '@']
sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'
# Put a space after any of the characters when the next character is not whitespace
sentence = re.sub(r'([' + ''.join(map(re.escape, character)) + r'])(?=\S)', r'\1 ', sentence)
print(sentence)

I am not sure about the code, but I am sure about the regex; see here: https://regex101.com/r/HXdeuK/2

See a demo here: https://repl.it/Fw5b/3
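If you prefer to keep the per-character loop from the question, the same idea can be applied one character at a time. This is a sketch under the assumption that only the characters in `character` need handling: `re.escape` makes `.` and `?` literal (so no hand-written backslash is needed), the lookahead `(?=\S)` adds a space only when one is missing, and `re.sub` replaces the question's `str.replace`, which added a space after every occurrence.

```python
import re

character = ['.', '?', '@']
sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'

for charac in character:
    # re.escape(charac) matches the character literally; (?=\S) only checks
    # that a non-space character follows, without consuming it.
    sentence = re.sub(re.escape(charac) + r'(?=\S)', charac + ' ', sentence)

print(sentence)
```

The single `re.sub` with a character class in the answer does the same work in one pass; the loop is just closer to the question's original structure.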

Fallenhero
  • Hi, thanks in advance. I get a new error: sre_constants.error: unexpected end of pattern – ytomo Feb 24 '17 at 10:09
  • Will those characters change? Otherwise, why don't you just use them directly in the regex, like I did on regex101? – Fallenhero Feb 24 '17 at 10:11
  • I got the backreference wrong; I updated it. Now it works perfectly. – Fallenhero Feb 24 '17 at 10:23
  • Why so complicated? `print(re.sub(r'([' + ''.join(map(re.escape, character)) + r'])(\S)', r'\1 \2', sentence))` - I don't see a need for that for loop. – Sebastian Proske Feb 24 '17 at 10:30
  • Sorry, I don't know Python, and the OP did it like this. – Fallenhero Feb 24 '17 at 10:43
  • Hi @SebastianProske, thanks, good references. The first line `jmap = '['+''.join(character)+']'` and the second `print(re.sub(r'('+jmap+r')(\S)', r"\1 \2", sentence))` actually work, but they miss the @abcde?. The result is `Hi. Long time no see . how are you ? can you follow . @abcde?`, whereas the loop gets it perfectly. Thanks. – ytomo Feb 24 '17 at 10:50
  • Oops, in `.@` the `@` is overlapping with the matched `\S`. Use a lookahead-based solution then: `print(re.sub(r'([' + ''.join(map(re.escape, character)) + r'])(?=\S)', r'\1 ', sentence))` – Sebastian Proske Feb 24 '17 at 10:54
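The overlap described in the last two comments can be reproduced side by side. This sketch builds the same escaped character class as the comments (the name `chars` is introduced here for illustration) and runs both the consuming `(\S)` version and the lookahead version on the question's sentence.

```python
import re

character = ['.', '?', '@']
sentence = 'Hi. Long time no see .how are you ?can you follow .@abcde?'
chars = '[' + ''.join(map(re.escape, character)) + ']'

# Consuming version: (\S) eats the character after the punctuation, so in '.@'
# the '@' is used up as group 2 and cannot start a match of its own.
consuming = re.sub('(' + chars + r')(\S)', r'\1 \2', sentence)

# Lookahead version: (?=\S) only peeks, so '@' is still available to match next.
lookahead = re.sub('(' + chars + r')(?=\S)', r'\1 ', sentence)

print(consuming)
print(lookahead)
```

The first line printed is the result reported in the comments, with `@abcde?` left untouched; the second adds the missing space after `@` as well.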