0

I want to split a word that starts with the letter م into two words. For example, معجبني should be split into ما عجبني. How can I do that? I'm using Python 2.7.

# -*- coding: utf-8 -*-
token=u'معجبني'
if token[0]==u'م':
    pass  # here I need the processing that splits the word into ما عجبني

The output I want is ما عجبني. I hope someone can help me.

Taku
lina
  • Do you mind giving an English example, since I believe this is the same for English? – Taku Apr 08 '17 at 14:40
  • OK, I mean, for example, I have the word x="nothing". If the word starts with "no", then x becomes "not thing" (with a space between "not" and "thing"), so it will be two words. – lina Apr 08 '17 at 14:45

3 Answers

0

You could use re.sub() to replace the leading character with the replacement characters followed by a space.

The \b word boundary makes sure that the character is the first one in the word. In Python 2.7, \b only recognizes ASCII word characters unless the re.UNICODE flag is set, so it doesn't work well with non-ASCII text. Instead, you could check whether there is whitespace or the beginning of the string before your character.

# -*- coding: utf-8 -*-
import re
token = u'معجبني'
#pattern = re.compile(u'\\bم') # <- For Python3
pattern = re.compile(u'(\\s|^)م') # <- For Python2.7
# \\1 puts back the whitespace (if any) that was matched before the letter
print(re.sub(pattern, u'\\1ما ', token))

It outputs:

ما عجبني

The English equivalent would be:

import re
pattern = re.compile(r'\bno')
text = 'nothing something nothing anode'
print(re.sub(pattern,'not ', text))
# not thing something not thing anode

Note that it automatically checks every word in the text.
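
For a longer text where every word has to be checked, one possible variation is a lookbehind that requires the letter to sit at the very start of the text or right after whitespace. This is only a sketch: the function name split_leading_meem and the sample sentence are made up for illustration, and it blindly splits every word that starts with م, as asked in the question.

```python
# -*- coding: utf-8 -*-
import re

# (?<!\S) succeeds only when the previous character is NOT a non-space
# character, i.e. at the start of the text or right after whitespace.
# Nothing is consumed, so the surrounding whitespace is left untouched.
LEADING_MEEM = re.compile(u'(?<!\\S)م', re.UNICODE)


def split_leading_meem(text):
    """Hypothetical helper: turn a word-initial م into 'ما ' in every word."""
    return LEADING_MEEM.sub(u'ما ', text)


sample = u'معجبني كلام معجبني'
result = split_leading_meem(sample)
print(result)
# ما عجبني كلام ما عجبني
```

The م at the end of كلام is left alone because it is preceded by a letter, not by whitespace.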

Eric Duminil
  • Thank you, but I have a long text and I want to check each word: if it starts with م, separate it and turn the م into ما. – lina Apr 08 '17 at 14:48
  • @lina: Weird. It works on my computer. If I post it here and copy-paste it back, it doesn't work anymore. – Eric Duminil Apr 08 '17 at 15:26
0

str.startswith(prefix) checks whether a string starts with the given prefix, optionally restricting the match to the given start and end indices. You can do this:

# -*- coding: utf-8 -*-
token=u'معجبني'
new_t = token.replace(u'م',u'ما ',1) if token.startswith(u'م') else token
print(new_t)
#ما عجبني
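
The same check could be applied word by word to a longer text. The following is a minimal sketch, not a definitive implementation; expand_word and expand_text are hypothetical helper names, and the prefix and expansion are passed as parameters so the English "no" example from the comments can be tried as well.

```python
# -*- coding: utf-8 -*-

def expand_word(word, prefix=u'م', expansion=u'ما '):
    """Hypothetical helper: swap a leading prefix for its expansion."""
    if word.startswith(prefix):
        # Keep everything after the prefix and put the expansion in front.
        return expansion + word[len(prefix):]
    return word


def expand_text(text, prefix=u'م', expansion=u'ما '):
    """Apply expand_word to every whitespace-separated word of the text."""
    # Note: split() without arguments collapses runs of whitespace.
    return u' '.join(expand_word(w, prefix, expansion) for w in text.split())


print(expand_text(u'معجبني كلام'))
# ما عجبني كلام
print(expand_text(u'nothing something', u'no', u'not '))
# not thing something
```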
RaminNietzsche
-2

Use the split method.

x = 'blue,red,green'
x.split(",")

['blue', 'red', 'green']

Taken from http://www.pythonforbeginners.com/dictionary/python-split

EDIT: You can then join the list with " ".join(arr). Or you could replace the desired prefix with its expansion followed by a space.

Your example: "nothing".replace("no", "not ", 1) => "not thing"
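
As a small illustration of the split and join steps mentioned above (the variable names parts and joined are chosen here only for the example):

```python
# Split on commas, then join the pieces back together with spaces.
parts = 'blue,red,green'.split(',')
joined = ' '.join(parts)
print(parts)   # ['blue', 'red', 'green']
print(joined)  # blue red green
```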

Dweth