For example, the string is `hello %$ world %^& let me ^@ love && you`.
The expected result would be `hello` in one variable and the rest in other variables, for example `a = "hello"`, `b = "world"`, etc.

Vipul Rao
- Use regular expressions for these problems. – Narendra Feb 14 '18 at 10:38
- Look at `re.split` and split on any non-word sequences (or whatever your criterion is). And you certainly don't want to end up with separate *variables*; you'll want all those values in a list or dict. – deceze Feb 14 '18 at 10:40
- It should be in separate variables, @deceze. – Vipul Rao Feb 14 '18 at 10:42
- See https://stackoverflow.com/q/1373164/476 – *variable variables* are generally a bad idea. – deceze Feb 14 '18 at 10:43
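The `re.split` plus list/dict approach suggested in the comments can be sketched as follows (a minimal example; splitting on runs of non-word characters `\W+` is an assumption about what counts as a separator):

```python
import re

s = "hello %$ world %^& let me ^@ love && you"
# Split on runs of non-word characters; the filter drops empty edge strings.
words = [w for w in re.split(r'\W+', s) if w]
print(words)  # ['hello', 'world', 'let', 'me', 'love', 'you']

# Store the values in a dict keyed by position instead of separate variables.
named = {'w{}'.format(i): w for i, w in enumerate(words)}
```

A dict like `named` gives you `named['w0']`, `named['w1']`, and so on, without the pitfalls of dynamically created variable names.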
3 Answers
Use a regular expression, like this:
import re
a = "hello %$ world %^& let me ^@ love && you"
print(re.findall(r'\w+',a))

Narendra
You can use regular expressions to retrieve words from the string:
import re
my_string = "hello %$ world %^& let me ^@ love && you"
re.findall(r'\w+\b', my_string)
# ['hello', 'world', 'let', 'me', 'love', 'you']
See the Regular Expression HOWTO for more about regular expressions.
Update
As asked in the comments, here is a regexp to retrieve groups of words separated by special characters:
my_string = "hello world #$$ i love you #$@^ welcome to world"
re.findall(r'(\w+[\s\w]*)\b', my_string)
# ['hello world', 'i love you', 'welcome to world']
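If the groups really must end up in separately named variables, sequence unpacking is far safer than dynamically created variables (a sketch that assumes the string yields exactly three groups):

```python
import re

my_string = "hello world #$$ i love you #$@^ welcome to world"
# Unpack the three matched groups into three names.
a, b, c = re.findall(r'(\w+[\s\w]*)\b', my_string)
print(a)  # hello world
print(b)  # i love you
print(c)  # welcome to world
```

If the number of groups can vary, keep the list as-is rather than unpacking.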

Andriy Ivaneyko
- I know this, but how do I store the string after a special character? For example, for `hello world #$$ i love you #$@^ welcome to world` the output must be a="hello world" b="i love you" c="welcome to world" – Vipul Rao Feb 14 '18 at 10:47
- Or what if I import a CSV file which has something like this in a column, and save each output to a particular column? – Vipul Rao Feb 14 '18 at 10:56
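For the CSV case raised in the comment above, one sketch uses the standard `csv` module and applies the same regexp per row (the column name `text` and the in-memory file contents are assumptions for illustration):

```python
import csv
import io
import re

# In-memory CSV standing in for a real file; the 'text' column is assumed.
data = io.StringIO("text\nhello %$ world %^& you\n")

rows = []
for row in csv.DictReader(data):
    # Extract the word tokens from the column and keep them with the row.
    row['words'] = re.findall(r'\w+', row['text'])
    rows.append(row)

print(rows[0]['words'])  # ['hello', 'world', 'you']
```

The extracted tokens could then be written back out with `csv.DictWriter`, one token per extra column if needed.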
The basic answer would be a regexp. I would recommend looking into the tokenizers from NLTK; they encompass research on the topic and give you the flexibility to switch to something more sophisticated later on. NLTK offers a regexp-based tokenizer too:
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'([A-Za-z0-9 ]+)')
corpus = tokenizer.tokenize("hello %$ world %^& let me ^@ love && you")

S van Balen
- As per the update it would be: tokenizer = RegexpTokenizer(r'([A-Za-z0-9 ]+)') – S van Balen Feb 14 '18 at 10:55