For example, the string is `hello %$ world %^& let me ^@ love && you`.
The expected result would be `hello` in one variable and the rest in other variables, for example `a = "hello"`, `b = "world"`, etc.

Vipul Rao
- Use regular expressions for these problems. – Narendra Feb 14 '18 at 10:38
- Look at `re.split` and split on any non-word sequences (or whatever your criterion is). And you certainly don't want to end up with separate *variables*; you'll want all those values in a list or dict. – deceze Feb 14 '18 at 10:40
- It should be in separate variables, @deceze. – Vipul Rao Feb 14 '18 at 10:42
- See https://stackoverflow.com/q/1373164/476 – *variable variables* are generally a bad idea. – deceze Feb 14 '18 at 10:43
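The `re.split` plus list/dict approach suggested in the comments can be sketched as follows (a minimal example; splitting on runs of non-word characters `\W+` is an assumption about what counts as a separator):

```python
import re

s = "hello %$ world %^& let me ^@ love && you"
# Split on runs of non-word characters; the filter drops empty edge strings.
words = [w for w in re.split(r'\W+', s) if w]
print(words)  # ['hello', 'world', 'let', 'me', 'love', 'you']

# Store the values in a dict keyed by position instead of separate variables.
named = {'w{}'.format(i): w for i, w in enumerate(words)}
```

A dict like `named` gives you `named['w0']`, `named['w1']`, and so on, without the pitfalls of dynamically created variable names.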
3 Answers
Use a regular expression, like this:
import re
a = "hello %$ world %^& let me ^@ love && you"
print(re.findall(r'\w+',a))

Narendra
You can use regular expressions to retrieve words from the string:
import re
my_string = "hello %$ world %^& let me ^@ love && you"
re.findall(r'\w+\b', my_string)
# ['hello', 'world', 'let', 'me', 'love', 'you']
See the Regular Expression HOWTO for more about regular expressions.
Update
As asked in the comments, here is a regexp to retrieve groups of words separated by special characters:
my_string = "hello world #$$ i love you #$@^ welcome to world"
re.findall(r'(\w+[\s\w]*)\b', my_string)
# ['hello world', 'i love you', 'welcome to world']
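If the groups really must end up in separately named variables, sequence unpacking is far safer than dynamically created variables (a sketch that assumes the string yields exactly three groups):

```python
import re

my_string = "hello world #$$ i love you #$@^ welcome to world"
# Unpack the three matched groups into three names.
a, b, c = re.findall(r'(\w+[\s\w]*)\b', my_string)
print(a)  # hello world
print(b)  # i love you
print(c)  # welcome to world
```

If the number of groups can vary, keep the list as-is rather than unpacking.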

Andriy Ivaneyko
- I know this, but how do I store the string after a special character? For example, for `hello world #$$ i love you #$@^ welcome to world` the output must be a="hello world" b="i love you" c="welcome to world" – Vipul Rao Feb 14 '18 at 10:47
- Or what if I import a CSV file which has something like this in a column, and save each output to a particular column? – Vipul Rao Feb 14 '18 at 10:56
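For the CSV case raised in the comment above, one sketch uses the standard `csv` module and applies the same regexp per row (the column name `text` and the in-memory file contents are assumptions for illustration):

```python
import csv
import io
import re

# In-memory CSV standing in for a real file; the 'text' column is assumed.
data = io.StringIO("text\nhello %$ world %^& you\n")

rows = []
for row in csv.DictReader(data):
    # Extract the word tokens from the column and keep them with the row.
    row['words'] = re.findall(r'\w+', row['text'])
    rows.append(row)

print(rows[0]['words'])  # ['hello', 'world', 'you']
```

The extracted tokens could then be written back out with `csv.DictWriter`, one token per extra column if needed.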
The basic answer would be a regexp. I would recommend looking into the tokenizers from NLTK; they encompass research on the topic and give you the flexibility to switch to something more sophisticated later on. NLTK offers a regexp-based tokenizer too:
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer(r'([A-Za-z0-9 ]+)')
corpus = tokenizer.tokenize("hello %$ world %^& let me ^@ love && you")

S van Balen
- As per the update it would be: tokenizer = RegexpTokenizer(r'([A-Za-z0-9 ]+)') – S van Balen Feb 14 '18 at 10:55