
I'm trying to write a function that splits a string into tokens. At the moment it looks like this:

separators = ['(', ')', '+', '-', '*', '/', '=']

def tokenize(string):
    # split() with no arguments splits on runs of whitespace only
    result_list = string.split()
    return result_list

print(tokenize('((2 + 3) / (4 * 22))'))

Which outputs this:

['((2', '+', '3)', '/', '(4', '*', '22))']

That's pretty close, but I need the parentheses split out as separate tokens, i.e., the output should read:

['(', '(', '2', '+', '3', ')', '/', '(', '4', '*', '22', ')', ')']

Any thoughts or help? Thanks!

t56k
  • `re.findall(r'\w+|[^\w\s]', s)` – Avinash Raj Jan 04 '16 at 05:51
  • That's not really close. Calling `split()` with no argument splits on whitespace only. To split on something else, pass that string as the argument to `split()`; to split on several different strings, use a regex. – ZdaR Jan 04 '16 at 05:52
  • It can be done with a regexp, but that's not the best way to do it. I suggest you read Ruslan Spivak's tutorial, which shows good examples of how to achieve what you want: http://ruslanspivak.com/lsbasi-part1/ – pythad Jan 04 '16 at 05:53
  • I suggest reading this, as it contains the answer you need: http://stackoverflow.com/questions/1059559/python-split-strings-with-multiple-delimiters – Sadia1990 Jan 04 '16 at 05:55
  • What exactly are you trying to do? We may be able to help you better. Are you trying to parse the string into small valid expressions? – thefourtheye Jan 04 '16 at 06:00
  • Yeah, I am. I realise now that I made a mistake with the split method, since the args I passed to the function contained whitespace. The article @pythad linked is helpful, and the answer below with `re` works perfectly. I'd like to cross-check the tokens in its output against my separators list, though. – t56k Jan 04 '16 at 06:02
  • @CD-RUM Are you trying to evaluate that string as an expression? If so, there are easier ways to do it than using `split` or a regex. – Iron Fist Jan 04 '16 at 06:10

2 Answers


You can simply do

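import re
x = "((2 + 3) / (4 * 22))"
# The capturing group makes re.split keep each non-word character it splits on;
# the filter drops the spaces and the empty strings between adjacent separators.
print([i for i in re.split(r"(\W)", x) if i != " " and i])

Output: ['(', '(', '2', '+', '3', ')', '/', '(', '4', '*', '22', ')', ')']

or

x = "((2 + 3) / (4 * 22))"
# A space matches the second, uncaptured alternative, so the group is None there;
# `if i` drops those None entries along with the empty strings.
print([i for i in re.split(r"((?! )\W)| ", x) if i])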
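If you would rather split only on the characters in the question's `separators` list instead of on every non-word character, the same capturing-group idea can be wrapped in the `tokenize` function. This is a minimal sketch, not a full lexer; the name `_SPLIT_RE` is made up for illustration, and it assumes operands never contain whitespace or separator characters.

```python
import re

separators = ['(', ')', '+', '-', '*', '/', '=']

# Hypothetical helper pattern: one capturing group that matches any separator.
# re.escape protects characters such as '(' and '*' that are special in a regex.
_SPLIT_RE = re.compile('(' + '|'.join(re.escape(s) for s in separators) + ')')

def tokenize(string):
    tokens = []
    for chunk in _SPLIT_RE.split(string):
        # Each chunk is either a separator or the text between two separators;
        # str.split() strips the surrounding whitespace and skips empty chunks.
        tokens.extend(chunk.split())
    return tokens

print(tokenize('((2 + 3) / (4 * 22))'))
# ['(', '(', '2', '+', '3', ')', '/', '(', '4', '*', '22', ')', ')']
```

Building the pattern from the list means that adding a separator later only requires changing `separators`.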
vks

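You can use the \S regex, combined with \w+ so that multi-digit numbers stay in one piece.

\S => matches any single non-whitespace character.
\w+ => matches a run of word characters (letters, digits, underscore).

On its own, \S returns one character per match, so 22 would come back as '2', '2'. Trying \w+ first keeps it together.

import re
p = re.compile(r'\w+|\S')
test_str = "((2 + 3) / (4 * 22))"

print(re.findall(p, test_str))

Output - ['(', '(', '2', '+', '3', ')', '/', '(', '4', '*', '22', ')', ')']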
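The asker also mentioned wanting to cross-check the tokens against the `separators` list. One way that might look is sketched below, using `re.findall` with the pattern `\w+|\S`. The function name `tokenize_checked` is hypothetical, and the rule that every non-separator token must be an unsigned integer is an assumption about the input, not something stated in the question.

```python
import re

separators = ['(', ')', '+', '-', '*', '/', '=']

def tokenize_checked(string):
    # \w+ keeps multi-character operands together; \S picks up every other
    # non-whitespace character one at a time.
    tokens = re.findall(r'\w+|\S', string)
    for tok in tokens:
        # Assumption: a valid token is either a known separator or a run of digits.
        if tok not in separators and not tok.isdigit():
            raise ValueError('unexpected token: %r' % tok)
    return tokens

print(tokenize_checked('((2 + 3) / (4 * 22))'))
# ['(', '(', '2', '+', '3', ')', '/', '(', '4', '*', '22', ')', ')']
```

With this check, an input such as '2 ^ 3' raises a ValueError for '^' instead of passing it through silently.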

iNikkz