
I have a list of chemical reactions and I want to split each reaction on its delimiters so that I end up with the species involved in the reaction. Is there a way to do this? For example:

H2 + O2 = 2H2O
Na2 + Cl2 = NaCl
Ag + Cl2 =  AgCl

I want to split the above list of reactions so that I end up with a list of lists like this: [['H2', 'O2', '2H2O'], ['Na2', 'Cl2', 'NaCl'], ['Ag', 'Cl2', 'AgCl']]

RanRun
  • It depends on where you want to go with the result. Are you just looking for a quick and dirty solution, or a proper solution for parsing equations in a general sense? – Paul Rooney Oct 03 '19 at 11:34
  • Also: is the left-out 2 in H2O in your expected reaction product result intentional, or is that a typo? So [H2, O2, H2O] or [H2, O2, 2H2O]? – Kraay89 Oct 03 '19 at 11:35
  • Possible duplicate of [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Prathik Kini Oct 03 '19 at 11:46
  • See also https://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work – Eugene Yarmash Oct 18 '19 at 13:09

3 Answers


You could do this with re.split(), splitting the string on one or more non-word characters:

>>> import re
>>> re.split(r'\W+', 'H2 + O2 = 2H2O')
['H2', 'O2', '2H2O']

Alternatively, you could use re.findall() to find all 'words':

>>> re.findall(r'\w+', 'H2 + O2 = 2H2O')
['H2', 'O2', '2H2O']

And if you want to strip leading numbers from the words, you can use a pattern like this:

>>> re.findall(r'\b\d*(\w+)', 'H2 + O2 = 2H2O')
['H2', 'O2', 'H2O']
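
To get the nested list asked for in the question, the same pattern can be applied to every reaction in turn. This is only a minimal sketch: the names `reactions` and `split_reactions` are made up for illustration, and it assumes every species consists of word characters only (letters, digits, underscore).

```python
import re

# Hypothetical input: the three reactions from the question.
reactions = [
    "H2 + O2 = 2H2O",
    "Na2 + Cl2 = NaCl",
    "Ag + Cl2 =  AgCl",
]

def split_reactions(lines):
    """Return one list of species per reaction string."""
    # \w+ matches each run of word characters, so '+', '=' and spaces are skipped.
    return [re.findall(r"\w+", line) for line in lines]

species = split_reactions(reactions)
print(species)
# [['H2', 'O2', '2H2O'], ['Na2', 'Cl2', 'NaCl'], ['Ag', 'Cl2', 'AgCl']]
```

The double space in the third reaction is harmless here, because findall() collects the matches rather than splitting on a fixed separator.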
Eugene Yarmash

import re
s = "H2 + O2 = 2H2O"
print(re.split(r"\W+", s))

# re.split takes a regular expression on which you can split the string.
# \W represents non-word character. For ASCII, word characters are [a-zA-Z0-9_]
# + represents one or more occurrences.

In your example, it splits the string at ' + ' and ' = '.
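
One thing to watch for with this approach: if a line starts or ends with a non-word character (a leading space or a trailing newline, say), re.split() returns empty strings at the ends of the list. A small sketch of how that could be handled, where `split_species` is a hypothetical helper name and not part of the re module:

```python
import re

def split_species(reaction):
    """Split on runs of non-word characters and drop empty strings."""
    return [token for token in re.split(r"\W+", reaction) if token]

line = " H2 + O2 = 2H2O\n"   # e.g. a line read from a file

print(re.split(r"\W+", line))  # ['', 'H2', 'O2', '2H2O', '']
print(split_species(line))     # ['H2', 'O2', '2H2O']
```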

Bhawan

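str.split() can only split on a single separator, so it can't handle both '+' and '=' in one call. You can split your string in these ways instead:

The first is using re:

import re

# '+' is a regex metacharacter, so it must be escaped
re.split(r"\+|=", "H2 + O2 = 2H2O")
# ['H2 ', ' O2 ', ' 2H2O']

The second is splitting manually: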

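mendeleev = []
cur = ""
for char in "H2 + O2 = 2H2O":
    if char in "+=":
        # a delimiter ends the current species
        mendeleev.append(cur)
        cur = ""
    else:
        cur += char
# don't forget the last species after the final delimiter
mendeleev.append(cur)

Remember to strip() the list elements afterwards (or remove the spaces first with str.replace(" ", "")), since both approaches leave the surrounding whitespace in place.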
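
Putting the split and the strip together might look like the sketch below. The helper name `split_on_operators` and the Fe(OH)3 reaction are only illustrations, not something from the question. Splitting only on '+' and '=' (rather than on every non-word character) keeps species that contain parentheses in one piece, but it assumes '+' never appears inside a species, so ionic charges such as Na+ would need different handling.

```python
import re

def split_on_operators(reaction):
    """Split a reaction on '+' or '=' and trim whitespace around each species."""
    # [+=] is a character class, so '+' needs no escaping inside it.
    return [part.strip() for part in re.split(r"[+=]", reaction)]

print(split_on_operators("H2 + O2 = 2H2O"))
# ['H2', 'O2', '2H2O']
print(split_on_operators("Fe(OH)3 + HCl = FeCl3 + H2O"))
# ['Fe(OH)3', 'HCl', 'FeCl3', 'H2O']
```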