How to Split by Various Symbols Using Python?

Question

I have this simple file:

1|2|234 A=Jim 33
1|2|765 A=Sam 44
1|2|561 A=Edy 55

I want to parse the file to get the following output:

["1","2","Jim 33"]
["1","2","Sam 44"]
["1","2","Edy 55"]

I tried splitting on "|", but I can't work out how to also split on "A=", or how to make the program recognize "A=" and print what comes after it.

The algorithm I have in mind is to iterate over each split item and check whether it contains the substring "A=", but I'm not sure how to translate that into Python code. Is there a Pythonic way to do this?

You could use a regex, like so: `(\d)\|(\d)\|(\d{3}) A=(.+)`, then get the groups. — Asad Saeeduddin, Dec 09 '15 at 03:48
Does each line always have the same length for each part? Or might there be a line with, say, `AC=Alice`? — TigerhawkT3, Dec 09 '15 at 03:50
Thank you for asking. The lengths of the lines are not consistent; the sample I posted is simplified. Any idea how to tackle this? @TigerhawkT3 — MEhsan, Dec 09 '15 at 03:54

score 7 · Accepted Answer · answered Dec 09 '15 at 03:50

You can use a regular expression with re.split:

>>> import re
>>> re.split(r'\|| A=', '1|2|234 A=Jim 33')
['1', '2', '234', 'Jim 33']

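The pattern \|| A= matches either a literal | or the string " A=" (including its leading space). The first | is escaped because an unescaped | is the alternation (OR) operator in regular expressions; the second | is that operator.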
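
Note that the split above still returns the third field ('234'), which the question's expected output leaves out. A minimal sketch of one way to get exactly that output, assuming every line contains " A=" and that the field just before it is the one to discard (parse_line is a hypothetical helper name, not part of re):

```python
import re

# Sample lines taken from the question.
lines = [
    "1|2|234 A=Jim 33",
    "1|2|765 A=Sam 44",
    "1|2|561 A=Edy 55",
]

def parse_line(line):
    """Hypothetical helper: split one line and drop the field before ' A='."""
    # Split on a literal "|" or on the " A=" marker. The raw string keeps
    # Python from treating "\|" as an (invalid) string escape.
    parts = re.split(r"\|| A=", line)
    # Assumption: the item just before " A=" (e.g. "234") is not wanted,
    # so keep everything except the second-to-last item.
    return parts[:-2] + parts[-1:]

for line in lines:
    print(parse_line(line))
```

Because the pattern does not depend on how many digits each field has, this should also cope with fields of varying length, as long as the " A=" marker itself stays the same. A different marker such as "AC=" would need a different pattern.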

falsetru