
I would like to split a string based on a delimiter and ignore a particular pattern. I have lines in a text file that look like this:

ABC | 0 | 567 | my name is | however
TQD | 0 | 567 | my name is | but
GED | 0 | 567 | my name is | haha

I would like to split on "|" but ignore 0 and 567 and grab the rest, i.e.:

['ABC', 'my name is', 'however']
['TQD', 'my name is', 'but']
['GED', 'my name is', 'haha']

Whenever I split, it's grabbing the two numbers as well. Numbers can occur in other places, but this particular pattern of |0|567| needs to be ignored. I can obviously split on "|" and pop the elements at index 1 and 2, but I'm looking for a better way.
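For reference, the index-based workaround mentioned above looks roughly like this. It is a sketch that assumes "0" and "567" always sit in fields 1 and 2, and it strips the spaces around each field to match the expected output; `split_and_drop` is a hypothetical helper name.

```python
# Baseline: split on "|", strip whitespace, then drop fields 1 and 2 by position.
# Assumes every line has the "0" and "567" fields in those two positions.
lines = [
    "ABC | 0 | 567 | my name is | however",
    "TQD | 0 | 567 | my name is | but",
    "GED | 0 | 567 | my name is | haha",
]

def split_and_drop(line):
    fields = [f.strip() for f in line.split("|")]
    del fields[1:3]  # remove the "0" and "567" fields (indexes 1 and 2)
    return fields

for line in lines:
    print(split_and_drop(line))
```

This works, but it silently drops the wrong fields if a line ever deviates from that layout, which is why a delimiter-based regex is more attractive.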

I tried this:

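import re
line = "ABC|0|567|my name is|however"  # compact form, no spaces around "|"
pattern = re.compile(r'\|(?!0|567)')  # split on "|" not followed by 0 or 567
print(pattern.split(line))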

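This yields ['ABC|0|567', 'my name is', 'however']. The negative lookahead only prevents splitting at the "|" in front of 0 and 567; it doesn't remove them, so they stay attached to ABC.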

RavinderSingh13
turtle_in_mind
  • Do you want to split on any digit between? See if [this demo](https://tio.run/##LU3LCsIwELznK5ZemkQpgqhQENH@hpeiC13Ii@2iFPrvcQse5gUzTFlkyulYK8WSWYDRmEAJ4QrN/THACgfF6XxRjgukMSLQrGHKX/wgN8aUUQQ56YKxe@VYKKDlFvxztbde5b3bvPPgW6d1piRWq3MJJPa/3sP26lytPw) helps. – bobble bubble Nov 12 '22 at 03:51
  • No, I want to keep other numbers that occur in the string as they are. My string looks like this: "ABC|0|567|my name is| however | 222 | 1.000 | etc." – turtle_in_mind Nov 12 '22 at 03:58
  • I want to keep the other numbers and only ignore |0|567, @bobble bubble, so I would like ["ABC", "my name is", "however", "222", "1.000", etc.]. The numbers 0 and 567 always occur in the same place and nowhere else. – turtle_in_mind Nov 12 '22 at 03:59
  • Maybe [this updated demo](https://tio.run/##LY7LCsJADEX38xWhm84MItWiBUFE/Q03RQINdB6kQSnMv48puLgHEjiXm1eZUuxrpZATCzAaM1NEuEJzfzyhQKc5nQdlWCGOAYEWPQ7HXjmlL36QG2PyKIIc1WPcv1PINKPlFvyr2NsFvKIr2uO2j/PgW6cSUxSrwpJnEvvv2MG2wLlafw) works for you. – bobble bubble Nov 12 '22 at 04:02
  • Yep, that works, thank you. Can you post this as an answer? I will mark it as accepted. – turtle_in_mind Nov 12 '22 at 04:04

1 Answer


To include the `| specific numbers |` in the split sequence, so that they are consumed as part of the delimiter:

import re
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')  # swallow "| 0 | 567 |" into one delimiter

See this demo at regex101 or a Python demo at tio.run


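The `(?: non-capturing group )` is repeated `*` any number of times, so every consecutive `0 |` or `567 |` field following a `|` is swallowed into a single delimiter. The optional ` *` around each part also strips the surrounding spaces.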
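A self-contained sketch of the pattern applied to the sample lines from the question, plus the compact line from the comments that contains other numbers (`222`, `1.000`) which should be kept:

```python
import re

# Delimiter: a "|" with optional spaces around it, followed by any number of
# "0 |" / "567 |" fields, all swallowed into the same delimiter.
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')

lines = [
    "ABC | 0 | 567 | my name is | however",
    "TQD | 0 | 567 | my name is | but",
    "GED | 0 | 567 | my name is | haha",
    "ABC|0|567|my name is| however | 222 | 1.000",
]

for line in lines:
    print(pattern.split(line))
```

Only whole `0` or `567` fields are consumed: a field such as `0.5` or `5670` fails the trailing `\|` and is left alone. A standalone `0` or `567` field anywhere after a `|` would still be swallowed, which is acceptable here because the asker confirmed those numbers occur only in that position.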

bobble bubble
  • Hello, I have another question. I want to ignore the string "US" on its first occurrence but keep any later "US", i.e. for ABC | 0 | 567 | my name is | however|US|but|no|well|US the result should be [ABC, my name is, however, but, no, well, US]. How do I do that using this? – turtle_in_mind Nov 12 '22 at 20:58
    @turtle_in_mind Probably easier to remove the first `US` from the string before splitting using `re.sub`: [`line = re.sub(r"(\||^) *US *\| *", r"\1", line, 1);`](https://tio.run/##LY7NasQwDITvfgrhHtY2S9mltIWWUto@QsgtFJKgsAb/ITu7LPjdU5n0oBEjxDeT7uUSw9O2WZ8iFSAUwtmA8AHy6/sHKpx4nl9eWf0dwugRbGZziTe8ItW@q9Naaoj1hs6xlUI8MMbHK8JiKReI87wSYZgR4gJ9twdwAuFjXidFUg21/mowfQdmqGDkEUgOZ17t9Qhn/d6oOTlbYMww4RK5aRpLQQo7aY4@WYeKDo2hPt/AsJwqd9ftog2YgxaJbCiqJTeY@kfsQVpv2x8) – bobble bubble Nov 12 '22 at 21:58
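A minimal sketch combining the `re.sub` suggestion above with the split pattern from the answer. `split_dropping_first_us` is a hypothetical helper name, and `count=1` is passed as a keyword argument, which recent Python versions prefer over the positional form used in the comment.

```python
import re

# Same split pattern as in the answer.
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')

def split_dropping_first_us(line):
    # Remove only the first "US" field (count=1). The captured group puts back
    # the preceding "|" (or nothing, if "US" is the very first field).
    line = re.sub(r"(\||^) *US *\| *", r"\1", line, count=1)
    return pattern.split(line)

line = "ABC | 0 | 567 | my name is | however|US|but|no|well|US"
print(split_dropping_first_us(line))
```

The trailing `US` survives both because `count=1` stops after the first replacement and because it has no following `|` for the substitution pattern to match.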