
I need to know how to properly use "OR" when it comes to individual characters and whole phrases... For example I have code that is checking for any number of characters OR words that are found in an array...

I want to check for some Unicode characters and also some snippets of HTML.

I'm currently just checking for the characters using this:

([\u200b\u200c\u200d\0\1\2\3\4\5\6\7]*)

(The backslashes represent the Unicode characters U+200B to U+200D and the special characters \0-\7 in my software. They are all individual characters, and these are valid escape sequences in Objective-C.)

Now what if I wanted to check for these characters AND check for phrases like <b> or <font color="#FF0000">?

I found stuff while doing research that said to use pipes (|), but I'm not sure whether they go only between the words or also between the individual characters, and I'm not sure whether to put quotes around the words. I need help before I screw this up badly, haha!

P.S. Not sure if it will be any different, but I'm also doing it for this:

([^\u200b\u200c\u200d\0\1\2\3\4\5\6\7])
Emil
Albert Renshaw

1 Answer


It'd be something like

/([^....]|\<b\/\>|\<font color .... \>)/

Though the usual caveats about regexes and HTML apply here.
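To make the shape of that pattern concrete, here is a minimal sketch in Python rather than Objective-C. It assumes the special characters \0-\7 are the code points U+0000 to U+0007 and uses the two phrases from the question; the names `SPECIAL_CHARS`, `PHRASES`, and `MARKUP` are made up for illustration. The single characters stay together inside one `[...]` class, and `|` only separates the class from each whole phrase. No quotes go around the phrases.

```python
import re

# Assumption: the question's \0-\7 are the code points U+0000..U+0007.
SPECIAL_CHARS = r"\u200b\u200c\u200d\x00-\x07"

# The two phrases mentioned in the question; extend the list as needed.
PHRASES = ["<b>", '<font color="#FF0000">']

# re.escape protects any metacharacters inside the phrases.
phrase_alternatives = "|".join(re.escape(p) for p in PHRASES)

# One character class for the single characters, then one alternative per phrase.
MARKUP = re.compile(f"(?:[{SPECIAL_CHARS}]|{phrase_alternatives})")

sample = 'a\u200b<b>c<font color="#FF0000">d\x01'
found = MARKUP.findall(sample)
print(found)
```

Appending `*` or `+` to the group would match runs of these characters and phrases instead of one at a time.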

As for the confusion about where to put the |, consider this hackneyed example: you want to find the word color, but also want to accommodate the British spelling, colour:

/(color|colour)/
/(colou?r)/
/(colo(r|ur))/

are all basically equivalent.
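A quick way to convince yourself is to run all three against the same words. This is just a sketch in Python (the delimiting slashes are dropped because Python does not use them); the word list is made up for the check.

```python
import re

# The three patterns from above, without the / delimiters.
patterns = [r"(color|colour)", r"(colou?r)", r"(colo(r|ur))"]

# Two spellings that should match and one misspelling that should not.
words = ["color", "colour", "colr"]

# For each pattern, record whether it matches each word in full.
results = {p: [bool(re.fullmatch(p, w)) for w in words] for p in patterns}

for pattern, matched in results.items():
    print(pattern, matched)
```

They differ only in how many groups they capture, which is why "basically" is the right word.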

Marc B
  • Thanks for the tip on the british spelling! Fortunately I am assured that it will always be the exact string `` (or 8 other specific variants) since it is generated with another piece of software – Albert Renshaw Feb 11 '13 at 18:40
  • What does this mean? `\` Does that become `` or ``? – Albert Renshaw Feb 11 '13 at 18:41
  • backslash escapes metachars. some regex engines use `<` and `>` for various things, and `/` is generally the regex delimiter, so I escape them all out of habit. – Marc B Feb 11 '13 at 18:42
  • I'm confused about what the ellipses are in your code... do the dots just represent what I had typed, so I just need to copy and paste that in there, or is that some type of code in RegEx? (Sorry, I'm new to RegEx and last time I removed some periods from example code it stopped working haha) – Albert Renshaw Feb 11 '13 at 18:45
  • Is this what I would do then? (Also do I have to backslash escape a pound sign or an equals sign?) ... `([^[\u200b\u200c\u200d\0\1\2\3\4\5\6\7]|\|\])` – Albert Renshaw Feb 11 '13 at 18:47
  • To check for anything but "u+200b" , "u+200c", etc. and ALSO anything but "``", "``", etc. – Albert Renshaw Feb 11 '13 at 18:48
  • yeah, the ellipses are what you typed. I'm lazy :) – Marc B Feb 11 '13 at 18:51
  • Haha! Okay great! And do you know if I have to escape the pound sign (#) or the Equal sign? – Albert Renshaw Feb 11 '13 at 18:52
  • normally, no, but depends on how you've set up your regexes. e.g. using `#` as the delimiter means you'd have to escape #'s anywhere inside the regex. – Marc B Feb 11 '13 at 18:54
  • Okay, good! So the code worked perfectly, except sometimes when it's checking for anything but what I listed it grabs things 3 characters long or x characters long (if I have the `` tag, which is 3 characters, then it will grab the first set of 3 characters after that)... how can I make it check (^...) but only 1 character... part of me feels like you use a "?" or "{1}" or something (again I'm new to RegEx... it's my second day 0:) haha! – Albert Renshaw Feb 11 '13 at 19:02
  • In your code I think the caret (^) might be in the wrong spot... I'm getting a lot of errors now /: – Albert Renshaw Feb 11 '13 at 19:47
  • Found out what was wrong... I was using `[^...]` to check anything but a character... but since I now have phrases I had to use http://stackoverflow.com/questions/8854817/regex-match-words-except-these – Albert Renshaw Feb 11 '13 at 20:21
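For anyone landing here with the same problem: `[^...]` can only exclude single characters, so excluding whole phrases needs a negative lookahead, `(?!...)`. Below is a rough sketch in Python, not the asker's actual code. It assumes \0-\7 mean U+0000 to U+0007 and uses the two phrases from the question; `MARKUP`, `TOKEN`, and `text_runs` are names invented for the example. The first alternative consumes the markup so the scan never restarts in the middle of a phrase, and the second captures one character at a time as long as no excluded item starts there.

```python
import re

# Assumption: \0-\7 are U+0000..U+0007; phrases taken from the question.
MARKUP = r'[\u200b\u200c\u200d\x00-\x07]|<b>|<font color="#FF0000">'

# Alternative 1: swallow runs of excluded characters and phrases.
# Alternative 2: capture characters while the lookahead (?!...) confirms
# that no excluded character or phrase begins at the current position.
TOKEN = re.compile(rf"(?:{MARKUP})+|((?:(?!{MARKUP}).)+)", re.DOTALL)

sample = 'ab\u200b<b>cd<font color="#FF0000">e\x01'

# Keep only the captured runs, i.e. everything that is not markup.
text_runs = [m.group(1) for m in TOKEN.finditer(sample) if m.group(1) is not None]
print(text_runs)
```

Dropping the trailing `+` inside the capture group gives one character per match instead of a run.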