-2

I have the following string:

\" A B 10\\n\"

In this case, I just want to match the first two letters and the number using a Regex.

I would like to have the matched items in an array like this:

=> ['A', 'B', '10']

That said, the letters A and B and the number 10 are all variable; the string could just as well contain B C 22, for example.

The solution that I imagine is something like this:

x = \" A B 10\\n\".match(regex)

That way I can split it up to get the desired results:

x[0] evaluates to A

x[1] evaluates to B

x[2] evaluates to 10

halfer
Kleber S.
  • And what is the expected result? Take a look here, perhaps: http://stackoverflow.com/questions/80357/match-all-occurrences-of-a-regex – Casimir et Hippolyte Sep 02 '14 at 03:42
  • I want to match only `A`, `B` and `10`. – Kleber S. Sep 02 '14 at 03:43
  • @CasimiretHippolyte Thank you for the link. I'm almost there! – Kleber S. Sep 02 '14 at 03:46
  • Your question is extremely vague. If you know the string contains `A`, `B`, and `"10"`, what is there to parse? Or perhaps the string may contain only some (or none) of those substrings. Or are you looking for substrings that match `/\w+/` before `"\\"`, or ones that match `/[a-zA-Z]+/` or `/\d+/`? Do you want to avoid matching `"n"`, or to avoid matching anything after `"\\"`? Say what you want to do without reference to a specific string. Give more examples and show the desired output for each. You need to clarify, both for those who have already read your question and for those who will read it in future. – Cary Swoveland Sep 02 '14 at 05:22
  • Much better. btw, I did not downvote or vote to close. – Cary Swoveland Sep 02 '14 at 06:41

3 Answers

2

Solved for Your Corpus Using Alternation

This uses alternation to capture just the specific items you say you want.

str = %q{\" A B 10\\n\"}
str.scan(/A|B|10/)
#=> ["A", "B", "10"]

More Generic Solution

This solution looks for single capital letters at word boundaries, or strings of digits. It's more generic because you don't have to specify in advance exactly which letters or numbers you want to match. It works for your corpus, but you may need to tweak it if your real corpus is more complex.

str = %q{\" A B 10\\n\"}
str.scan(/\b\p{Upper}\b|\d+/)
#=> ["A", "B", "10"]
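
If you would rather keep the `match` call from the question, capture groups are one way to get an indexable array. This is only a sketch: the `FIELDS` constant and the `extract_fields` helper are hypothetical names, and the pattern assumes the two letters and the number are separated by whitespace, as in the sample string.

```ruby
# Hypothetical pattern: two single capital letters, then a run of digits,
# each separated by whitespace (an assumption based on the sample string).
FIELDS = /\b(\p{Upper})\s+(\p{Upper})\s+(\d+)/

# Hypothetical helper: returns the three captured fields, or [] when nothing matches.
def extract_fields(text)
  m = FIELDS.match(text)
  m ? m.captures : []   # captures holds only the groups, not the whole match
end

x = extract_fields(%q{\" A B 10\\n\"})
#=> ["A", "B", "10"]
x[0] #=> "A"
x[2] #=> "10"
```

Unlike `scan`, this returns nothing at all when any of the three fields is missing, which may or may not be what you want.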
Todd A. Jacobs
  • I think (but I'm not sure however) that writing `[AB]` instead of `A|B` is faster. – Casimir et Hippolyte Sep 02 '14 at 04:42
  • @CasimiretHippolyte Benchmark it. But it doesn't matter on a corpus this small. Semantic clarity is often more important anyway, and alternation makes it clear that you're looking for one of three *specific* values. – Todd A. Jacobs Sep 02 '14 at 04:46
  • Don't make a fuss about it, it's just an observation. – Casimir et Hippolyte Sep 02 '14 at 04:51
  • @CasimiretHippolyte It's called "bike shedding." The difference on this corpus is less than 2 microseconds. If you can solve world hunger in 2 millionths of a second, then please ensure you use a character class instead of alternation so that invaluable time isn't wasted. – Todd A. Jacobs Sep 02 '14 at 04:58
  • I know that it is not an important gain; I only wanted to point it out. Is that a problem? – Casimir et Hippolyte Sep 02 '14 at 05:02
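
The comparison discussed in these comments can be measured with Ruby's standard `Benchmark` module. The sketch below is only illustrative: the iteration count `n` is arbitrary, the variable names are made up for the example, and the timings depend on the machine and Ruby version, so none are quoted here.

```ruby
require 'benchmark'

str = %q{\" A B 10\\n\"}

alternation = /A|B|10/     # the pattern from the answer
char_class  = /[AB]|10/    # the character-class variant suggested in the comments

n = 100_000   # arbitrary iteration count, chosen only for illustration

# Prints user, system and real time for each labelled block.
Benchmark.bm(12) do |bm|
  bm.report('alternation') { n.times { str.scan(alternation) } }
  bm.report('char class')  { n.times { str.scan(char_class) } }
end
```

Both patterns return the same array for this string, so any difference is purely one of speed and readability.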
2

Use a negative lookbehind so that a run of word characters is not matched when it starts immediately after a \ symbol.

> "\" A B 10\\n\s\t\"".scan(/(?<!\\)\w+/)
=> ["A", "B", "10"]
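
As a rough check that the same pattern copes with other values, here is a sketch using a second, made-up input (`B C 22`, the example from the question). `NOT_ESCAPED`, `sample`, and `tokens` are hypothetical names used only for this illustration.

```ruby
# Runs of word characters that do not start right after a backslash.
NOT_ESCAPED = /(?<!\\)\w+/

# Hypothetical second input: B, C and 22 instead of A, B and 10.
sample = "\" B C 22\\n\""
tokens = sample.scan(NOT_ESCAPED)
#=> ["B", "C", "22"]

# Caveat: the lookbehind only guards the first character after the backslash.
# If more word characters follow, the rest of the run is still matched.
leftover = "\\name".scan(NOT_ESCAPED)
#=> ["ame"]
```

So this works when the backslash is followed by a single letter such as `n`, but longer escaped sequences would need a stricter pattern.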
Avinash Raj
0
(?<!\\\\)([a-zA-Z\d]+)

Use this.

See demo.
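
A minimal Ruby sketch of this pattern, under the assumption that it is meant to run against raw text in which the backslash before `n` is doubled (as the string is written in the question). Inside a Ruby regex literal, `\\\\` stands for two literal backslashes, so the lookbehind only rejects a match that follows two of them. `PATTERN`, `raw`, `doubled`, and `single` are hypothetical names.

```ruby
PATTERN = /(?<!\\\\)([a-zA-Z\d]+)/

# Assumption: the raw text has TWO backslashes before n.
# In a single-quoted literal each \\ pair produces one backslash.
raw = '\" A B 10\\\\n\"'
doubled = raw.scan(PATTERN).flatten   # flatten, because scan returns one array per group match
#=> ["A", "B", "10"]

# With only ONE backslash before n, the two-character lookbehind
# does not reject the match, so n is picked up as well.
single = %q{\" A B 10\\n\"}.scan(PATTERN).flatten
#=> ["A", "B", "10", "n"]
```

If your actual string holds a single backslash, the single-backslash lookbehind `(?<!\\)` is probably the form you want.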

vks