
I have a string of random characters between `A` and `z`, such as `tahbAwgubsuregbbu`, and I can easily capture between `{1,n}` characters before an `s` with `([A-z]{1,8})s`. Now I am trying to ignore one or more characters before the `s` is matched. For example, in the string above I want to exclude any `w` characters. I understand I can't "jump" over the `w` within a single capture group and return `tahbAgub`, but could I create two capture groups whose concatenation is `{1,n}` characters long, such as 1. `tahbA` 2. `gub`?

Regex101 Example

Vikrant Kashyap
hotshotiguana
  • Can't you replace `w` with an empty string and then use the captured group? – Rajshekar Reddy Apr 09 '16 at 18:26
  • `[A-z]` doesn't match what you think. For example, it also matches `[`, `\`, `]`, `^`, `_` and the backtick *(see the ASCII table)* – Casimir et Hippolyte Apr 09 '16 at 18:27
  • This? https://regex101.com/r/bL8bI1/1 – Shafizadeh Apr 09 '16 at 18:30
  • I'm not absolutely clear on what you want to achieve, but what about a `split` on `w` and then a `join` without separator (i.e. `''`)? – SamWhan Apr 09 '16 at 18:33
  • You could use a [non-capturing group](http://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group) to skip over the text you want to avoid. – litel Apr 09 '16 at 18:33
  • @Shafizadeh that does work, thanks, but is there any way to ensure each capture group is a max of `n` characters, say `8`? Sometimes the sample strings are >25,000 characters, so I just want a small subset. – hotshotiguana Apr 09 '16 at 18:35
  • The first thing to do is split on `s`. Then loop over the resulting array, removing the characters you don't want, e.g. `replace(/[chars]+/g, '')` –  Apr 09 '16 at 20:01
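
The "remove the `w` first, then capture" idea from the comments above could look roughly like this. It is only a sketch: `captureBeforeS` is a hypothetical helper name, and it assumes the intended character class is `[A-Za-z]` rather than `[A-z]`, since the latter also matches a few punctuation characters.

```javascript
// Hypothetical helper: remove every `w`, then capture up to `n` letters
// that sit directly before an `s`.
function captureBeforeS(input, n) {
  const cleaned = input.replace(/w/g, "");
  // Assumption: only ASCII letters should be captured.
  const pattern = new RegExp("([A-Za-z]{1," + n + "})s");
  const match = cleaned.match(pattern);
  return match ? match[1] : null;
}

console.log(captureBeforeS("tahbAwgubsuregbbu", 8)); // "tahbAgub"

// `[A-z]` spans the ASCII range 65-122, so it also accepts `_` (code 95).
console.log(/[A-z]/.test("_"));     // true
console.log(/[A-Za-z]/.test("_"));  // false
```

This gives one string instead of two groups, which may or may not be acceptable depending on whether the positions of the removed `w` characters matter.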

1 Answer


Try this:

/(.{0,8})?w(.{0,8})?s/

Online Demo
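
For reference, here is how the pattern behaves on the sample string from the question. This is a sketch, not part of the original demo: `twoGroups` is a hypothetical helper, and `strictPattern` is an assumed variant that stops each group from swallowing a `w` or an `s`, which `.` would otherwise allow.

```javascript
const input = "tahbAwgubsuregbbu";

// Pattern from the answer (the trailing `?` on each group is redundant,
// since `{0,8}` can already match nothing).
const answerPattern = /(.{0,8})?w(.{0,8})?s/;

// Assumed stricter variant: the groups may not contain `w` or `s`.
const strictPattern = /([^ws]{0,8})w([^ws]{0,8})s/;

// Return the two captured groups, or null when nothing matches.
function twoGroups(pattern, text) {
  const m = text.match(pattern);
  return m ? [m[1], m[2]] : null;
}

console.log(twoGroups(answerPattern, input)); // ["tahbA", "gub"]
console.log(twoGroups(strictPattern, input)); // ["tahbA", "gub"]

// With a second `s` close by, `.` lets the second group run past the first one.
console.log(twoGroups(answerPattern, "tahbAwgubsxs")); // ["tahbA", "gubsx"]
console.log(twoGroups(strictPattern, "tahbAwgubsxs")); // ["tahbA", "gub"]
```

Note that both patterns cap each group at 8 characters separately, so the concatenation can be up to 16 characters long.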


Based on your comments, you may need to split the string instead. Something like this:

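var str = "tftftfwtahbAwgubsuregbbu";
// Split on every `w` or `s`; the separators themselves are dropped.
var res = str.split(/w|s/);
// res is ["tftftf", "tahbA", "gub", "uregbbu"]
document.write(res);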
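
The split above also returns the text that follows the `s` (`uregbbu`). If only the pieces before the first `s` are wanted, each capped at a maximum length as asked in the comments on the question, something along these lines might work. It is a minimal sketch: `piecesBeforeS` and `maxLen` are hypothetical names, and it assumes `maxLen` is at least 1 and that each piece should keep its last `maxLen` characters.

```javascript
// Hypothetical helper: pieces between `w` characters, taken from the text
// before the first `s`, each trimmed to its last `maxLen` characters.
function piecesBeforeS(input, maxLen) {
  const end = input.indexOf("s");
  if (end === -1) {
    return []; // no `s`, so there is nothing to capture
  }
  return input
    .slice(0, end)
    .split("w")
    .filter((piece) => piece.length > 0) // drop empties from `ww` or a leading `w`
    .map((piece) => piece.slice(-maxLen));
}

console.log(piecesBeforeS("tftftfwtahbAwgubsuregbbu", 8));
// ["tftftf", "tahbA", "gub"]
```

For very long inputs this avoids regex backtracking altogether, since it only uses `indexOf`, `split` and `slice`.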
Community
Shafizadeh
  • Thanks! This works well; is there an easy way to tweak this so that it ignores any `w` character and creates `n` capture groups? For example, `tftftfwtahbAwgubsuregbbu` would return three capture groups: 1. `tftftf` 2. `tahbA` 3. `gub`? – hotshotiguana Apr 09 '16 at 18:39
  • @hotshotiguana I added another approach to my answer. – Shafizadeh Apr 09 '16 at 18:54