I'm writing a parser and I'd like to avoid chopping up the input string for performance reasons. Thus, I've created a Stream object that represents the string with a cursor:

const Stream = (string, cursor) => Object.freeze({
  string,
  cursor,
  length: string.length - cursor,
  // Both offsets are relative to the cursor, so both are shifted by it.
  slice: (start, end) => string.slice(start + cursor, end === undefined ? undefined : end + cursor),
  move: distance => Stream(string, cursor + distance),
})

I want to be able to use regular expressions to match against this string. However, I don't care about anything before the cursor. So suppose I have the following string and cursor:

> string = 'hello ABCD'
'hello ABCD'
> cursor = 6
6

So we don't care about anything before the A, but we want to be able to use regex to match all those uppercase letters:

> re = /^[A-Z]+/
/^[A-Z]+/

I'm not sure how to get this to work. I noticed that when you use the g flag, RegExp.prototype.exec keeps track of a lastIndex property and starts searching from there. But ^ still only matches at the start of the string, not at lastIndex...
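To make the problem concrete, here is a minimal sketch of the behaviour described above. The variable names are only for illustration, and the second input string is a made-up example:

```javascript
const string = 'hello ABCD';
const cursor = 6;

// With the g flag, exec() begins its search at lastIndex...
const anchored = /^[A-Z]+/g;
anchored.lastIndex = cursor;
// ...but without the m flag, ^ can only match at index 0, so this is null.
const anchoredResult = anchored.exec(string);

// Dropping ^ does match from the cursor here (match.index is 6)...
const unanchored = /[A-Z]+/g;
unanchored.lastIndex = cursor;
const pinned = unanchored.exec(string);

// ...but nothing pins the match to the cursor: with other text in between,
// the regex scans ahead and matches later in the string (match.index is 10).
unanchored.lastIndex = cursor;
const drifted = unanchored.exec('hello abc ABCD');
```

So the g flag alone gives me a starting offset, but not an anchor at that offset.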

Any ideas how I can get this to work efficiently? If I have to use a 3rd party regex library, I'm fine with that, but ideally this could be done with the native RegExp...

Chet

1 Answer

I would do it with sed:

sed -rn 's/^.{'$cursor'}([A-Z]+)$/\1/p'

where $cursor is a shell variable containing the number of ignored chars at the beginning.

Option -r enables extended regular expressions, -n suppresses the automatic printing of every line, and the p flag prints the line only if the substitution matched.

Now the question is how to port that to your language. Here are some hints on how to use variables in regular expressions in JavaScript.
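As a rough sketch of such a port (untested against your parser), the same pattern can be built with the RegExp constructor. `matchAt` and `matchSticky` are hypothetical helper names, not part of any library. Two deliberate differences from the sed command: `[\s\S]` stands in for `.` so that newlines before the cursor are skipped too, and the trailing `$` is dropped so that input after the match is allowed. The second helper swaps in a different technique, the native sticky `y` flag (ES2015), which only matches starting exactly at lastIndex and so may avoid consuming the skipped prefix at all:

```javascript
// Port of the sed idea: skip exactly `cursor` characters, then capture.
// `source` is the pattern text without a leading ^.
const matchAt = (string, cursor, source) => {
  const re = new RegExp('^[\\s\\S]{' + cursor + '}(' + source + ')');
  const m = re.exec(string);
  return m ? m[1] : null;
};

// Alternative: with the sticky flag the match must begin at lastIndex.
const matchSticky = (string, cursor, source) => {
  const re = new RegExp(source, 'y');
  re.lastIndex = cursor;
  const m = re.exec(string);
  return m ? m[0] : null;
};

const viaPrefix = matchAt('hello ABCD', 6, '[A-Z]+');     // 'ABCD'
const viaSticky = matchSticky('hello ABCD', 6, '[A-Z]+'); // 'ABCD'
const noMatch = matchSticky('hello ABCD', 5, '[A-Z]+');   // null: index 5 is a space
```

Both helpers build a new RegExp on every call for simplicity; in a real parser you would probably cache the compiled expression and only reset lastIndex.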

erik