I am using a regex in JavaScript to extract data from a plain text file that contains various list items, each starting with a dash and a space ("- ").

At first I used the following RegEx:

/(?<=- |\d\. ).*/g

But it turns out positive lookbehinds are not supported in every browser: at the moment they only work in Chrome and not in the other browsers.
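For anyone who wants to keep the lookbehind where it is available, here is a minimal sketch of a runtime check with a capture-group fallback. The helper name `supportsLookbehind` is made up for illustration, and the sketch assumes an unsupported engine throws a `SyntaxError` when the pattern is compiled from a string.

```javascript
// Hypothetical helper: reports whether this engine can parse a lookbehind.
function supportsLookbehind() {
  try {
    // Build the pattern from a string so an unsupported engine throws here
    // at runtime instead of failing to parse the whole script.
    new RegExp("(?<=- )x");
    return true;
  } catch (e) {
    return false;
  }
}

const line = "- List item 1";
const item = supportsLookbehind()
  ? line.match(new RegExp("(?<=- |\\d\\. ).*"))[0] // lookbehind: full match is the item
  : line.match(/(?:- |\d\. )(.*)/)[1];             // fallback: item is in group 1
console.log(item);
```

Both branches should give the same text, so in practice the capture-group version alone is usually enough.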

I then tried to get around this without a lookbehind by using:

(\-\ |\d\.\ ).+

But this also matches the dash and the first space, which I do not want: I need everything after the first dash and space.

The info I have is formatted like:

- List item 1
- List item 2
- List item 3
- List item 4

And I require the output as "List item #" for every single row in the text file. Can someone perhaps guide me in the right direction to solve this or an alternative to the JavaScript .match() function?

Thanks in advance.

Rafaël De Jongh
  • Are you open to using capture groups? – Jacob Nov 06 '18 at 18:02
  • @Jacob yea sure as long as it is cross browser compatible. – Rafaël De Jongh Nov 06 '18 at 18:04
  • Is this list in a bigger text? – revo Nov 06 '18 at 18:14
  • What are you getting the input as? Do you have an array of strings where each element is one item from the list, or do you have one whole string? And what do you want as output? Do you want one whole string where each line is the part after the `-` or number? Or do you want an array containing them? – Sweeper Nov 06 '18 at 18:17
  • @Sweeper Looks like OP's satisfied with the lookbehind approach but since it is not supported globally they're looking for another solution. So I think it answers all your questions. – revo Nov 06 '18 at 18:20
  • Yes, the list is in a bigger text with other components in the text file, @revo. @Sweeper, the text is generic plain text where there is more than just a list, with either unordered or ordered items. But yes, the output has to be the content after the dash and a space. – Rafaël De Jongh Nov 06 '18 at 19:09

4 Answers

You can capture the substring after the - or number like this:

(?:- |\d\. )(.*)

Group 1 will contain the text you want.

var string = `- List item 1
- List item 2
- List item 3
- List item 4`
var regex = /(?:- |\d\. )(.*)/g
var match = null
while (match = regex.exec(string)) {
   console.log(match[1]); // match[1] is the string in group 1
}

Alternatively,

console.log(string.replace(regex, "$1"))

which will replace each whole match with its group 1. This method is suitable if you want the output as one single string instead of an array of list items.
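If an array is what you are after, the `exec` loop above can be wrapped up roughly like this. It is only a sketch: the function name `extractListItems` is hypothetical, and I have assumed the markers sit at the start of a line (hence `^`, `$` and the `m` flag) and that numbers may have more than one digit (`\d+`), neither of which is in the pattern above.

```javascript
// Hypothetical helper: returns the text after each "- " or "N. " marker.
function extractListItems(text) {
  // Assumption: markers start a line, so ^ and $ are used with the m flag.
  const regex = /^(?:- |\d+\. )(.*)$/gm;
  const items = [];
  let match;
  // exec() advances regex.lastIndex on each call because of the g flag.
  while ((match = regex.exec(text)) !== null) {
    items.push(match[1]); // group 1 holds the text without the marker
  }
  return items;
}

const text = `Some heading
- List item 1
- List item 2
1. First step
2. Second step
Closing paragraph - not a list item`;

const items = extractListItems(text);
console.log(items);
```

Anchoring to the line start keeps a stray " - " in the middle of a paragraph from being picked up as a list item.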

Sweeper
  • Thanks @sweeper this pretty much is what I was looking for as I was already in the right direction in terms of grouping but did not know how to actually get the matched group out of it! Much appreciated. – Rafaël De Jongh Nov 06 '18 at 19:28

Put it in a non-capturing group: (?:\-\ |\d\.\ )(.+).

Results look like this: https://regex101.com/r/OwZl8g/1
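To read the group from JavaScript, the thing to watch is the `g` flag on `.match()`. A small sketch of the difference, using the same pattern (the variable names are just for illustration):

```javascript
const line = "- List item 1";

// Without the g flag, match() returns the full match at index 0
// followed by each capturing group.
const single = line.match(/(?:- |\d\. )(.+)/);
console.log(single[0]); // full match, marker included
console.log(single[1]); // group 1, marker excluded

// With the g flag, match() returns only the full matches, no groups.
const all = "- List item 1\n- List item 2".match(/(?:- |\d\. )(.+)/g);
console.log(all);
```

So for several matches with groups you would loop with `exec()` instead of calling `.match()` with `g`.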

tyteen4a03
  • While this does match the grouping better, how would I return the grouped values? If I put this into my .match() it just returns the whole string with the dash and space included, i.e. it returns the full match rather than the group. – Rafaël De Jongh Nov 06 '18 at 18:08
  • See [this question](https://stackoverflow.com/questions/432493/how-do-you-access-the-matched-groups-in-a-javascript-regular-expression). – tyteen4a03 Nov 06 '18 at 18:11

If the marker is the first thing on the line, you could match zero or more spaces from the start of the string (or spaces and tabs in a character class), then use an alternation to match either a dash or one or more digits followed by a dot. A capturing group then captures what follows:

^ *(?:-|\d+\.) (.*)$

const strings = [
  '- List item 1',
  '  1. List item 2',
  '1. List item 3'
  
];
let pattern = /^ *(?:-|\d+\.) (.*)$/;
strings.forEach(s => {
  console.log(s.match(pattern)[1]);
});
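
For a bigger text where not every line is a list item, `s.match(pattern)` returns `null` on the other lines, so indexing `[1]` directly would throw. A rough sketch of one way around that, assuming the text can simply be split on line breaks (the sample `text` is made up):

```javascript
const pattern = /^ *(?:-|\d+\.) (.*)$/;
const text = `Intro paragraph
- List item 1
  1. List item 2
Not a list line`;

const items = text
  .split(/\r?\n/)                   // one entry per line
  .map(line => line.match(pattern)) // null for lines that are not list items
  .filter(m => m !== null)          // drop the non-matching lines
  .map(m => m[1]);                  // keep only the captured text
console.log(items);
```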
The fourth bird
  • Besides the dash there is also the ordered list part, which just counts from 1. to 10. That's also why I used the OR operator in the pattern, which isn't too difficult to incorporate. However, with this pattern the whole thing (including the dash and space) is matched rather than everything that comes after it, as it was with the lookbehind. – Rafaël De Jongh Nov 06 '18 at 18:07

Maybe you could give us an example of your input text.

const regexp = /(?:- )(\w)/g;
const input = '- a, some words - b';
const result = input.match(regexp); // result: ['- a','- b'] (with the g flag, match() returns full matches, not groups)

I highly recommend you use https://regexper.com to visualize your RegEx.

Hope this helps.