Regex to match/replace leading tabs without lookbehind

Question

I am trying to match each \t in the leading whitespace of a line so I can replace them with two spaces. This is trivial with an unbounded (i.e., variable-length) lookbehind.

text.replace(/(?<=^\s*)\t/gm, '  ')

Unfortunately, this code is running on iOS, and for some reason, Safari and iOS have yet to implement lookbehinds, let alone unbounded lookbehinds.

I know there are workarounds for lookbehinds, but I can't seem to get the ones I've looked at to work.

I would rather not capture any characters aside from each tab, but if there's no other way, I could capture characters around the tabs in capture groups and add $1, etc., to my replacement string.
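If the goal is a single lookbehind-free regex with only `$1` in the replacement string, one option (a sketch, not something proposed in the thread) is to accept a loop. Each pass of `/^( *)\t/gm` expands the first tab that is preceded only by spaces at the start of a line, which exposes the next leading tab, and the pass is repeated until the text stops changing. The helper name `expandByRepeatedPasses` is hypothetical, and the number of passes grows with the largest number of leading tabs on any line.

```javascript
// Hypothetical helper: expands leading tabs with one lookbehind-free regex
// and a plain "$1" replacement string, repeated until nothing changes.
function expandByRepeatedPasses(input) {
  // With the m flag, ^ matches at every line start; ( *) captures the spaces
  // already in front of the tab so "$1" can put them back.
  const firstLeadingTab = /^( *)\t/gm;
  let result = input;
  let previous;
  do {
    previous = result;
    // Each pass expands at most one tab per line: the first tab reached
    // through spaces only. The expansion exposes the next leading tab.
    result = result.replace(firstLeadingTab, '$1  ');
  } while (result !== previous);
  return result;
}

const sample = '\n\t\ta\n  \t  b\n \t  \t c\td  \te\n';
console.log(JSON.stringify(expandByRepeatedPasses(sample)));
// prints "\n    a\n      b\n        c\td  \te\n"
```

Tabs after the first non-space character of a line are never reachable through `^( *)`, so `c\td  \te` is left alone.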

Example test code

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

// throws error in iOS, which does not support lookbehinds
// const regex = /(?<=^\s*)\t/gm;
const regex = /to-do/gm;

const result = text.replace(regex, '  ')

console.log(`Text: ${text}`)
console.log(`Expected: ${expected}`)
console.log(`Result: ${result}`)
console.log(JSON.stringify([ expected, result ], null, 2))

if (result === expected) {
  console.info('Success! ')
} else {
  console.error('Failed ')
}

Update

A less-than-ideal workaround would be to use two regexes and a replacer function.

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

const result = text.replace(/^\s*/gm, m => m.replace(/\t/g, '  '))

if (result === expected) {
  console.info('Success! ')
} else {
  console.error('Failed ')
}

Again, less than ideal. I'm a purist.

Since there is no lookaround on your platform, it can't be done with a single regex. Otherwise you wouldn't even need a variable-length lookbehind. This is simple enough to work on all platforms: `/(?:(?<![^\t])|^)([^\S\r\n\t]*)\t/gm` (https://regex101.com/r/lFMD0w/1), replace with `$1  `. Except if you're working on a real berserk one. – sln, Feb 14 '22 at 21:25
But since the engine is free to advance to any position, there is no solution, right? No single-regex solution exists, then. – sln, Feb 14 '22 at 21:27
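To see what the regex from sln's comment does, the sketch below runs it in an engine that supports lookbehind (for example, current Node.js), so it does not get around the original iOS restriction. It assumes the intended replacement is `$1` followed by two spaces, and `expandWithSlnRegex` is a hypothetical name. The second call illustrates sln's follow-up point: because the fixed-length lookbehind accepts any position directly after a tab, the second of two consecutive tabs after other text is matched as well.

```javascript
// Regex quoted from sln's comment. Requires lookbehind support, which
// modern V8/Node.js has; it therefore does not solve the iOS case.
const slnRegex = /(?:(?<![^\t])|^)([^\S\r\n\t]*)\t/gm;

// Assumed replacement: the captured non-tab blanks ($1) plus two spaces.
const expandWithSlnRegex = s => s.replace(slnRegex, '$1  ');

// The question's sample input.
console.log(JSON.stringify(expandWithSlnRegex('\n\t\ta\n  \t  b\n \t  \t c\td  \te\n')));
// prints "\n    a\n      b\n        c\td  \te\n"

// (?<![^\t]) also succeeds right after any tab, so the second of two
// consecutive tabs following other text is replaced too.
console.log(JSON.stringify(expandWithSlnRegex('c\t\td')));
// prints "c\t  d"
```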

anubhava · Answer 1 · 2022-02-14T18:57:12.143


You may use this JavaScript solution without involving lookbehind:

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`;

var repl = text.replace(/^[ \t]+/mg, g => g.replace(/\t/g, '  '));

console.log(repl);

edited Feb 14 '22 at 18:57

answered Feb 14 '22 at 18:54


So, exactly what I did in my update? ;) This question is more for my edification than needing an answer that works, so I'd like to do it in a single regex. – dx_over_dt Feb 14 '22 at 18:56
I was just reading the updated section. Sorry, I started writing my answer against the original question. It is unlikely that a solution can be found without lookbehind and without a replacer lambda/function – anubhava Feb 14 '22 at 18:58
Unfortunately, JavaScript doesn't support `\G`; otherwise, in PCRE/Java it would have been `.replaceAll("(?m)(^ *|(?!^)\\G *)\t", "$1  ");` – anubhava Feb 14 '22 at 19:04
No worries about starting your answer before reading my edit. I assumed that's what happened. :) – dx_over_dt Feb 14 '22 at 19:41
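Regarding the `\G` comment: JavaScript has no `\G`, but its sticky `y` flag gives similar "continue exactly where the previous match ended" anchoring. Combined with `g`, `String.prototype.replace` replaces only an unbroken run of matches starting at index 0 and stops at the first failure. A minimal sketch, assuming it is acceptable to process the text line by line (sticky anchoring applies to the whole string, not to each line); `expandWithSticky` is a hypothetical name.

```javascript
// ( *)\t: optional spaces then a tab; y makes each match start exactly where
// the previous one ended (a stand-in for \G), g makes replace() keep going.
const leadingTab = /( *)\t/gy;

// Hypothetical helper. Sticky anchoring is relative to the whole string, so
// the text is split into lines and each line is processed on its own.
function expandWithSticky(text) {
  return text
    .split('\n')
    // replace() resets lastIndex to 0 for a global regex, so every line
    // starts matching at its first character and stops at the first
    // position where "spaces then a tab" no longer matches.
    .map(line => line.replace(leadingTab, '$1  '))
    .join('\n');
}

console.log(JSON.stringify(expandWithSticky('\n\t\ta\n  \t  b\n \t  \t c\td  \te\n')));
// prints "\n    a\n      b\n        c\td  \te\n"
```

This still needs the split/map step, so it does not meet the single-regex goal; it only shows that the `\G`-style chaining itself is expressible in JavaScript.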

Cary Swoveland · Answer 2 · 2022-02-14T20:34:23.217

Here's a Ruby solution, should a reader wish to post a JavaScript solution based on it.

rgx = /[a-z].*|\t/
str.gsub(rgx) { |s| s[0] == "\t" ? '  ' : s }

where str holds the string that is to be modified.

The regular expression is a two-part alternation:

[a-z]  # match a lower-case letter
.*     # match zero or more characters (to end of line)
|      # or
\t     # match a tab

Each match is passed to the "block" ({ |s| ...}) and is held by the block variable s. If the first character of the match is a tab, two spaces are returned; otherwise, s is returned. Once [a-z].* is matched there will be no further matches on that line, because the remainder of the line (possibly including tabs) will have been consumed; since . does not match a newline, matching resumes on the next line.
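For readers who want the JavaScript version the answer invites, here is a hedged port (`expandLikeRubyAnswer` is a hypothetical name). It keeps the answer's `[a-z]`, so it assumes the first non-blank character of each line is a lowercase letter; a tab that follows some other character (a digit, say) would still be replaced. In JavaScript, as in Ruby by default, `.` does not match a newline, so each line is handled separately.

```javascript
// Same alternation as the Ruby answer: a lowercase letter plus the rest of
// its line, or a single tab. `.` does not match "\n", so [a-z].* stops at
// the end of the line and matching resumes on the next one.
const rgx = /[a-z].*|\t/g;

// Hypothetical helper. Tabs matched before a line's first lowercase letter
// become two spaces; the letter-led match (with any later tabs) is kept.
const expandLikeRubyAnswer = str =>
  str.replace(rgx, s => (s[0] === '\t' ? '  ' : s));

console.log(JSON.stringify(expandLikeRubyAnswer('\n\t\ta\n  \t  b\n \t  \t c\td  \te\n')));
// prints "\n    a\n      b\n        c\td  \te\n"
```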

In Python a lambda would be used in place of Ruby's block, something like

lambda m: '  ' if m.group()[0] == "\t" else m.group()

Pythonites: If the lambda I presented is incorrect I would be grateful for a correction in a comment. — Cary Swoveland, Feb 14 '22 at 20:32
