
I am trying to match each \t in the leading whitespace of a line so I can replace them with two spaces. This is trivial with an unbounded (i.e., variable-length) lookbehind.

text.replace(/(?<=^\s*)\t/gm, '  ')

Unfortunately, this code is running on iOS, and for some reason, Safari and iOS have yet to implement lookbehinds, let alone unbounded lookbehinds.

I know there are workarounds for lookbehinds, but I can't seem to get the ones I've looked at to work.

I would rather not capture any characters aside from each tab, but if there's no other way, I could capture characters around the tabs in capture groups and add $1, etc, to my replacement string.

Example test code

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

// throws error in iOS, which does not support lookbehinds
// const regex = /(?<=^\s*)\t/gm;
const regex = /to-do/gm;

const result = text.replace(regex, '  ')

console.log(`Text: ${text}`)
console.log(`Expected: ${expected}`)
console.log(`Result: ${result}`)
console.log(JSON.stringify([ expected, result ], null, 2))

if (result === expected) {
  console.info('Success! ')
} else {
  console.error('Failed ')
}

Update

A less than ideal workaround would be to use two regexes and a replacer function.

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`

const expected = `
    a
      b
        c\td  \te
`

const result = text.replace(/^\s*/gm, m => m.replace(/\t/g, '  '))

if (result === expected) {
  console.info('Success! ')
} else {
  console.error('Failed ')
}

Again, less than ideal. I'm a purist.
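For comparison, the capture-group fallback mentioned earlier can be sketched as a loop. This is only a sketch under a few assumptions: the helper name `expandLeadingTabs` is made up for illustration, leading whitespace is taken to mean spaces and tabs only, and repeated passes are acceptable. Each pass replaces the first remaining tab in every line's leading whitespace, so it still isn't a single-pass regex.

```javascript
// Hypothetical helper, not part of the original post.
// Repeats a capture-group replacement until no leading tab remains.
function expandLeadingTabs(input) {
  // Lazily capture the spaces before the first tab in a line's leading whitespace.
  const firstLeadingTab = /^([ \t]*?)\t/gm;
  let previous;
  let current = input;
  do {
    previous = current;
    // Put back the captured spaces, then two spaces in place of the tab.
    current = previous.replace(firstLeadingTab, '$1  ');
  } while (current !== previous);
  return current;
}

const sampleText = '\n\t\ta\n  \t  b\n \t  \t c\td  \te\n';
console.log(JSON.stringify(expandLeadingTabs(sampleText)));
```

No lookbehind is involved, so it should run on engines that lack it, at the cost of one extra pass per leading tab.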

dx_over_dt
  • Since there is no lookaround on your platform, it can't be done with a single regex. Otherwise you wouldn't even need a variable-length lookbehind. This is simple enough to work on all platforms: [`/(?:(?<![^\t])|^)([^\S\r\n\t]*)\t/gm`](https://regex101.com/r/lFMD0w/1), replace with `$1 `. Except if you're working on a real berserk one. – sln Feb 14 '22 at 21:25
  • @sln There's lookahead, just no lookbehind. – dx_over_dt Feb 14 '22 at 21:26
  • But since the engine is free to advance to any position, there is no solution, right? No single-regex solution exists then. – sln Feb 14 '22 at 21:27

2 Answers


You may use this JavaScript solution, which does not involve a lookbehind:

const text = `
\t\ta
  \t  b
 \t  \t c\td  \te
`;

var repl = text.replace(/^[ \t]+/mg, g => g.replace(/\t/g, '  '));

console.log(repl);
anubhava
  • So, exactly what I did in my update? ;) This question is more for my edification than needing an answer that works, so I'd like to do it in a single regex. – dx_over_dt Feb 14 '22 at 18:56
  • I was just reading the updated section. Sorry, I started writing my answer from the original question. It is unlikely that a solution can be found without a lookbehind and without a replacer lambda/function. – anubhava Feb 14 '22 at 18:58
  • Unfortunately JavaScript doesn't support `\G`; otherwise, in PCRE/Java it would have been `.replaceAll("(^ *|(?!^)\\G)\t", "$1. ");` – anubhava Feb 14 '22 at 19:04
  • No worries about starting your answer before reading my edit. I assumed that's what happened. :) – dx_over_dt Feb 14 '22 at 19:41

Here's a Ruby solution, should a reader wish to post a JavaScript solution based on it.

rgx = /[a-z].*|\t/
str.gsub(rgx) { |s| s[0] == "\t" ? '  ' : s }

where str holds the string that is to be modified.

The regular expression is a two-part alternation:

[a-z]  # match a lower-case letter
.*     # match zero or more characters (to end of line)
|      # or
\t     # match a tab

Each match is passed to the "block" ({ |s| ...}) and is held by the block variable s. If the first character of the match is a tab, two spaces are returned; otherwise s is returned. Once [a-z].* has matched, there will be no further matches on that line, because the remainder of the line (possibly including tabs) will have been consumed.

In Python a lambda would be used in place of Ruby's block, something like

lambda m: '  ' if m.group()[0] == "\t" else m.group()
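
A JavaScript port along the lines invited above might look like the sketch below. The function name `replaceLeadingTabs` is hypothetical, and the sketch keeps the same assumption as the Ruby version: the first non-whitespace character on each line is a lower-case letter. Swapping `[a-z]` for `\S` would be one way to relax that, though that variant is not from the original answer.

```javascript
// Sketch of a JavaScript port of the Ruby gsub call; the function name is hypothetical.
function replaceLeadingTabs(str) {
  // Either consume from the first lower-case letter to the end of the line,
  // or match a single tab. Tabs after the first letter are swallowed by the
  // first alternative and returned unchanged.
  const rgx = /[a-z].*|\t/g;
  return str.replace(rgx, (s) => (s[0] === '\t' ? '  ' : s));
}

const sample = '\t\ta\n  \t  b\n \t  \t c\td  \te\n';
console.log(JSON.stringify(replaceLeadingTabs(sample)));
// prints "    a\n      b\n        c\td  \te\n"
```

As in the Ruby version, a replacer function is still required, so this does not meet the single-regex, no-callback goal stated in the question.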
Cary Swoveland