1

I have a long string that I have to manipulate in a specific way. The string can contain other substrings, which cause problems for my code. For that reason, before doing anything to the string, I replace every substring (anything that starts with " and ends with a non-escaped ") with a placeholder in the format $0, $1, $2, ..., $n. I know for sure that the main string itself doesn't contain the character $, but one or more of the substrings could be, for example, "$0".
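For reference, the extraction step described above might look roughly like the sketch below. The helper name `extractSubstrings` and the exact regex are assumptions made for illustration, not the code actually in use; the regex treats a backslash followed by any character as an escape.

```javascript
// Hypothetical helper: replaces every double-quoted substring in `input`
// with a placeholder ($0, $1, ...) and records the original text.
function extractSubstrings(input) {
  const substrings = [];
  // A quote, then any run of non-quote/non-backslash characters or
  // backslash-escaped characters, then the closing quote.
  const quoted = /"(?:[^"\\]|\\[\s\S])*"/g;
  const mainString = input.replace(quoted, (match) => {
    const placeholderName = "$" + substrings.length;
    substrings.push({ placeholderName, value: match });
    // A value returned from a replacer function is inserted literally.
    return placeholderName;
  });
  return { mainString, substrings };
}

const extracted = extractSubstrings('say "hi \\"you\\"" and "$0"');
console.log(extracted.mainString); // say $0 and $1
console.log(extracted.substrings[1].value); // "$0"
```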

Now the problem: after manipulating/formatting the main string, I need to replace all the placeholders with their actual values again.

Conveniently I have them saved in this format:

// TypeScript
let substrings: { placeholderName: string; value: string }[];

But doing:

// JavaScript
let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1";

let substrings = [
  { placeholderName: "$0", value: "test1 $1" },
  { placeholderName: "$1", value: "test2" }
];

for (const substr of substrings) {
  mainString1 = mainString1.replace(substr.placeholderName, substr.value);
  mainString2 = mainString2.replaceAll(substr.placeholderName, substr.value);
}

console.log(mainString1); // actual result: "main string test1 test2 $1"
console.log(mainString2); // actual result: "main string test1 test2 test2"

// wanted result: "main string test1 $1 test2"

is not an option, since the substrings could include $x, which would then be replaced by mistake (by both .replace() and .replaceAll()).

Getting the substrings is achieved with a regex; maybe a regex could help here too? However, I have no control over what is stored inside the substrings...

3 Answers

3

If you're sure that all placeholders will follow the $x format, I'd go with the .replace() method with a callback:

const result = mainString1.replace(
  /\$\d+/g,
  placeholder => substrings.find(
    substring => substring.placeholderName === placeholder
  )?.value ?? placeholder
);

// result is "main string test1 $1 test2"
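A possible variant of the same single-pass idea, sketched under the assumption that there may be many placeholders: building a `Map` once avoids scanning the array for every match. The function name `restorePlaceholders` is hypothetical. Because the replacement comes from a callback, sequences such as `$1` or `$&` inside a value are inserted literally and never rescanned.

```javascript
// Hypothetical helper: restores all placeholders in one pass.
function restorePlaceholders(mainString, substrings) {
  // Build the lookup table once: placeholder name -> original value.
  const lookup = new Map(substrings.map((s) => [s.placeholderName, s.value]));
  // Each match is looked up and replaced; unknown placeholders are left as-is.
  return mainString.replace(/\$\d+/g, (placeholder) =>
    lookup.has(placeholder) ? lookup.get(placeholder) : placeholder
  );
}

const restored = restorePlaceholders("main string $0 $1", [
  { placeholderName: "$0", value: "test1 $1" },
  { placeholderName: "$1", value: "test2" },
]);
console.log(restored); // main string test1 $1 test2
```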
Robo Robok
0

This may not be the most efficient code, but here is the function I made, with comments.

Note: be careful, because if you put a placeholder inside its own value, it will create an infinite loop. Example:

{ placeholderName: "$1", value: "test2 $1" }

let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1";

let substrings = [{
    placeholderName: "$0",
    value: "test1 $1"
  },
  {
    placeholderName: "$1",
    value: "test2"
  },
];

function replacePlaceHolders(mainString, substrings) {
  let replacedString = mainString

  // Find every placeholder; match() returns an array with all of them, e.g. ['$0', '$1'], or null if there are none
  let placeholders = replacedString.match(/\$[0-9]+/g)

  // While there is still a placeholder to replace
  while (placeholders !== null && placeholders.length > 0) {
    // Iterate over each placeholder
    placeholders.forEach(placeholder => {
      // Extract the value to insert
      let value = substrings.filter(x => x.placeholderName === placeholder)[0].value
      // Replace it
      replacedString = replacedString.replace(placeholder, value)
    })
    // Finally, check whether the replacement inserted any new placeholders. If so, the loop runs again.
    placeholders = replacedString.match(/\$[0-9]+/g)
  }
  }

  return replacedString
}

console.log(replacePlaceHolders(mainString1, substrings))
console.log(replacePlaceHolders(mainString2, substrings))

EDIT:

OK... I think I understand your problem now. You didn't want the placeholder-like strings inside your values to be replaced.

This version of the code should work as expected, and you won't have to worry about infinite loops here. However, be careful with your placeholders: "$" is a reserved character in regex, and there are more characters that you would need to escape. I assume all your placeholders will be like "$1", "$2", etc. If they are not, you should edit the regexPlaceholder function, which builds the regex and escapes that character.
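If the placeholders could contain other regex metacharacters, a general escaping helper is one option. The sketch below uses a commonly seen escaping pattern; the names `escapeRegExp` and `placeholderRegex` are illustrative and are not part of the code in this answer.

```javascript
// Hypothetical helper: escapes every character that has a special
// meaning in a regular expression so the text is matched literally.
function escapeRegExp(text) {
  // "$&" in the replacement stands for the matched character itself.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds a global regex that matches the placeholder text literally.
function placeholderRegex(placeholderName) {
  return new RegExp(escapeRegExp(placeholderName), "g");
}

console.log("a $1 b $1".replace(placeholderRegex("$1"), "x")); // a x b x
console.log("a {0} b".replace(placeholderRegex("{0}"), "x")); // a x b
```

Note that a literal match for "$1" also matches the first two characters of "$10", so placeholders that share a prefix still need extra care.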

let mainString1 = "main string $0 $1";
let mainString2 = "main string $0 $1 $2";

let substrings = [
    { placeholderName: "$0", value: "$1 test1 $2 $1" },
    { placeholderName: "$1", value: "test2 $2" },
    { placeholderName: "$2", value: "test3" },

];

function replacePlaceHolders(mainString, substrings) {

    // You need to escape the $ character, and possibly others, depending on how your placeholders are built
    function regexPlaceholder(p) {
        return new RegExp('\\' + p, "gm")
    }

    let replacedString = mainString
    // Find every placeholder; match() returns an array with all of them, e.g. ['$0', '$1']
    let placeholders = replacedString.match(/\$[0-9]+/g)
    // If there is any placeholder to replace
    if (placeholders !== null && placeholders.length > 0) {

        // Track anything inside the values that could be mistaken for a placeholder.
        // Each one is swapped for a temporary mark, and the swap is recorded
        // so it can be reverted at the end.
        let replacedplaceholdersInValues = []
        let indexofReplacedValue = 0

        placeholders.forEach(placeholder => {
            // Extract the value to insert
            let value = substrings.filter(x => x.placeholderName === placeholder)[0].value

            // Check whether the value contains anything that looks like a placeholder
            let placeholdersInValues = value.match(/\$[0-9]+/g)
            if (placeholdersInValues !== null && placeholdersInValues.length > 0) {
                placeholdersInValues.forEach(placeholdersInValue => {
                    // If so, replace it with a temporary mark so the main replacement won't touch it
                    value = value.replace(regexPlaceholder(placeholdersInValue), "<markToReplace" + indexofReplacedValue + ">")
                    // and record the change so it can be rolled back later
                    replacedplaceholdersInValues.push({
                        placeholderName: placeholdersInValue,
                        value: "<markToReplace" + indexofReplacedValue + ">"
                    })

                })
                indexofReplacedValue++
            }
            // Replace the actual placeholder
            replacedString = replacedString.replace(regexPlaceholder(placeholder), value)
        })

        // If any values contained placeholder-like text, change it back to the original
        if (replacedplaceholdersInValues.length > 0) {
            replacedplaceholdersInValues.forEach(replaced => {
                replacedString = replacedString.replace(replaced.value, replaced.placeholderName)
            })
        }
    }

    return replacedString
}

console.log(replacePlaceHolders(mainString1, substrings))
console.log(replacePlaceHolders(mainString2, substrings))
Rubén Vega
  • The code should return "main string test1 $1 test2" and not "main string test1 test2 test2" – FlorianStrobl Jul 17 '21 at 23:30
  • But didn't you want the $1 label to be replaced again by its value? Maybe I misunderstood your problem. – Rubén Vega Jul 17 '21 at 23:35
  • I think I've solved what you wanted with a second version. The placeholders inside the "values" won't be replaced now – Rubén Vega Jul 18 '21 at 01:11
  • Yes, it does work for the current string, but I have literally no control over the substrings, and I know people who would try to break it, so having for example `""` inside a substring would still break it... But there is probably no good solution to it... – FlorianStrobl Jul 18 '21 at 11:07
-1

The key is to choose a placeholder that is impossible in both the main string and the substrings. My trick is to use non-printable characters as the placeholder. My favorite is the NUL character (0x00), because most other people would not use it: C/C++ consider it to be the end of a string. JavaScript, however, is robust enough to handle strings that contain NUL (encoded as Unicode \u0000):

let mainString1 = "main string \0-0 \0-1";
let mainString2 = "main string \0-0 \0-1";

let substrings = [
  { placeholderName: "\0-0", value: "test1 $1" },
  { placeholderName: "\0-1", value: "test2" }
];

The rest of your code does not need to change.

Note that I'm using the - character to prevent javascript from interpreting your numbers 0 and 1 as part of the octal \0.

If you have an aversion to \0 like most programmers then you can use any other non-printing characters like \1 (start of heading), 007 (the character that makes your terminal make a bell sound - also, James Bond) etc.
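In strict mode (and in TypeScript), the same NUL character can be written with the Unicode escape `\u0000`, which is not an octal escape. The sketch below is only an illustration of that spelling; it assumes that no value contains a NUL character followed by `-` and digits. Passing the value through a callback keeps sequences such as `$&` in a value from being interpreted as replacement patterns.

```javascript
// "\u0000" is the same character as "\0", written as a Unicode escape.
const NUL = "\u0000";

let restoredText = `main string ${NUL}-0 ${NUL}-1`;

const nulSubstrings = [
  { placeholderName: `${NUL}-0`, value: "test1 $1" },
  { placeholderName: `${NUL}-1`, value: "test2" },
];

for (const { placeholderName, value } of nulSubstrings) {
  // The callback form inserts `value` literally.
  restoredText = restoredText.replaceAll(placeholderName, () => value);
}

console.log(restoredText); // main string test1 $1 test2
```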

slebetman
  • The key is to implement a correct algorithm, not using the "impossible placeholders". Naive approach. – Robo Robok Jul 18 '21 at 01:43
  • @RoboRobok For text this would be a reasonably correct algorithm. Unless you plan on processing JPEG – slebetman Jul 18 '21 at 01:48
  • I believe in the "never trust your input" approach more than in anything. Not sure where the OP's data comes from, but in theory I can see the `\0` bytes being present in the input. Seriously, I always cringe when anyone assumes that something won't appear in the free input. – Robo Robok Jul 18 '21 at 01:53
  • I can't use `\1-0` because I use TypeScript (in strict mode ofc) and it does error with the message `octal escape sequences can't be used in untagged template literals or in strict mode code`. And `\0-0` would work correctly either (though I wouldn't use it anyway cause NULL) – FlorianStrobl Jul 18 '21 at 11:05
  • And yes a null byte could come as input. Anything which is Unicode infact. – FlorianStrobl Jul 18 '21 at 11:08
  • @slebetman can you take a look at my answer? Is it me, or is it as simple as that? – Robo Robok Jul 18 '21 at 11:47
  • @RoboRobok Yup. Doing it in a single pass should work. The OPs problem is that he was doing multiple passes. – slebetman Jul 18 '21 at 15:48
  • @RoboRobok I have used the `\0` substitution trick in real production software with no bugs in 20+ years. This includes a text editor that I use every day: https://github.com/slebetman/tcled. – slebetman Jul 18 '21 at 15:51
  • @FlorianStrobl This trick works with unicode because that is not a null byte. That is a NUL character which is a Unicode character and is the ONLY unicode character that has the code point `\0` which js interprets as `\u0000`. This is Unicode safe but not NUL character safe – slebetman Jul 18 '21 at 15:52
  • @slebetman sure thing, your solution is practically clever and in real life the chance of it causing problems is close to none. However, it can in theory get hacked if the input is raw and comes from the public. I probably am sometimes too strict when it comes to these things, I actually find your idea with `\0` an interesting invention. – Robo Robok Jul 18 '21 at 15:59