
I am processing files in order to replace a list of pre-defined keywords with a pre- and a post-string (say "#" and ".") like this:

"Word Word2 anotherWord and some other stuff" should become "#Word. #Word2. #anotherWord. and some other stuff"

My keys are unique and I process them from the longest key to the shortest, so I know a key can only be included in a key that has already been processed. However, if I have key inclusion (e.g. Word2 contains Word), and if I do

"Word Word2 anotherWord and some other stuff"
    .Replace("anotherWord", "#anotherWord.")
    .Replace("Word2", "#Word2.")
    .Replace("Word", "#Word.")

I get the following result:

"#Word. ##Word.2. #another#Word.. and some other stuff"

For sure, my approach isn't working. So how can I make sure I only replace a key in the string if it is NOT contained in another key? I tried regular expressions but didn't find the correct way. Or is there another solution?
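One way to sidestep the problem is to match all keys in a single pass with one regex alternation, ordered longest first (the same ordering the question already uses). The .NET regex engine tries alternatives left to right and takes the first one that matches at a given position, so "Word2" wins over "Word", and text that has already been replaced is never scanned again. This is a minimal sketch, not taken from any answer below: the `KeywordMarker.Mark` name is hypothetical, and it assumes substring matching is wanted (wrap the pattern as `\b(?:...)\b` to match whole words only). Because each key goes through `Regex.Escape`, multi-word keys and keys with regex metacharacters are matched literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordMarker
{
    // Wraps every occurrence of any key with pre/post in ONE pass over the text.
    // Alternatives are ordered longest first, so at a given position the longest
    // key that matches there is the one that gets replaced.
    public static string Mark(string text, IEnumerable<string> keys, string pre, string post)
    {
        var alternatives = keys
            .Where(k => k.Length > 0)          // an empty alternative would match everywhere
            .Distinct()
            .OrderByDescending(k => k.Length)
            .Select(Regex.Escape);             // match each key literally
        string pattern = string.Join("|", alternatives);
        if (pattern.Length == 0)
            return text;                       // no usable keys

        return Regex.Replace(text, pattern, m => pre + m.Value + post);
    }

    public static void Main()
    {
        var keys = new[] { "Word", "Word2", "anotherWord" };
        Console.WriteLine(Mark("Word Word2 anotherWord and some other stuff", keys, "#", "."));
        // #Word. #Word2. #anotherWord. and some other stuff
    }
}
```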

Dmitry Bychenko
neggenbe
  • Please pay close attention when selecting tags, so you don't select the wrong ones. – Some programmer dude May 30 '18 at 13:21
  • please tag your question with the correct language – Kami Kaze May 30 '18 at 13:21
  • That's not the way string replacement works. You may want to replace the search strings with another temporary string, then replace everything again after all searches are done. – Thomas Jager May 30 '18 at 13:30
  • Let's say you had `.Replace("bob", "#bob").Replace("cat", "#cat")` what would you expect to be the result if the input was `bobcat cat bob cabobt bocatb`? – mjwills May 30 '18 at 13:31
  • Maybe this post will help you [way-to-have-string-replace-only-hit-whole-words](https://stackoverflow.com/questions/6143642/way-to-have-string-replace-only-hit-whole-words) – Spongebrot May 30 '18 at 13:35
  • If you don't want to use keys that are contained in other keys, don't *have* keys that are contained in other keys. Filter the keys first to remove keys that are contained in other keys. – Scott Hannen May 30 '18 at 13:45
  • @ScottHannen: That's not a realistic solution. Doing half the work and claiming the rest shouldn't be done is not the same as doing the needed work. For the sample code, the "Word2" replace could effectively be removed because the "Word" replace will also hit "Word2", but this no longer works if different replacement values are used (e.g. replacing "Word" with "#Word" but replacing "Word2" with "@Word2"). – Flater May 30 '18 at 14:06
  • @Flater It's based on a sentence in a comment, which I took literally. If it was meant to be a solution I would have posted it as an answer. – Scott Hannen May 30 '18 at 17:24
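mjwills' question comes down to whether a key should match anywhere (as a substring) or only as a whole word. A small sketch comparing both on that input; `BobcatDemo` and its method names are made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class BobcatDemo
{
    // Plain substring replacement, exactly as in the comment's chained Replace calls.
    public static string Substrings(string input) =>
        input.Replace("bob", "#bob").Replace("cat", "#cat");

    // Replace a key only where it stands as a whole word (\b on both sides).
    public static string WholeWords(string input) =>
        Regex.Replace(input, @"\b(?:bob|cat)\b", m => "#" + m.Value);

    public static void Main()
    {
        const string input = "bobcat cat bob cabobt bocatb";
        Console.WriteLine(Substrings(input)); // #bob#cat #cat #bob ca#bobt bo#catb
        Console.WriteLine(WholeWords(input)); // bobcat #cat #bob cabobt bocatb
    }
}
```

Neither result is "right" in general; which one is wanted decides whether a word-boundary regex or a substring-based approach fits.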

6 Answers


Just use regular expressions with word boundaries if performance is not a key requirement:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace Subst
{
    public class Program
    {
        public static void Main(string[] args)
        {
            var map = new Dictionary<string, string>{
                {"Word", "#Word."},
                {"anotherWord", "#anotherWord."},
                {"Word2", "#Word2."}
            };
            var input = "Word Word2 anotherWord and some other stuff";

            foreach(var mapping in map) {
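                // Escape the key so regex metacharacters in it are matched literally, and
                // double any '$' in the value, since '$' starts a substitution in the replacement.
                input = Regex.Replace(input, String.Format(@"\b{0}\b", Regex.Escape(mapping.Key)), mapping.Value.Replace("$", "$$"));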
            }

            Console.WriteLine(input);
        }
    }
}
pocheptsov
  • Close to my solution, which doesn't use regular expressions as, I must be honest, I simply don't know enough about them...! – neggenbe May 31 '18 at 14:28

One way is to use

string myString = String.Format("ORIGINAL TEXT {0} {1}", "TEXT TO PUT INSIDE CURLY BRACKET 1", "TEXT TO PUT IN CURLY BRACKET 2");

//Result: "ORIGINAL TEXT TEXT TO PUT INSIDE CURLY BRACKET 1 TEXT TO PUT IN CURLY BRACKET 2"

However, this requires your original text to have the curly brackets inside in the first place.

Quite messy, but you could always replace the words you are looking for using Replace and then change the curly brackets at the same time. There is probably a far better way of doing this, but I can't think of it right now.
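The "replace the words, then fill the curly brackets" idea can be sketched as follows. This is an illustration under stated assumptions, not the answer's own code: `TemplateDemo.Mark` is a hypothetical name, empty keys are skipped, and keys are assumed to contain no `{` or `}` and not to consist only of digits (otherwise a key could match inside an inserted placeholder such as `{0}`). Braces already present in the text are doubled so `String.Format` treats them literally.

```csharp
using System;
using System.Linq;

public static class TemplateDemo
{
    public static string Mark(string text, string[] keys, string pre, string post)
    {
        // Longest keys first, so "Word2" is turned into a placeholder before "Word".
        // Empty keys are skipped because string.Replace rejects an empty search string.
        string[] ordered = keys.Where(k => k.Length > 0)
                               .OrderByDescending(k => k.Length)
                               .ToArray();

        // Escape braces already in the text so String.Format leaves them alone.
        string template = text.Replace("{", "{{").Replace("}", "}}");
        for (int i = 0; i < ordered.Length; i++)
            template = template.Replace(ordered[i], "{" + i + "}");

        // Placeholder {i} is filled with the decorated version of key i.
        object[] values = ordered.Select(k => (object)(pre + k + post)).ToArray();
        return string.Format(template, values);
    }

    public static void Main()
    {
        var keys = new[] { "Word", "Word2", "anotherWord" };
        Console.WriteLine(Mark("Word Word2 anotherWord {and} some other stuff", keys, "#", "."));
        // #Word. #Word2. #anotherWord. {and} some other stuff
    }
}
```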

Christopher Vickers
  • (1) While technically possible, preformatting input values with `String.Format` notation is effectively tying the input to a particular implementation, which is not good design. (2) It also implies that whoever types the input sentences knows all the placeholders and the order of the placeholder list, which is expecting a lot of knowledge from the end user, and often defeats the purpose of an automated task such as this. (3) Also, your code doesn't even work, since String.Format is **zero indexed.** – Flater May 30 '18 at 13:37
  • It depends on what the original user needs this to do. He might be trying to populate an existing template, in which case this would be perfect. As for saying my code doesn't work: updated for YOU :p – Christopher Vickers May 30 '18 at 13:43

I suggest a direct implementation, e.g.

using System;
using System.Linq;
using System.Text;

private static String MyReplace(string value, params Tuple<string, string>[] substitutes) {
  if (string.IsNullOrEmpty(value))
    return value;
  else if (null == substitutes || !substitutes.Any())
    return value;

  int start = 0;
  StringBuilder sb = new StringBuilder();

  while (true) {
    int at = -1;
    Tuple<string, string> best = null;

    foreach (var pair in substitutes) {
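      // ordinal comparison: match the key's exact characters, independent of the current culture
      int index = value.IndexOf(pair.Item1, start, StringComparison.Ordinal);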

      if (index >= 0)  
        if (best == null || 
            index < at || 
            index == at && best.Item1.Length < pair.Item1.Length) { 
          at = index;
          best = pair;
        }
    }

    if (best == null) {
      sb.Append(value.Substring(start));

      break;
    }

    sb.Append(value.Substring(start, at - start));
    sb.Append(best.Item2);
    start = best.Item1.Length + at;
  }

  return sb.ToString();
}

Test

  string source = "Word Word2 anotherWord and some other stuff";

  var result = MyReplace(source, 
    new Tuple<string, string>("anotherWord", "#anotherWord."),
    new Tuple<string, string>("Word2", "#Word2."),
    new Tuple<string, string>("Word", "#Word."));

  Console.WriteLine(result);

Outcome:

 #Word. #Word2. #anotherWord. and some other stuff
Dmitry Bychenko

Regex alternative (order doesn't matter):

var result = Regex.Replace("Word Word2 anotherWord and some other stuff", @"\b\S+\b", m => 
    m.Value == "anotherWord" ? "#anotherWord." : 
    m.Value == "Word2" ? "#Word2." :
    m.Value == "Word" ? "#Word." : m.Value);

Or separate:

string s = "Word Word2 anotherWord and some other stuff";

s = Regex.Replace(s, @"\b" + Regex.Escape("anotherWord") + @"\b", "#anotherWord.");
s = Regex.Replace(s, @"\b" + Regex.Escape("Word2")       + @"\b", "#Word2.");
s = Regex.Replace(s, @"\b" + Regex.Escape("Word")        + @"\b", "#Word.");
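The chain of ternaries in the first snippet grows with every key. A sketch of the same idea with the replacements held in a dictionary (the `LookupDemo` name is hypothetical). Like the original, it only examines single tokens (runs of non-whitespace, trimmed to word boundaries), so it cannot handle the multi-word keys the asker mentions in a comment further down.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookupDemo
{
    // Same \b\S+\b token pattern as above; each token is looked up in the map
    // and left unchanged when it is not a key.
    public static string Mark(string input, IReadOnlyDictionary<string, string> map) =>
        Regex.Replace(input, @"\b\S+\b",
            m => map.TryGetValue(m.Value, out var replacement) ? replacement : m.Value);

    public static void Main()
    {
        var map = new Dictionary<string, string>
        {
            ["anotherWord"] = "#anotherWord.",
            ["Word2"] = "#Word2.",
            ["Word"] = "#Word.",
        };
        Console.WriteLine(Mark("Word Word2 anotherWord and some other stuff", map));
        // #Word. #Word2. #anotherWord. and some other stuff
    }
}
```

Adding a key is then a single dictionary entry, and since every token is replaced at most once, the order of the entries doesn't matter.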
Slai

Solved the problem using a two-pass approach, as follows:

using System.Collections.Generic;
using System.Linq;

// assumes txt (the text being processed), preStr and postStr are declared elsewhere
List<string> keys = new List<string>();
keys.Add("Word1"); // ... and so on
// IMPORTANT: algorithm works only when we are sure that one key cannot be
//            included in another key with higher index. Also, uniqueness is
//            guaranteed by construction, although the routine would also work
//            with duplicate keys...!
keys = keys.OrderByDescending(x => x.Length).ThenBy(x => x).ToList<string>();
// first loop: replace each key with a UNIQUE placeholder in the text
// (assign the result: if txt is a string, Replace returns a new string and leaves txt unchanged)
foreach(string key in keys) {
  txt = txt.Replace(key, string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)));
}
// second loop: replace each UNIQUE placeholder with the corresponding value...
foreach(string key in keys) {
  txt = txt.Replace(string.Format("!#someUniqueKeyNotInKeysAndNotInTXT_{0}_#!", keys.IndexOf(key)), string.Format("{0}{1}{2}", preStr, key, postStr));
}
neggenbe

You can split your string by ' ' and cycle through the resulting array. Compare each element of the array with your keys, replace it when it matches, and then join the elements back together when finished.

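string newString = "Word Word2 anotherWord and some other stuff";
string[] split = newString.Split(' ');

// A foreach iteration variable can't be assigned to, so index into the array instead.
for (int i = 0; i < split.Length; i++) {
    if (split[i] == "Word") {
        split[i] = "#Word.";
    } else if (split[i] == "Word2") {
        split[i] = "#Word2.";
    } else if (split[i] == "anotherWord") {
        split[i] = "#anotherWord.";
    }
}
// Join with the separator used for Split (string.Concat would drop the spaces).
string finalString = string.Join(" ", split);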
  • Note that spaces are likely not the only delimiters. Commas, periods, semicolons, quotation marks, ... Your answer works from a technical perspective but might not cover every needed case. – Flater May 30 '18 at 13:44
  • Nope, as a key could actually also be multiple words (agreed, I forgot to mention it!) – neggenbe May 31 '18 at 14:24