
I have a large string value and I am trying to find the best way to replace certain text in it without changing any values inside a URL.

For instance, let's say I want to replace the word "google" with "hello". I have a large string value with multiple instances of "google" and a URL within the string, "https://www.google.com" (this is just an example). Which is the best route for replacing these values: a split on the string, a regex, or a replace?

At the moment I have something like this:

var data = "<h1>google this is a sample text</h1><p> more text will go here so, google. <a href='https://google.com'> Link here </a>";
// Start from the original and chain the calls, so the second
// replacement does not discard the result of the first.
var test = data;
test = test.Replace("google", "hello");
// for case sensitivity
test = test.Replace("Google", "hello");

Is there a better alternative to this, and is there a way to avoid replacing the text in a URL?

jsg
  • regex would probably be able to do the trick here. – Seabizkit Apr 13 '21 at 10:06
  • What does your input string look like? Are there spaces, or any other separator in between "words" and URLs? – MindSwipe Apr 13 '21 at 10:06
  • Could you give an example of the different kinds of input strings? – fubo Apr 13 '21 at 10:07
  • I'll add more detail to the input string now, but effectively it will look like HTML – jsg Apr 13 '21 at 10:08
  • @fubo updated here – jsg Apr 13 '21 at 10:13
  • Are you always going to be parsing HTML? Because if so, I recommend using something like [HtmlAgilityPack](https://html-agility-pack.net/) and using [this](https://stackoverflow.com/q/4182594/9363973) Q&A to get the text of the body, then use [this](https://stackoverflow.com/q/6275980/9363973) Q&A to replace the string you want, ignoring case – MindSwipe Apr 13 '21 at 10:18
  • For a totally generic string with URLs, you probably have to split it into "generic substrings" and "URLs", then perform the replacement only in the substrings that are not URLs, and finally reassemble. So the difficulty is doing the separation for a very generic string. Also, if your URL delimiters are well defined, you can write your own parser, which might not be so difficult. – Amo Robb Apr 13 '21 at 10:27
  • If the string is formatted using – Amo Robb Apr 13 '21 at 10:32
  • @AmoRobb would you be able to provide an example? – jsg Apr 13 '21 at 11:02
  • Any solution other than HtmlAgilityPack (or some form of HTML parsing) is going to end in pain. Lots and lots of pain. – mjwills Apr 13 '21 at 12:41

1 Answer

In your very particular case, I would first try some kind of basic splitting, provided that the 'a' tag is always, and only, used to insert the URLs:

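    // Requires: using System;
    private string ReplaceNonUrl_Split(string bigString, string[] substringsToReplace, string[] newStrings)
    {
        if (substringsToReplace.Length != newStrings.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        // Split on the anchor delimiters. Assuming well-formed, non-nested anchors,
        // the parts alternate: text, anchor, text, anchor, ...
        // Note: "<a" also matches tags such as <abbr> or <article>.
        string[] parts = bigString.Split(new string[] { "<a", "</a>" }, StringSplitOptions.None);

        for (int i = 0; i < parts.Length; i++)
        {
            if (parts[i].Contains("href="))
            {
                // subParts[0] holds the tag attributes (including the URL), so leave it
                // untouched and replace only in what follows the opening tag.
                string[] subParts = parts[i].Split(new string[] { ">" }, StringSplitOptions.None);
                for (int j = 1; j < subParts.Length; j++)
                {
                    for (int k = 0; k < newStrings.Length; k++)
                        subParts[j] = subParts[j].Replace(substringsToReplace[k], newStrings[k]);
                }

                parts[i] = string.Join(">", subParts);
            }
            else
            {
                // Plain text outside any anchor: replace everywhere.
                for (int k = 0; k < newStrings.Length; k++)
                    parts[i] = parts[i].Replace(substringsToReplace[k], newStrings[k]);
            }
        }

        // Reassemble, restoring the delimiters that Split removed.
        string replacedString = parts[0];
        bool startingUrl = true;
        for (int i = 1; i < parts.Length; i++)
        {
            replacedString += (startingUrl ? "<a" : "</a>") + parts[i];
            startingUrl = !startingUrl;
        }

        return replacedString;
    }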

Then call:

   string replacedString = ReplaceNonUrl_Split(data, new string[] { "google", "Google" }, new string[] { "hello", "Hello" });

DISCLAIMER: This is just a very manual option. There are surely existing libraries that do this more cleanly and efficiently, so I recommend first looking at existing HTML parsers that might fit your needs.
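A regex-based variant, along the lines suggested in the comments, could look like the sketch below. It uses only the standard library and is a rough middle ground rather than a replacement for a real HTML parser. The class name `TextReplacer` and the method name `ReplaceOutsideTags` are hypothetical, and the sketch assumes that URLs only appear inside tag attributes and that no attribute value contains a `>` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextReplacer
{
    // Replaces every occurrence of `word` (ignoring case) that lies outside
    // an HTML tag. Assumption: URLs live only inside tag attributes, and no
    // attribute value contains '>'.
    public static string ReplaceOutsideTags(string html, string word, string replacement)
    {
        // Alternation: either a whole tag (named group "tag") or the word.
        // At a '<' the tag alternative consumes everything up to the next '>',
        // so the word is never matched inside a tag.
        string pattern = "(?<tag><[^>]*>)|" + Regex.Escape(word);

        return Regex.Replace(
            html,
            pattern,
            // Tags are emitted unchanged; word matches are replaced.
            m => m.Groups["tag"].Success ? m.Value : replacement,
            RegexOptions.IgnoreCase);
    }
}
```

Calling `TextReplacer.ReplaceOutsideTags(data, "google", "hello")` on the sample string from the question replaces both occurrences in the text and leaves the `href` untouched. Unlike the split version, it uses a single replacement for every casing, so "Google" also becomes "hello". A URL written as plain text between tags would still be changed.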

Amo Robb