
I am very very very new to C# and ASP.NET development.

What I'd like to do is a find-and-replace for certain words appearing in the body text of a web page. Every time a certain word appears in the body text, I'd like to convert that word into a hyperlink that links to another page on our site.

I have no idea where to even start with this. I've found code for doing find-and-replace in C#, but I haven't found any help for just reading through a document, finding certain strings, and changing them into different strings.

EmilyM
  • So, you want to do this at runtime? – Muad'Dib Feb 28 '11 at 14:15
  • One question I have is where the body text is coming from. If it's from a database, you can run it through a simple extension method. If it's static text on the page, you'll need a different plan. – mark123 Feb 28 '11 at 14:37
  • It's static. It's hard-coded into the page, there is no database behind the site. – EmilyM Feb 28 '11 at 14:41

5 Answers


A couple of ways to accomplish this.

string text = "We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.";

string augmentedText = text.Replace("provide", "<a href='#provide'>provide</a>");

You could also use regular expressions to accomplish this.

Here's a sample that converts each word to upper case:

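using System;
using System.Text.RegularExpressions;

class Program
{
    // Called once per match; whatever it returns replaces that match.
    public static string MatchEval(Match m)
    {
        return m.ToString().ToUpper();
    }

    static void Main(string[] args)
    {
        string text = "This is some sample text.";

        Console.WriteLine(text);

        // \w+ matches each run of word characters, i.e. each word
        string result = Regex.Replace(text, @"\w+", new MatchEvaluator(MatchEval));

        Console.WriteLine(result);
    }
}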
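To tie this back to the question, the same MatchEvaluator idea can wrap a keyword in a link instead of upper-casing it. This is a minimal sketch: `KeywordLinker` and `LinkWord` are hypothetical names, and `#provide` is the placeholder anchor from the `Replace` example above. The `\b` anchors limit matches to whole words, and `RegexOptions.IgnoreCase` also catches "Provide" while keeping the original casing in the link text:

```csharp
using System.Text.RegularExpressions;

public static class KeywordLinker
{
    // Hypothetical helper: wraps every whole-word occurrence of `keyword`
    // in an <a> tag pointing at `href`.
    public static string LinkWord(string text, string keyword, string href)
    {
        // \b = word boundary, so "provide" does not match inside "provided".
        // Regex.Escape guards against keywords containing regex metacharacters.
        string pattern = @"\b" + Regex.Escape(keyword) + @"\b";

        return Regex.Replace(
            text,
            pattern,
            // The lambda acts as the MatchEvaluator; m.Value keeps the original casing.
            m => "<a href='" + href + "'>" + m.Value + "</a>",
            RegexOptions.IgnoreCase);
    }
}
```

Because the page is static HTML, a pattern like this runs over the markup as well as the body text, so it could also rewrite a keyword inside an attribute value or inside an existing link; check the output on a real page before relying on it.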

Hope this helps. Good luck!

Ian P
  • There is no real need to run a regular expression to simply replace a word or phrase with another. Seems to be a lot of overhead for no benefit over .Replace. The question isn't clear enough to decide though. +1 Preamble! :D – mark123 Feb 28 '11 at 14:34
  • Also, if there is a list of words to be replaced, it's a good idea to use LINQ, particularly the .Aggregate() method. – mark123 Feb 28 '11 at 14:40
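mark123's `Aggregate` suggestion could look roughly like this: fold over a word-to-URL map and call `String.Replace` once per entry. `LinqLinker`, `LinkAll` and the map contents are hypothetical, and, like the answer's `Replace` call, it matches substrings as well as whole words:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqLinker
{
    // Hypothetical helper: applies one String.Replace per dictionary entry.
    // Aggregate starts from `text` and passes the partially replaced string
    // from one entry to the next.
    public static string LinkAll(string text, IDictionary<string, string> wordToUrl)
    {
        return wordToUrl.Aggregate(
            text,
            (current, pair) => current.Replace(
                pair.Key,
                "<a href='" + pair.Value + "'>" + pair.Key + "</a>"));
    }
}
```

Each pass sees the output of the previous one, so a URL or link text containing a later keyword gets rewritten again: with the `foo`/`bar` URLs used in a later answer, `bar` would be replaced inside `http://www.foobar.com`.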

The best-performing way to find words or text in a document is to use regular expressions. If you are new to them, I strongly recommend learning them if you want your project to perform well.

You might also want to search the internet for Wiki APIs, which can help you build your solution without having to reinvent the wheel.

I'm pretty sure the following link will give you a head start on learning regular expressions. Download the expression tester and play with it a bit.

http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx

Steven Ryssaert
  • You're most welcome. I always think it's better to learn something yourself than to copy and paste code like an empty-headed monkey. – Steven Ryssaert Feb 28 '11 at 14:53
  • Well I don't want to copy-paste, but I am on a big time crunch and a tight deadline and I'm a PHP developer basically struggling to learn everything about .NET in under a week, so any more help you could give me would be amazing. – EmilyM Feb 28 '11 at 14:54
  • Continue to post your questions here and we'll see what we can do for you ;-) – Steven Ryssaert Feb 28 '11 at 15:03

It looks like what you want to do is write a C# application that opens the file and looks at the source code.

You will probably need regular expressions to get the best matches for the text you want to replace. You also need to be careful to replace only whole words: for instance, if you wanted to create a link for the word "tom" that takes you to Tom's page, you wouldn't want it to create a link inside the word "tomorrow".

Basically, I think the logic is: find the word with a space before and after it, and replace it with the code for the hyperlink.

Regex can be a bit daunting when you first look at it, but once you have your expressions, it is a very powerful way to do this kind of thing.
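The space-delimited rule misses a word at the start or end of a line, or one followed by punctuation ("ask tom."). The regex word boundary `\b` covers those cases and still skips "tomorrow". The sketch below compares the two; `WholeWordDemo` and the `tom.html` target are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // The "space before and after" rule: only matches " tom ".
    public static string BySpaces(string text, string word, string url)
    {
        return text.Replace(" " + word + " ", " <a href='" + url + "'>" + word + "</a> ");
    }

    // Word-boundary version: \b also matches next to punctuation and at the
    // start/end of the string, but never inside a longer word like "tomorrow".
    public static string ByWordBoundary(string text, string word, string url)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        // $0 in the replacement inserts the whole match; "$" in the URL is
        // doubled so it is not read as a substitution.
        return Regex.Replace(text, pattern, "<a href='" + url.Replace("$", "$$") + "'>$0</a>");
    }
}
```

On "ask tom. see tom tomorrow.", `BySpaces` links only the second "tom", while `ByWordBoundary` links both and leaves "tomorrow" alone.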

Purplegoldfish
  • Thank you I am definitely going to look at regular expressions! I hadn't even thought of replacing parts of other words by mistake! – EmilyM Feb 28 '11 at 14:49
  • One other suggestion I might make is that if you are going to use this to update your files, make your application dump a copy of the original into another directory first just as a precaution. – Purplegoldfish Feb 28 '11 at 14:56
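A minimal sketch of that precaution using `System.IO`; `BackupHelper` and the directory layout are assumptions, and the copy overwrites any earlier backup with the same file name:

```csharp
using System.IO;

public static class BackupHelper
{
    // Hypothetical helper: copies `sourceFile` into `backupDir` before the
    // original is modified, and returns the path of the copy.
    public static string BackUp(string sourceFile, string backupDir)
    {
        Directory.CreateDirectory(backupDir); // no-op if it already exists
        string target = Path.Combine(backupDir, Path.GetFileName(sourceFile));
        File.Copy(sourceFile, target, overwrite: true);
        return target;
    }
}
```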

If you have specific words, we generally mark them with special placeholder text such as [NAME] or [CLASS] so they are easy to recognize, and then do the following:

  1. Read the .html or .aspx file with the TextReader class.
  2. Hold the entire text in a string and call string.Replace("[NAME]", @"..."), where ... stands for the required markup and attributes.
  3. Write the text to a new page with the same extension.
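The three steps above might look like this. It is a sketch under assumptions: it uses `File.ReadAllText` and `File.WriteAllText` rather than a `TextReader` directly, the placeholder map and `PlaceholderRewriter` name are hypothetical, and it reads each page fully into memory (the trade-off questioned in the comments below).

```csharp
using System.Collections.Generic;
using System.IO;

public static class PlaceholderRewriter
{
    // Step 1: read the whole .html/.aspx file into a string.
    // Step 2: replace each placeholder such as "[NAME]" with its HTML.
    // Step 3: write the result to a new file (same extension, different name).
    public static void Rewrite(string inputPath, string outputPath,
                               IDictionary<string, string> placeholders)
    {
        string text = File.ReadAllText(inputPath);
        foreach (KeyValuePair<string, string> pair in placeholders)
        {
            // String.Replace is case-sensitive: "[NAME]" does not match "[Name]".
            text = text.Replace(pair.Key, pair.Value);
        }
        File.WriteAllText(outputPath, text);
    }
}
```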
  • I would not recommend this solution, as it takes processor time to process the whole string, not to mention that holding 10 page lengths of text in a variable consumes a lot of memory. Also, if a new word needed to be replaced, the whole text would have to be searched again and written to another page. – Steven Ryssaert Feb 28 '11 at 14:33
  • Yes, it seems a bit memory-consuming. I should mention that each page will have many different keywords on it, so the process will have to happen again and again for each keyword - dozens of times per page for dozens of different words. Also, most of our pages are extremely lengthy and text-heavy. – EmilyM Feb 28 '11 at 14:38
  • Then regular expressions are the way to go. Also, do some reading on wikis, as I mentioned above. – Steven Ryssaert Feb 28 '11 at 14:54

Okay Emily, to help you out on your short deadline a bit:

Read the following article on how to fetch the body html content in code-behind: http://west-wind.com/weblog/posts/481.aspx

Let's assume you have that Render() output stored in a variable named _pageContent.

I am not using any Regular Expressions now, as I don't have the time to think of one properly. You can play around with that a bit yourself. The following link may point you in a direction: Regex to match multiple strings

// Requires: using System; using System.Collections.Generic;
// Assumes _pageContent is a static string field holding the rendered page HTML.
public static void ChangeWordsToLinks()
{
  Dictionary<string, string> _wordLinkCollection = new Dictionary<string, string>();
  // fill the collection which will replace words by links here
  // Additionally you can fetch this from a database and loop 
  // through a DataTable to fill this collection
  _wordLinkCollection.Add("foo", "http://www.foobar.com");
  _wordLinkCollection.Add("bar", "http://www.barfoo.com");

  // this is lazy code and SHOULD be optimized to a single RegExp string.
  foreach (KeyValuePair<string, string> pair in _wordLinkCollection)
  {
    // Strings are immutable: Replace returns a new string, so assign it back.
    // The surrounding spaces are kept so the link doesn't run into its neighbours.
    _pageContent = _pageContent.Replace(String.Format(" {0} ", pair.Key),
        String.Format(" <a href='{0}'>{1}</a> ", pair.Value, pair.Key));
  }
}
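The comment in the code says it should become a single regular expression. One way to do that, sketched here and not taken from the linked question: join the escaped keys into one alternation such as `\b(?:foo|bar)\b` and look each match up in the dictionary from a `MatchEvaluator`, so the page is scanned once however many keywords there are. `SinglePassLinker` is a hypothetical name; keys are assumed to be plain words or phrases, and matching is case-sensitive.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SinglePassLinker
{
    // Replaces every whole-word occurrence of any key in `wordLinks` with a
    // link to its URL, scanning `pageContent` only once.
    public static string ChangeWordsToLinks(string pageContent,
                                            IDictionary<string, string> wordLinks)
    {
        if (wordLinks.Count == 0)
            return pageContent;

        // Longest keys first, so "United States of America" beats "United States".
        IEnumerable<string> alternatives = wordLinks.Keys
            .OrderByDescending(k => k.Length)
            .Select(Regex.Escape);

        // e.g. \b(?:foo|bar)\b  (non-capturing group of all keywords)
        string pattern = @"\b(?:" + string.Join("|", alternatives) + @")\b";

        // The evaluator builds each link from the matched key; the inserted
        // HTML is never rescanned for further keywords.
        return Regex.Replace(pageContent, pattern,
            m => "<a href='" + wordLinks[m.Value] + "'>" + m.Value + "</a>");
    }
}
```

Because each replacement comes from the evaluator instead of a later pass, `bar` is not replaced inside `http://www.foobar.com`, unlike a chain of `Replace` calls on the bare word.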

Glad if I could be of any help to you.

Steven Ryssaert