
I need to write a function that takes in a string, finds the URL in each hyperlink, and rewrites that URL so that the page name becomes an anchor. For example,

<a href="mysection/mysector/apage.aspx">

would become

<a href="mysection/mysector.aspx#apage">

However, this should only happen for links in the mysector folder.

I am a bit stumped at the moment and any help would be great.

Brandon
Code Pharaoh

3 Answers


This pattern consumes every leading "folder/" sequence and captures the last folder separately. The replacement appends ".aspx" to that folder, followed by "#" and the file name without its extension. The character classes may need adjusting if your folder and file names can contain characters other than lowercase letters and digits.

href="(([a-z0-9]+/)*)([a-z0-9]+)/([^.]+)\.aspx"

Then replace with

href="$1$3.aspx#$4"

Also try "mysection/anothersection/yetanotherone/mysector/apage.aspx" to understand how it works.

Leif
  • Hey Leif, thanks for your explanation, but I think I need a bit more help getting it to work. When I try it on http://regexpal.com/ with your regex and the test string _some text /mysector/page.aspx sadsadasdasd_, it doesn't seem to match anything. – Code Pharaoh Jun 22 '11 at 15:09
  • On regexpal.com, you must make sure that your regex has no trailing whitespace (newlines etc.). You could also try my regex with the function Tremmors used. I adjusted it a bit so that it is more specific; it should work in C# if you escape it. You can also check my regex on http://gskinner.com/RegExr/, which is way better. It works there (choose the "replace" tab). – Leif Jun 22 '11 at 15:14
  • Ah, the green checkmark tells me it worked. :) I hope you are happy with the solution. – Leif Jun 22 '11 at 16:21
  • Hey Leif, yeah, it worked great, thanks. And I think I understand regex a bit better: double win. – Code Pharaoh Jun 23 '11 at 08:13
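Leif's comment says the pattern should work in C# once it is escaped. A minimal sketch of that, assuming .NET's `System.Text.RegularExpressions`; the names `LinkRewriter` and `RewriteLinks` are hypothetical. Inside a C# verbatim string (`@"..."`) each double quote is written as `""` and backslashes are passed through to the regex unchanged, so the pattern reads almost exactly as posted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping Leif's pattern; not part of any library.
public static class LinkRewriter
{
    // Leif's pattern as a verbatim string: "" is a literal double quote.
    // Group 1 = leading folders, group 3 = last folder, group 4 = page name.
    private static readonly Regex FolderLink =
        new Regex(@"href=""(([a-z0-9]+/)*)([a-z0-9]+)/([^.]+)\.aspx""");

    // Turns href=".../folder/page.aspx" into href=".../folder.aspx#page".
    public static string RewriteLinks(string html) =>
        FolderLink.Replace(html, "href=\"$1$3.aspx#$4\"");

    public static void Main()
    {
        Console.WriteLine(RewriteLinks("<a href=\"mysection/mysector/apage.aspx\">"));
        // <a href="mysection/mysector.aspx#apage">

        Console.WriteLine(RewriteLinks(
            "<a href=\"mysection/anothersection/yetanotherone/mysector/apage.aspx\">"));
        // <a href="mysection/anothersection/yetanotherone/mysector.aspx#apage">
    }
}
```

Like the original pattern, this rewrites links whose last folder has any lowercase alphanumeric name, not only mysector. To restrict it to that folder, the `([a-z0-9]+)` group could be replaced with the literal `(mysector)` (the replacement string stays the same); to accept uppercase names, `RegexOptions.IgnoreCase` can be passed to the `Regex` constructor.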

I'm going to suggest using the IIS URLRewrite module to fix it on the back end. Then you won't need any code.

If you really want to do this in C#:

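using System;
using System.Text.RegularExpressions;

public string FixLinks(string strHTML)
{
    try
    {
        // Rewrite href=".../mysector/page.aspx" as href=".../mysector.aspx#page".
        // [^\"] keeps the match inside one attribute value, and the page name
        // is captured without its .aspx extension.
        return Regex.Replace(strHTML, "(href=\"[^\"]*/mysector)/([^\"/.]+)\\.aspx\"", "$1.aspx#$2\"");
    }
    catch (Exception)
    {
        // On failure (e.g. null input), return the original HTML unchanged.
        return strHTML;
    }
}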
Tremmors
  • This would only work with "mysector" as the last folder. Furthermore, I don't think it would actually work. Did you really try this out? – Leif Jun 22 '11 at 14:29
  • The OP notes that he only wants links in the mysector folder. Granted, this does not take into account any subdirectories in that folder, but he did not specify that. I quickly tested it and it does seem to work. Does it not work for you? – Tremmors Jun 22 '11 at 14:58

In no particular order:

This will help you when you're testing your regexes: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Take a look at the Matches collection: this is where the matched parts of the string are kept, with each match's captured pieces available through its Groups.

An example: http://forums.asp.net/t/1408417.aspx/1

A warning: see the Stack Overflow question "RegEx match open tags except XHTML self-contained tags" on why regular expressions are a fragile way to parse HTML.

Good luck.

Community
immutabl
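The last answer suggests looking at the Matches collection. A minimal sketch of reading captured parts from it, assuming .NET's `Regex.Matches`, which returns a `MatchCollection` whose `Match` objects expose numbered `Groups`. The names `MatchInspector` and `ExtractLinks` are hypothetical, and the pattern is the one from the first answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for inspecting matches; not part of any library.
public static class MatchInspector
{
    // Same pattern as the first answer: group 3 = last folder, group 4 = page name.
    private static readonly Regex FolderLink =
        new Regex(@"href=""(([a-z0-9]+/)*)([a-z0-9]+)/([^.]+)\.aspx""");

    // Walks every match in the input and reads the captured groups by number.
    public static List<(string Folder, string Page)> ExtractLinks(string html)
    {
        var result = new List<(string Folder, string Page)>();
        foreach (Match m in FolderLink.Matches(html))
        {
            result.Add((m.Groups[3].Value, m.Groups[4].Value));
        }
        return result;
    }

    public static void Main()
    {
        string html = "<a href=\"mysection/mysector/apage.aspx\">A</a> " +
                      "<a href=\"other/bpage.aspx\">B</a>";
        foreach (var (folder, page) in ExtractLinks(html))
        {
            Console.WriteLine($"folder={folder}, page={page}");
        }
        // folder=mysector, page=apage
        // folder=other, page=bpage
    }
}
```

Reading the groups this way makes it easy to decide per match whether to rewrite (for example, only when the folder is mysector) before building the new URL.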