I am new to regular expressions. I have the following text in my HTML and would like to replace it with something else.

Example HTML:

{{Object id='foo'}}

I want to extract the id into a variable like this:

string strId = "foo";

So far I have the following Regular Expression code that will capture the Example HTML:

string strStart = "Object";
string strFind = "{{(" + strStart + ".*?)}}";
Regex regExp = new Regex(strFind, RegexOptions.IgnoreCase);

Match matchRegExp = regExp.Match(html);

while (matchRegExp.Success)
{

    //At this point, I have this variable:
    //{{Object id='foo'}}

    //I can find the id='foo' (see below)
    //but not sure how to extract 'foo' and use it

    string strFindInner = "id='(.*?)'";
    Regex regExpInner = new Regex(strFindInner, RegexOptions.IgnoreCase);
    Match matchRegExpInner = regExpInner.Match(matchRegExp.Value);

    //Do something with 'foo'

    matchRegExp = matchRegExp.NextMatch();
}

I understand this might have a simple solution. I am hoping to learn more about regular expressions but, more importantly, I am hoping for a suggestion on how to approach this more cleanly and efficiently.

Thank you

Edit:

Is this an example that I could potentially use: c# regex replace

Derek
  • Stop! Look and listen! Every day someone wakes up with the great idea of parsing HTML with regex. Nothing parses HTML better than an XML parser. The way you ask your question may hide how hard this can be: using `{{` instead of `<>` hides the fact that parsing a comment like ">_< <3 I luv you => _o/" can turn your regex into a nightmare. In your head, regex is a simple "look for this", but it's not! To parse HTML, a regex has to go recursive and go back to the start every time. Use a parser and your code will be as simple as doing it in JS. – Drag and Drop Aug 17 '17 at 07:13
  • Thank you, I value your opinion. RegEx seemed like the easy approach, but apparently it isn't. I have moved to `SubString` and `IndexOf`, as I am attempting to do something similar to what WordPress' doShortCode() accomplishes, and I was able to find documentation on how that currently works. I am looking to get a proof of concept and move on from there. – Derek Aug 17 '17 at 13:20
  • Use an HTML parser such as [Html Agility Pack (HAP)](http://html-agility-pack.net/?z=codeplex). It's a simple NuGet package, and then you can select anything you want in an HTML document. It's not hard to learn; there is close to nothing to learn. – Drag and Drop Aug 17 '17 at 13:36
  • To get a proof of concept, use some keywords and a Google search, so that this question doesn't turn into an off-site resource list. Every library that parses HTML has solid examples on its home page, and parsing HTML is so common that you can find free libraries everywhere. – Drag and Drop Aug 17 '17 at 13:42
  • What is interesting is that everyone recommends using the HTML agility pack...yet in 10 years on StackOverflow I have seen only one person answer a question with it on a regex question. So your mileage may vary. – ΩmegaMan Aug 17 '17 at 22:07
  • It's unclear what you want to do with `foo`. The title says "Replace everything else". So regardless of the technology, your question is too vague. Distill it down to one individual problem. Provide an example as well, not just code, but data. – ΩmegaMan Aug 17 '17 at 22:10
  • Sorry for not being clear, by "everything else" I meant the first code sample of `{{Object id='foo'}}`. – Derek Aug 21 '17 at 00:22

1 Answer

I am not solving my initial question with regular expressions. Instead, I moved to a simpler solution using Substring, IndexOf and string.Split for the time being. I understand that my code needs to be cleaned up, but I thought I would post the answer that I have so far.

string html = "<p>Start of Example</p>{{Object id='foo'}}<p>End of example</p>";
string strObject = "Object"; //Example (in practice this could be "Slider")

//When found, this will contain "{{Object id='foo'}}"
string strCode = "";

//ie: "id='foo'"
string strCodeInner = "";

//Tags will be a list, but in this example, only "id='foo'"
string[] tags = { };

//Looking for the following "{{Object "
string strFindStart = "{{" + strObject + " ";
int intFindStart = html.IndexOf(strFindStart);

//Then ending in the following, searched for only after the start marker
string strFindEnd = "}}";
int intFindEnd = intFindStart == -1 ? -1 : html.IndexOf(strFindEnd, intFindStart);

//Must find both Start and End conditions
if (intFindStart != -1 && intFindEnd != -1)
{
    //Include the closing "}}" in the extracted code
    strCode = html.Substring(intFindStart, intFindEnd + strFindEnd.Length - intFindStart);

    //Remove Start and End
    strCodeInner = strCode.Replace(strFindStart, "").Replace(strFindEnd, "");

    //Split by spaces, this needs to be improved if more than IDs are to be used
    //but for proof of concept this is perfect
    tags = strCodeInner.Split(new char[] { ' ' });
}

Dictionary<string, string> dictTags = new Dictionary<string, string>();
foreach (string tag in tags)
{
    string[] tagSplit = tag.Split(new char[] { '=' });
    dictTags.Add(tagSplit[0], tagSplit[1].Replace("'", "").Replace("\"", ""));
}

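//At this point, I can replace "{{Object id='foo'}}" with anything I'd like
//What I don't show is that I go into the website's database,
//get the object (ie: Slider) and return the html for slider with the ID of foo
//strView below is only a placeholder for that database-generated html
if (strCode.Length > 0 && dictTags.ContainsKey("id"))
{
    string strView = "<p id=\"" + dictTags["id"] + "\">This is the replacement text</p>";
    html = html.Replace(strCode, strView);
}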

/*
    "html" variable may contain:

    <p>Start of Example</p>
    <p id="foo">This is the replacement text</p>
    <p>End of example</p>

*/
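
For comparison, the extraction and replacement I originally asked about could probably also be done in one pass with a regular expression, using named capture groups and the `Regex.Replace` overload that takes a `MatchEvaluator`. This is only a minimal sketch and not what I am running: the names `ShortCodeDemo`, `ReplaceShortCodes` and `render` are hypothetical, and it assumes tags always look like `{{Name key='value'}}` with quoted values and no nested braces.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ShortCodeDemo
{
    // Matches {{Name ...}}; "name" is the object name, "attrs" is everything up to the closing }}.
    private static readonly Regex TagPattern = new Regex(
        @"\{\{(?<name>\w+)\s*(?<attrs>.*?)\}\}",
        RegexOptions.IgnoreCase);

    // Matches one key='value' or key="value" pair; \k<q> requires the same closing quote.
    private static readonly Regex AttrPattern = new Regex(
        @"(?<key>\w+)=(?<q>['""])(?<value>.*?)\k<q>");

    // Replaces every {{objectName ...}} tag with whatever render returns for its attributes.
    // Tags with a different name are left untouched.
    public static string ReplaceShortCodes(
        string html,
        string objectName,
        Func<IDictionary<string, string>, string> render)
    {
        return TagPattern.Replace(html, match =>
        {
            string name = match.Groups["name"].Value;
            if (!string.Equals(name, objectName, StringComparison.OrdinalIgnoreCase))
            {
                return match.Value;
            }

            // Collect the attributes, e.g. id -> foo (quotes are not included in the value).
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match attr in AttrPattern.Matches(match.Groups["attrs"].Value))
            {
                attrs[attr.Groups["key"].Value] = attr.Groups["value"].Value;
            }

            return render(attrs);
        });
    }
}
```

With the example above, calling `ShortCodeDemo.ReplaceShortCodes(html, "Object", attrs => "<p id=\"" + attrs["id"] + "\">This is the replacement text</p>")` should hand `foo` to the callback as `attrs["id"]` and return the html with the tag swapped out. The callback is where the database lookup would go.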
Derek