I have a String variable that contains HTML data. I want to split that HTML string into multiple strings and then merge those strings into a single one.

This is the HTML string:

<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span></p>
<p style="text-align: center;"><strong><span style="color: #008000;">para2</span> स्द्स्द्सद्स्द para2 again<br /></strong></p>
<p style="text-align: left;"><strong><span style="color: #0000ff;">para3</span><br /></strong></p>

And this is my expected output:

<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span><strong><span style="color: #008000;">para2</span>para2 again<br /></strong><strong><span style="color: #0000ff;">para3</span><br /></strong></p>

My split logic is as follows:

  1. Split the HTML string into tokens on the </p> tag.
  2. Take the first token and store it in a separate string variable (firstPara).
  3. Take every token, remove the opening tag starting with <p and the closing </p> tag, and store each value in a separate variable.
  4. Take the first token, firstPara, remove its </p> tag, and append every token obtained in step 3.
  5. Now the variable firstPara holds the whole value.
  6. Finally, append </p> to the end of firstPara.
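
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the names `ParagraphMerger` and `Merge` are hypothetical, and the sketch assumes that every paragraph starts with `<p` and that no other tag in the input begins with `<p` (such as `<pre>`). It also discards the attributes of every opening `<p ...>` tag, which matches the expected output shown earlier.

```csharp
using System;
using System.Text;

public static class ParagraphMerger
{
    // Hypothetical helper: merges the inner markup of every <p>...</p> block into one <p>.
    public static string Merge(string html)
    {
        // Step 1: split the HTML string into tokens on the closing tag.
        string[] tokens = html.Split(new[] { "</p>" }, StringSplitOptions.None);

        var merged = new StringBuilder("<p>");
        foreach (string token in tokens)
        {
            // Steps 2-3: locate the opening <p ...> tag of this token.
            int open = token.IndexOf("<p", StringComparison.OrdinalIgnoreCase);
            if (open < 0)
            {
                continue; // no paragraph here, e.g. the empty token after the last </p>
            }
            int close = token.IndexOf('>', open);
            if (close < 0)
            {
                continue; // malformed opening tag; skipped in this sketch
            }
            // Step 4: append only the markup that follows the opening tag.
            merged.Append(token.Substring(close + 1));
        }
        // Step 6: close the single remaining paragraph.
        return merged.Append("</p>").ToString();
    }
}
```

For example, `ParagraphMerger.Merge("<p>a</p>\n<p class=\"x\"><b>b</b></p>")` returns `<p>a<b>b</b></p>`.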

That is my problem.

Could you please help me solve it?

Saravanan

2 Answers

Here is a regex example of how to do it:

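// Requires: using System.Text; using System.Text.RegularExpressions;
// Capture the inner HTML of each <p ...>...</p> block; the lazy .*? stops at the first </p>,
// so this also works when all paragraphs are on a single line.
string pattern = @"<p\b[^>]*>(.*?)</p>";
var matches = Regex.Matches(text, pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
StringBuilder result = new StringBuilder();
result.Append("<p>");
foreach (Match match in matches)
{
    result.Append(match.Groups[1].Value); // group 1 holds the inner HTML only
}
result.Append("</p>");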

And this is how you should do it with Html Agility Pack:

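// Requires the HtmlAgilityPack package: using HtmlAgilityPack; using System.Text;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(text);
var nodes = doc.DocumentNode.SelectNodes("//p"); // null when no <p> element is found
StringBuilder result = new StringBuilder();
result.Append("<p>");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        result.Append(node.InnerHtml);
    }
}
result.Append("</p>");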
Dmitrii Dovgopolyi

To split a string by another string, you can use string.Split(string[] separator, StringSplitOptions options), where separator is a string array containing at least one string to split on.
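
As a quick illustration of that overload, here is a small sketch showing how the two `StringSplitOptions` values differ when the input ends with the separator. The `SplitDemo` class and `SplitOn` method are hypothetical names used only for this illustration.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: splits input on a whole-string separator.
    public static string[] SplitOn(string input, string separator, bool dropEmpty)
    {
        StringSplitOptions options = dropEmpty
            ? StringSplitOptions.RemoveEmptyEntries // discard empty items
            : StringSplitOptions.None;              // keep empty items
        return input.Split(new[] { separator }, options);
    }
}
```

With `"<p>a</p><p>b</p>"` and the separator `"</p>"`, `StringSplitOptions.None` yields three items (`<p>a`, `<p>b`, and a trailing empty string), while `StringSplitOptions.RemoveEmptyEntries` yields only the first two.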

Example

//Initialize a string named HTML with our HTML code
string HTML = "<p><span style=\"text-decoration: underline; color: #ff0000;\"><strong>para1</strong></span></p> <p style=\"text-align: center;\"><strong><span style=\"color: #008000;\">para2</span> स्द्स्द्सद्स्द para2 again<br /></strong></p> <p style=\"text-align: left;\"><strong><span style=\"color: #0000ff;\">para3</span><br /></strong></p>";
//Split HTML on </p> into a string array named strSplit
string[] strSplit = HTML.Split(new string[] { "</p>" }, StringSplitOptions.None);
//Initialize a string named expectedOutput
string expectedOutput = "";
//Loop over every item in strSplit
for (int i = 0; i < strSplit.Length; i++)
{
    string stringToAppend = strSplit[i];
    if (i >= 1) //From the second item to the last item
    {
        //Remove the whole opening <p ...> tag, including its attributes
        int open = stringToAppend.IndexOf("<p");
        int close = open >= 0 ? stringToAppend.IndexOf('>', open) : -1;
        if (close >= 0)
        {
            stringToAppend = stringToAppend.Substring(close + 1);
        }
    }
    //Append the result to expectedOutput
    expectedOutput += stringToAppend;
}
//Append </p> at the end of the string
expectedOutput += "</p>";
//Write the output to the Console
Console.WriteLine(expectedOutput);
Console.Read();

Output

<p><span style="text-decoration: underline; color: #ff0000;"><strong>para1</strong></span><strong><span style="color: #008000;">para2</span> ?????????????? para2 again<br /></strong><strong><span style="color: #0000ff;">para3</span><br /></strong></p>

NOTICE: Because my console does not support Unicode characters, it could not display स्द्स्द्सद्स्द, so that text was printed as ??????????????.
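
One way to confirm that only the display is affected, and that the string itself still holds the original characters, is to print their code units instead of the characters. The sketch below is just an illustration; `UnicodeCheck` and `CodeUnits` are hypothetical names.

```csharp
using System;
using System.Text;

public static class UnicodeCheck
{
    // Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX values.
    public static string CodeUnits(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

For example, `UnicodeCheck.CodeUnits("\u0938\u094D\u0926")` returns `U+0938 U+094D U+0926`. Setting `Console.OutputEncoding = Encoding.UTF8` before writing may also let some consoles show the text, although that depends on the console font as well.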

Thanks,
I hope you find this helpful :)

Picrofo Software
  • @Saravanan Sorry about that, but I can't really understand your comment. Could you please explain what exactly you are trying to do? :) – Picrofo Software Dec 14 '12 at 09:29
  • I want to replace the <p> tags. That means I should have only one <p>, at the starting position of the HTML string, and one </p> tag, at the ending position of the HTML string, and not in any other place... – Saravanan Dec 14 '12 at 09:56
  • @Saravanan I've updated my answer. Sorry for misunderstanding. Have a nice day :) – Picrofo Software Dec 14 '12 at 10:04