
I have 2 strings, and I need a single regex that works for both.

    s1="The 8481D provides extraordinary accuracy, stability, and lower SWR.";

    s2="<li>Complete with case and 9V battery</li><div id='warranty'><img src='1yr.gif'>";

I need to get all the characters of s1, and the characters of s2 up to (but not including) `<div id='warranty'>`.

So the results should be:

    s1="The 8481D provides extraordinary accuracy, stability, and lower SWR.";

    s2="<li>Complete with case and 9V battery</li>";

I tried `.+?(?<=<div id="warranty">)`, but it only matched s2. I also tried `.+?(?<=<div id="warranty">|\.)`: that matched s1, but took too many characters from s2.

Chani Poz
  • Which programming language? Different host languages support different regex dialects. – tripleee Jul 05 '12 at 07:11
  • you said "and the characters of s2 till the characters: `'"'`", but then say you want it to finish at `</li>`? – jay Jul 05 '12 at 07:12
  • Does it have to be Regex? It would be easier using functions like `strpos` and `substr`. – Johannes Egger Jul 05 '12 at 07:19
  • @jared, it finishes at `</li>`, but not always; the characters that are always there are `<div id='warranty'>`. – Chani Poz Jul 05 '12 at 07:24
  • The second string looks like HTML. Were you aware that Regex is probably [not the best tool](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) to parse HTML? Have you tried using an HTML parser such as HTML Agility Pack? – Darin Dimitrov Jul 05 '12 at 07:27
  • @Darin Dimitrov, I already use HTML Agility Pack; after that I have to use a regex to clean up the node's string. – Chani Poz Jul 05 '12 at 07:33
  • No, you don't need to use Regex. If you already use HTML Agility Pack, use it all the way through to extract the information you need. – Darin Dimitrov Jul 05 '12 at 07:34

2 Answers

    .+?(?=<div\sid='warranty'>|\.)

or, if you want to include the dot as well, the regex will be:

    ^.+?(?=<div\sid='warranty'>|$)
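A minimal C# sketch (assuming C# 9 or later, so top-level statements are available) of how both patterns behave on the two sample strings with `Regex.Match`. The helper `FirstMatch` and the variable names are hypothetical, introduced only for illustration. Both patterns end in a lookahead `(?=...)`, so `<div id='warranty'>` is tested for but not included in the match; a lookbehind `(?<=...)` placed after `.+?`, as in the question, instead forces the match to run through the marker.

```csharp
using System;
using System.Text.RegularExpressions;

string s1 = "The 8481D provides extraordinary accuracy, stability, and lower SWR.";
string s2 = "<li>Complete with case and 9V battery</li><div id='warranty'><img src='1yr.gif'>";

// First pattern: lazily consume characters and stop just before
// <div id='warranty'> or before the first '.' (the dot is not included).
var stopAtDot = new Regex(@".+?(?=<div\sid='warranty'>|\.)");

// Second pattern: anchored at the start; stop just before
// <div id='warranty'> or at the end of the input (the dot is kept).
var stopAtEnd = new Regex(@"^.+?(?=<div\sid='warranty'>|$)");

// Hypothetical helper: text of the first match, or "" when nothing matches.
static string FirstMatch(Regex re, string input)
{
    Match m = re.Match(input);
    return m.Success ? m.Value : string.Empty;
}

Console.WriteLine(FirstMatch(stopAtDot, s1)); // s1 without its final '.'
Console.WriteLine(FirstMatch(stopAtDot, s2)); // <li>Complete with case and 9V battery</li>
Console.WriteLine(FirstMatch(stopAtEnd, s1)); // all of s1, including the '.'
Console.WriteLine(FirstMatch(stopAtEnd, s2)); // <li>Complete with case and 9V battery</li>
```

By default `.` in .NET does not match `\n`; if the node text may contain line breaks, passing `RegexOptions.Singleline` to the `Regex` constructor lets the match span them.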
Mijalko

The simplest way to do this in C# is with the `IndexOf` and `Substring` methods (if you don't insist on Regex):

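    using System;

    static String GetValidString(String inputString)
    {
        // Position of the marker; Ordinal compares the characters exactly
        int end = inputString.IndexOf("<div id='warranty'>", StringComparison.Ordinal);
        // Marker absent (as in s1): keep the whole string
        if (end == -1)
            end = inputString.Length;
        // Return everything before the marker
        return inputString.Substring(0, end);
    }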
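A small harness, assuming C# 9 or later top-level statements, that runs the IndexOf/Substring approach next to the anchored regex `^.+?(?=<div\sid='warranty'>|$)` from the other answer and checks that they agree on both sample strings. `ByIndexOf` and `ByRegex` are hypothetical names; `StringComparison.Ordinal` is an addition so the marker is compared character-for-character rather than with culture-sensitive rules.

```csharp
using System;
using System.Text.RegularExpressions;

string[] samples =
{
    "The 8481D provides extraordinary accuracy, stability, and lower SWR.",
    "<li>Complete with case and 9V battery</li><div id='warranty'><img src='1yr.gif'>",
};

// IndexOf/Substring approach; Ordinal is an added assumption so the
// marker is matched exactly.
static string ByIndexOf(string input)
{
    int end = input.IndexOf("<div id='warranty'>", StringComparison.Ordinal);
    // No marker (as in the first sample): keep the whole string.
    return end == -1 ? input : input.Substring(0, end);
}

// Anchored pattern from the regex answer, for comparison.
static string ByRegex(string input) =>
    Regex.Match(input, @"^.+?(?=<div\sid='warranty'>|$)").Value;

foreach (string s in samples)
{
    string viaIndexOf = ByIndexOf(s);
    bool same = viaIndexOf == ByRegex(s);
    Console.WriteLine($"{viaIndexOf} (regex agrees: {same})");
}
```

Because the marker is passed to `IndexOf` as a plain string, characters that are special in regex syntax (such as `.` or `?`) would need no escaping if the marker ever changed.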
Ria