1

I have this string that contains HTML code:

String str ="<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

I want to split it so that each HTML element becomes a separate string. The output could be an array that contains this:

[0] = <form action=''>
[1] = <span> First Name </span>
[2] = <input type='text' id='fname' class='cls' size='40' required />
[3] = <span> [*]</span>
[4] = <input type='submit' value='Submit' name='btn' />
[5] = <select name='slcEle' >
[6] = <option value='opt'> Text</option>
[7] = </select>
and so on.

I can't use the split function because, as you can see, each string has different characters and a different pattern.

Can anyone help with this?

Stephan
F. Fo
    Obligatory caution about regex and HTML: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Andy Turner Mar 17 '16 at 00:22
    1) Use an HTML parser. 2) You haven't defined any rules. When do you decide to split a tag into separate items? – shmosel Mar 17 '16 at 00:25

3 Answers

1

If you want to handle HTML properly, I recommend using a dedicated parsing library such as Jsoup:

http://jsoup.org/

It makes what you want to achieve far easier than string manipulation.

Rafael Lucena
0

I want to split it, so that each html element be a separate string.

You can "mark" the initial string with a delimiter and then split on it.
In the sample code below, the regex skips whitespace-only text, so it never becomes an item of its own; it stays attached to the front of the next item.

SAMPLE CODE

String str = "<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

final String DELIMITER = "<--->";
String[] separateStrings = str //
                           .replaceAll("(?!\\s+)(<[^>]+>|[^/<>]+)", "$1" + DELIMITER) //
                           .split(DELIMITER);

int len = separateStrings.length;
for (int i = 0; i < len; i++) {
    System.out.format("[%d] = %s\n", i, separateStrings[i]);
}

OUTPUT

[0] = <form action=''>
[1] = <span>
[2] =  First Name 
[3] = </span>
[4] =  <input type='text' id='fname' class='cls' size='40' required />
[5] =  <span>
[6] =  [*]
[7] = </span>
[8] =  <input type='submit' value='Submit' name='btn' />
[9] =  <select name='slcEle' >
[10] =  <option value='opt'>
[11] =  Text
[12] = </option>
[13] =  </select>
[14] =  <input type='radio' id='this'/>
[15] =  <button name='name' type='reset' value='val'>
[16] =  Text
[17] = </button>
[18] =  <input type='range' min='0' max='100' name='grade'/>
[19] =  <button name='btnname' type='button'>
[20] =  Text
[21] = </button>
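
As a variation on the approach above, the items can also be collected directly with a Matcher, which avoids inserting a delimiter into the string at all. The following is only a sketch: the class name TagTokenizer and the method tokenize are made up for illustration, the pattern is a simplified form of the one above (text runs may contain "/"), each item is trimmed, and blank items are dropped. It also assumes that no attribute value contains "<" or ">", which real HTML does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagTokenizer {

    // Either a tag (<...>) or a run of text between two tags.
    // Assumption: attribute values never contain '<' or '>'.
    private static final Pattern TOKEN = Pattern.compile("<[^>]+>|[^<>]+");

    /** Returns tags and non-blank text runs, each trimmed, in document order. */
    static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String token = m.group().trim();
            if (!token.isEmpty()) { // skip whitespace-only text
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "<select name='slcEle' > <option value='opt'> Text</option> </select>";
        List<String> tokens = tokenize(str);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.format("[%d] = %s%n", i, tokens.get(i));
        }
    }
}
```

Because every item is trimmed, the leading spaces visible in the output above disappear; drop the trim() call if the original spacing matters.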
Stephan
0

I want to split it, so that each html element be a separate string.

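Here is an alternative answer that uses only the split() method (i.e. no delimiter is needed). Note that with this solution, whitespace-only text is kept as its own item when it sits directly before a closing tag (see [13] in the output below); elsewhere it stays attached to the front of the following tag.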

SAMPLE CODE

String str = "<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

String[] separateStrings = str.split("(?<=>)|(?=</)");

int len = separateStrings.length;
for (int i = 0; i < len; i++) {
    System.out.format("[%d] = %s\n", i, separateStrings[i]);
}

OUTPUT

[0] = <form action=''>
[1] = <span>
[2] =  First Name 
[3] = </span>
[4] =  <input type='text' id='fname' class='cls' size='40' required />
[5] =  <span>
[6] =  [*]
[7] = </span>
[8] =  <input type='submit' value='Submit' name='btn' />
[9] =  <select name='slcEle' >
[10] =  <option value='opt'>
[11] =  Text
[12] = </option>
[13] =  
[14] = </select>
[15] =  <input type='radio' id='this'/>
[16] =  <button name='name' type='reset' value='val'>
[17] =  Text
[18] = </button>
[19] =  <input type='range' min='0' max='100' name='grade'/>
[20] =  <button name='btnname' type='button'>
[21] =  Text
[22] = </button>
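
The output above is split per tag, while the question lists items such as <span> First Name </span> as a single string. One possible post-processing step is sketched below. The merge rule is an assumption, since the question does not define one: an opening tag, a text run, and a closing tag that follow each other directly are joined into one item; everything else is trimmed and kept on its own, and blank items are dropped. The names ElementGrouper, group and isOpeningTag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ElementGrouper {

    /**
     * Splits with the same regex as above, then merges
     * "opening tag + text + closing tag" triples into one item (assumed rule).
     */
    static List<String> group(String html) {
        String[] parts = html.split("(?<=>)|(?=</)");
        List<String> items = new ArrayList<>();
        int i = 0;
        while (i < parts.length) {
            String part = parts[i];
            boolean canMerge = i + 2 < parts.length
                    && isOpeningTag(part.trim())
                    && !parts[i + 1].contains("<")    // next part is plain text
                    && parts[i + 2].startsWith("</"); // then a closing tag
            if (canMerge) {
                items.add((part + parts[i + 1] + parts[i + 2]).trim());
                i += 3;
            } else {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) { // drop whitespace-only parts
                    items.add(trimmed);
                }
                i++;
            }
        }
        return items;
    }

    // An opening tag starts with '<' but is neither closing nor self-closing.
    private static boolean isOpeningTag(String s) {
        return s.startsWith("<") && !s.startsWith("</") && !s.endsWith("/>");
    }

    public static void main(String[] args) {
        String str = "<form action=''><span> First Name </span> <select name='slcEle' > "
                + "<option value='opt'> Text</option> </select>";
        List<String> items = group(str);
        for (int i = 0; i < items.size(); i++) {
            System.out.format("[%d] = %s%n", i, items.get(i));
        }
    }
}
```

The sketch does not check that the closing tag's name matches the opening tag's, and it inherits the limits of the split regex, so a real HTML parser remains the safer choice for arbitrary input.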
Stephan