1

I have an HTML string and I want to extract the src attribute from its img tags. I get the HTML string in "summaryContent"; now I want to find and return the src. If the string contains two or three tags, it should find the src of every one of them.

for (int i = 0; i < contents.size(); i++) {
    if (contents.get(i).summary != null) {
        summaryContent = contents.get(i).summary; // this condition is true only once
    } else {
        continue;
    }
    // ... rest of the loop body is not shown in the question
}

This is what I get in summaryContent:

<ol start="7">
<li>
<h3><strong>Charlotte Casiraghi</strong></h3>
</li>
</ol>
<strong>Family Fortune:  </strong>$1 billion
<img class="size-full wp-image-346 aligncenter" src="http://rarelyknownthings.com/wp-content/uploads/2015/10/Picture1.png" alt="Picture1" width="943" height="1350" />
&nbsp;
&nbsp;
Charlotte Marie Pomeline Casiraghi is the second child of Caroline Princess of Hanover, Princess of Monaco and Stefano Casiraghi, an industrialist. She is eight in line to the throne of Monaco. Charlotte is a published writer and magazine editor.
<img class="aligncenter" src="http://rarelyknownthings.com/wp-content/uploads/2015/10/f762a5ca08aab85785f48c8425f089d7.png" alt="" />
Charlotte and her two brothers were born in the Mediterranean Principality of Monaco. When she was four years old, her father was killed in a boating accident. After his death, Princess Caroline moved the family to the Midi village of Saint-Rémy-de-Provence in France, with the intention of minimizing their exposure to the press.
<!--nextpage-->
<ol start="6">
<li>
<h3><strong>Hind Hariri</strong></h3>
</li>
</ol>
user3585510

3 Answers

4

You could extract it using a regex:

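Pattern p = Pattern.compile("src\\s*=\\s*['\"]([^'\"]+)['\"]");
Matcher m = p.matcher(summaryContent);
// while (rather than if) visits every src in the string, not just the first
while (m.find()) {
  String srcResult = m.group(1); // group 1 is the value between the quotes
}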

Explanation

  • src matches the characters src literally (case sensitive)

  • \s* matches any whitespace character [\r\n\t\f ] zero or more times, as many as possible (greedy)

  • = matches the character = literally

  • \s* again matches zero or more whitespace characters (greedy)

  • ['"] matches a single opening quote character, either ' or "

  • ([^'"]+) is the 1st capturing group: it matches one or more characters that are neither ' nor ", as many as possible (greedy). This is the value returned by m.group(1)

  • ['"] matches a single closing quote character, either ' or "
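Since the question asks for every src in the string, here is a minimal sketch that wraps the same pattern in a helper returning a list. The class name SrcExtractor, the method name extractSrc, and the example.com URLs are made up for illustration; only the regex comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption, not part of the original answer.
class SrcExtractor {
    // Same pattern as above: src, optional whitespace, =, optional whitespace, quoted value.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*['\"]([^'\"]+)['\"]");

    // Returns every src value found in html, in order of appearance.
    static List<String> extractSrc(String html) {
        List<String> result = new ArrayList<>();
        if (html == null) {
            return result; // nothing to scan
        }
        Matcher m = SRC.matcher(html);
        while (m.find()) {            // find() advances to the next match on each call
            result.add(m.group(1));   // group 1 is the text between the quotes
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input with one double-quoted and one single-quoted src.
        String html = "<img class=\"a\" src=\"http://example.com/a.png\" alt=\"\" />"
                + " text <img src='http://example.com/b.png' />";
        System.out.println(extractSrc(html));
    }
}
```

Note that this pattern is not a full HTML parser: it would also match src inside other tags (for example script or iframe) and inside attribute names that end in src, so it is only suitable when the input is known to be simple.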

Smittey
  • I think it is working, but it throws an array index out of bounds error: java.lang.ArrayIndexOutOfBoundsException: length=4; index=4 – user3585510 Nov 12 '15 at 18:46
  • I also used while in place of if – user3585510 Nov 12 '15 at 18:49
  • This sounds like a problem with your outer loop; note that the code I gave doesn't contain anything that could produce your error – Smittey Nov 12 '15 at 18:54
  • Yeah, I figured. Just one change in your code: use m.group(1) instead of m.group(2) – user3585510 Nov 12 '15 at 18:56
  • Now that I have the URL, I will load the image, but there is one problem: how can I place these images at their actual positions? I use a TextView and I don't know beforehand where these pictures will lie – user3585510 Nov 12 '15 at 18:59
0

I recommend exploring the possibility of using regular expressions.

You could start by reading this question: "Regular expression to get an attribute from HTML tag"

Community
voliveira89
0

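You can extract the first src attribute from htmlString by using the substring method. Note that this assumes the string contains src=" at least once; otherwise indexOf returns -1 and substring throws an exception.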

htmlString = htmlString.substring(htmlString.indexOf("src=\""));
htmlString = htmlString.substring("src=\"".length());
htmlString = htmlString.substring(0, htmlString.indexOf("\""));

Hope this helps.

Explanation:

Step 1:

  • htmlString.indexOf("src=\"")

This finds the index at which src=" first occurs in the original string.

  • htmlString.substring(htmlString.indexOf("src=\""))

Then we take the substring of the original string starting at that index, so the result begins with src=".

Step 2:

  • htmlString.substring("src=\"".length())

Here we remove the leading src=" prefix from the string obtained in Step 1, so the result begins with the URL.

Final Step

  • htmlString.substring(0, htmlString.indexOf("\""))

Starting from index zero and going up to the next occurrence of a double quote, we take the substring, which is the link in the src attribute.
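The steps above return only the first src. As a rough sketch of how the same indexOf/substring idea could be repeated for every occurrence, the loop below uses the two-argument indexOf to continue searching after each match. The class name SrcSubstring and method name extractAll are made up for illustration, and the code assumes every src is written exactly as src=" with double quotes and no spaces around the equals sign.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are assumptions, not part of the original answer.
class SrcSubstring {
    // Collects every value that follows src=" up to the next double quote.
    static List<String> extractAll(String html) {
        List<String> result = new ArrayList<>();
        String marker = "src=\"";
        int from = 0;
        while (true) {
            int start = html.indexOf(marker, from);   // Step 1: locate the next src="
            if (start < 0) {
                break;                                // no more occurrences
            }
            start += marker.length();                 // Step 2: skip past src="
            int end = html.indexOf('"', start);       // Final step: find the closing quote
            if (end < 0) {
                break;                                // malformed input: unterminated value
            }
            result.add(html.substring(start, end));
            from = end + 1;                           // continue after the closing quote
        }
        return result;
    }
}
```

Calling SrcSubstring.extractAll(summaryContent) on the HTML from the question should then yield both image URLs. Unlike the one-shot version, this returns an empty list instead of throwing when no src=" is present.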

Itapu Vinay
  • 687
  • 2
  • 9
  • 18