My task is to read an HTML file, match every <a> tag together with all of its attributes, and print them out. For example, for the tag:
<a href="https://www.facebook.com" alt="Facebook icon" title="Facebook" target="_blank"></a>
the output should be:
href - https://www.facebook.com
alt - Facebook icon
title - Facebook
target - _blank
text - not found
I have basic knowledge of regex and no experience reading from a file in Java. Can someone give me some hints, advice, and explanations on how to do this efficiently?
I think the regex for matching the <a> tag with all its attributes and the closing </a> might be:
"\<[aA]\w\>\w\<\/[aA]\>*"
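
To make the task concrete, here is a minimal sketch of one possible approach, not a definitive solution. It assumes the file is small enough to read into memory in one go, that it is UTF-8 encoded, and that attribute values are always quoted. The class name AnchorAttributes, the method extract, and the default file name page.html are hypothetical names chosen for illustration. A regex cannot cope with arbitrary HTML (unquoted attributes, comments, malformed markup), so a dedicated HTML parser is likely more robust for real pages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and structure are illustrative only.
class AnchorAttributes {
    // Group 1: everything between "<a" and ">" (the raw attribute string).
    // Group 2: the text between the opening and closing tag (lazy match).
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b([^>]*)>(.*?)</a\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // One name="value" or name='value' pair (assumes values are quoted).
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([\\w-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Returns one map per <a> tag, in document order; each map keeps
    // the attributes in the order they appear and ends with a "text" entry.
    static List<Map<String, String>> extract(String html) {
        List<Map<String, String>> result = new ArrayList<>();
        Matcher anchor = ANCHOR.matcher(html);
        while (anchor.find()) {
            Map<String, String> attrs = new LinkedHashMap<>();
            Matcher attr = ATTRIBUTE.matcher(anchor.group(1));
            while (attr.find()) {
                // Exactly one of groups 2 and 3 is set, depending on the quote style.
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                attrs.put(attr.group(1), value);
            }
            String text = anchor.group(2).trim();
            attrs.put("text", text.isEmpty() ? "not found" : text);
            result.add(attrs);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // "page.html" is a placeholder default; pass the real path as an argument.
        String path = args.length > 0 ? args[0] : "page.html";
        String html = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (Map<String, String> attrs : extract(html)) {
            attrs.forEach((name, value) -> System.out.println(name + " - " + value));
            System.out.println(); // blank line between tags
        }
    }
}
```

The sketch uses two patterns instead of one: the first finds each tag as a whole, and the second walks through its attribute string, which avoids having to know in advance how many attributes a tag carries.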