The question is about extracting certain HTML elements from a given HTML file. There are several ways to do this; here are some of them.
1) Use a script with an HTML parsing library. For Java, use jsoup.
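import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
String br = "<html><source>foo bar bar</source></html>";
// The XML parser keeps the text inside <source>; jsoup's HTML parser treats <source> as an empty (void) tag.
Document doc = Jsoup.parse(br, "", Parser.xmlParser());
for (Element sentence : doc.getElementsByTag("source")) {
    System.out.println(sentence.text());
}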
This prints the text of every element with the HTML tag source. You can do the same in other languages, such as Python (use BeautifulSoup) and Node.js.
2) You can write a script to read HTML files as text files and do a search on text.
Move all your HTML files into one folder, and write a small program that loads each file and searches it for the specific tags. Then save the matches to a CSV file or any other output format you prefer.
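As a rough illustration of approach 2, here is a minimal sketch in plain Java with no extra libraries. The folder name html_files, the output file results.csv, the class name TagExtractor and the helper names are all made up for this example, and the default tag source is taken from the snippet in approach 1. It is a plain text search with a regular expression, so it will not cope with nested tags of the same name or with self-closing tags; for anything beyond simple markup a real parser is the safer choice.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TagExtractor {

    // Returns the raw text between each <tag ...> and </tag> pair.
    // Plain text search: nested tags of the same name are not handled.
    static List<String> extract(String html, String tag) {
        String name = Pattern.quote(tag);
        Pattern p = Pattern.compile(
                "<" + name + "(?:\\s[^>]*)?>(.*?)</" + name + "\\s*>",
                Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            results.add(m.group(1).trim());
        }
        return results;
    }

    // Quotes a value for CSV: wrap in double quotes and double any inner quotes.
    static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical defaults; pass a folder and a tag name as arguments to override.
        Path folder = Paths.get(args.length > 0 ? args[0] : "html_files");
        String tag = args.length > 1 ? args[1] : "source";
        Path output = Paths.get("results.csv");

        // Collect the .html files that sit directly inside the folder.
        List<Path> files;
        try (Stream<Path> paths = Files.list(folder)) {
            files = paths.filter(f -> f.toString().endsWith(".html"))
                         .sorted()
                         .collect(Collectors.toList());
        }

        // Write one CSV row per match: file name, extracted text.
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(output, StandardCharsets.UTF_8))) {
            out.println("file,text");
            for (Path file : files) {
                String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                for (String text : extract(html, tag)) {
                    out.println(csvField(file.getFileName().toString()) + "," + csvField(text));
                }
            }
        }
    }
}
```

Run it with the folder and the tag name as arguments, for example: java TagExtractor.java html_files source (Java 11 or later can run a single source file directly). The output lands in results.csv in the current directory.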
3) You can do the same with grep.
Simply run a search and redirect the results straight into a CSV file.
There are other ways to do it as well. Since you mentioned that the manual workload is high, try writing a small script to get the job done. I would go with the first approach, as it is faster and easier.