I have an HTML file and need to read it and access some values:
myHtml = 'toto.html';
readFile = fileread(myHtml);
Now, to parse the HTML file: do you know if it's possible to convert HTML to XML and then use XPath?
I would not recommend attempting to convert HTML to XML. They are different formats, and you are likely to get burned. HTML parsers exist, so we can use those directly.
Also, just for completeness, don't try to parse HTML with regex. There are Stack Overflow questions about parsing HTML in MATLAB in which the answers recommend regex. Do innocent kittens a favor and tune them out.
Unfortunately, it doesn't look like MATLAB has an HTML parser as part of its library.
Fortunately, you can leverage Java code with ease in MATLAB!
With that, Java HTML parsers are fair game. Look into jsoup or JTidy. Poke around this question.
Actually, looking at that question, plus the Comparison of HTML parsers Wikipedia article (thanks @Daniel R!), it looks like HtmlCleaner or JTidy might be able to clean HTML into XML. Again, I wouldn't bother and would simply parse the HTML directly.
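For what it's worth, here is a minimal sketch of what parsing the HTML directly with jsoup from MATLAB could look like. It assumes you have downloaded the jsoup jar yourself; the jar path 'jsoup.jar' and the CSS selector 'td.value' are placeholders I made up for illustration, so swap in whatever matches your setup and your page.

```matlab
% Assumption: 'jsoup.jar' is a placeholder path to a jsoup jar you downloaded.
javaaddpath('jsoup.jar');

% Read the file as in the question, then hand the string to jsoup.
myHtml = 'toto.html';
readFile = fileread(myHtml);
doc = org.jsoup.Jsoup.parse(readFile);

% Pick out elements with a CSS selector.
% 'td.value' is a hypothetical selector; adapt it to your document.
elements = doc.select('td.value');

% Elements is a Java list, so it is indexed from 0.
for k = 0:elements.size()-1
    el = elements.get(k);
    disp(char(el.text()));   % visible text of the element, as a MATLAB char
end
```

jsoup works with CSS selectors rather than converting to XML first, which is usually enough to pull a few values out of a page.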