I have the following HTML string:
<a href="http://www.nndc.bnl.gov/nsr/fastsrch_act2.jsp?aname=F.V.Adamian">F.V.Adamian</a>, <a href="http://www.nndc.bnl.gov/nsr/fastsrch_act2.jsp?aname=G.G.Akopian">G.G.Akopian</a>
I want to build a single plain-text string of the author names, something like this (I can fine-tune the punctuation later):
F.V.Adamian, G.G.Akopian.
I'm trying to use regexp in MATLAB. When I do the following:
regexpi(htmlstring,'">.*</a>','match')
I get:
">F.V.Adamian</a>, <a href="http://www.nndc.bnl.gov/nsr/fastsrch_act2.jsp?aname=G.G.Akopian">G.G.Akopian</a>,
Why? I want it to return every run of characters between "> and </a>, i.e. each author's name, which is why I did not use the 'once' option. It works for the first name but not for the second. I am happy to strip the "> and </a> afterwards with a regexprep(regexpstring,'','').
I see that regexprep(htmlstring, '<.*?>','')
works and does what I want, but I don't understand why it succeeds when my first pattern does not.
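To make the comparison concrete, here is a minimal sketch that runs both forms of the quantifier on the string above. It assumes the only difference is `.*` (greedy) versus `.*?` (lazy), and that `strjoin` is available in the MATLAB release being used. The variable names `greedy`, `lazy`, `tokens`, `names` and `authors` are only illustrative.

```matlab
% Sample input from above (two author links separated by a comma).
htmlstring = ['<a href="http://www.nndc.bnl.gov/nsr/fastsrch_act2.jsp?aname=F.V.Adamian">F.V.Adamian</a>, ' ...
              '<a href="http://www.nndc.bnl.gov/nsr/fastsrch_act2.jsp?aname=G.G.Akopian">G.G.Akopian</a>'];

% Greedy: .* extends to the LAST </a>, so everything is swallowed into one match.
greedy = regexpi(htmlstring, '">.*</a>', 'match');

% Lazy: .*? stops at the FIRST </a> after each ">, giving one match per link.
lazy = regexpi(htmlstring, '">.*?</a>', 'match');

% Capture only the name with a token, so nothing has to be trimmed afterwards.
tokens = regexpi(htmlstring, '">(.*?)</a>', 'tokens');
names = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

% Join the names into a single plain-text string.
authors = strjoin(names, ', ');
disp(authors)
```

With the token version, the separate regexprep clean-up step should not be needed, since the parentheses return only the text between "> and </a>.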