I have XML data for many scientific publications and I am trying to parse through the data in KNIME to extract the fields that I need. Here is one example: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pmc&id=PMC4400176
To extract the names of the authors, I am using the following XPath Query: /pmc-articleset/article/front/article-meta/contrib-group/contrib[@contrib-type="author"]
However, this returns:
BorisovaSvetlana A., KimHak Joong, PuXiaotao, LiuHung-wen*
I would like the surname and given names of each author to be separated by a delimiter (a comma followed by a space), and different authors to be separated by a semicolon. Is this possible? Or is there a better way to extract the information than what I am currently doing, one that would give me my ideal output:
Borisova, Svetlana A.; Kim, Hak Joong; Pu, Xiaotao; Liu, Hung-wen*
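To make the target formatting concrete, here is a minimal Python sketch of the transformation I am after, outside of KNIME. It assumes each `contrib` element holds a `name` element with `surname` and `given-names` children, which is how the record above appears to be structured. The embedded XML is a trimmed, hypothetical stand-in for the real response (the editor entry is invented to show the filter), and `format_authors` is just an illustrative name. It does not handle the trailing `*` marker.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the PMC record; the real efetch
# response contains many more elements than shown here.
SAMPLE_XML = """
<pmc-articleset>
  <article>
    <front>
      <article-meta>
        <contrib-group>
          <contrib contrib-type="author">
            <name><surname>Borisova</surname><given-names>Svetlana A.</given-names></name>
          </contrib>
          <contrib contrib-type="author">
            <name><surname>Kim</surname><given-names>Hak Joong</given-names></name>
          </contrib>
          <contrib contrib-type="editor">
            <name><surname>Doe</surname><given-names>Jane</given-names></name>
          </contrib>
        </contrib-group>
      </article-meta>
    </front>
  </article>
</pmc-articleset>
"""


def format_authors(xml_text):
    """Return 'Surname, Given; Surname, Given' for all author contribs."""
    root = ET.fromstring(xml_text)
    # Same path as the XPath query above, relative to the root element.
    path = "./article/front/article-meta/contrib-group/contrib[@contrib-type='author']"
    authors = []
    for contrib in root.findall(path):
        # Read surname and given names separately instead of the whole node text.
        surname = contrib.findtext("name/surname", default="").strip()
        given = contrib.findtext("name/given-names", default="").strip()
        # Skip an empty part so a missing given name leaves no dangling comma.
        authors.append(", ".join(part for part in (surname, given) if part))
    return "; ".join(authors)


print(format_authors(SAMPLE_XML))
```

The key point is that surname and given names are read as two separate fields per author and only then joined, rather than taking the concatenated text of the whole `contrib` node.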
[edit]
Current KNIME workflow:
Sample current output:
I've tried outputting all of the author names for all of the publications into a collection cell. (If I output the names into multiple columns instead, I end up with hundreds of columns that mostly contain missing values. I've also tried to reach my ideal output with multiple string manipulations, but the result is still imperfect, because some authors have multiple given names, hyphenated names, or names containing special characters.) The collection cell joins the authors with a comma between each author's name, but it still runs each surname and its given names together. I can apply the same string manipulations to these values, but I run into the same issues.
If I separate author names into multiple rows, this creates multiple rows for every article, from which I'm not sure how to get to my end goal for each article.
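For the multiple-rows case, the regrouping I have in mind looks roughly like the following Python sketch: group the rows by article, then concatenate the formatted names. The table layout, the second article ID, and the name `authors_per_article` are hypothetical and only illustrate the idea. I do not know which KNIME node configuration would reproduce this.

```python
from collections import defaultdict

# Hypothetical one-row-per-author table: (article id, surname, given names).
rows = [
    ("PMC4400176", "Borisova", "Svetlana A."),
    ("PMC4400176", "Kim", "Hak Joong"),
    ("PMC4400176", "Pu", "Xiaotao"),
    ("PMC4400176", "Liu", "Hung-wen"),
    ("PMC0000000", "Doe", "Jane"),  # invented second article
]


def authors_per_article(rows):
    """Collapse one-row-per-author data into one author string per article."""
    grouped = defaultdict(list)
    for article_id, surname, given in rows:
        # Format each author first, keeping the original row order.
        grouped[article_id].append(f"{surname}, {given}")
    # Join each article's authors with a semicolon delimiter.
    return {article_id: "; ".join(names) for article_id, names in grouped.items()}


for article_id, authors in authors_per_article(rows).items():
    print(article_id, "->", authors)
```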
End goal:
Any ideas on how to solve this problem with the authors would be much appreciated!