I have a long list of citations for which I need to extract each author's full name, year published, title, etc. One of the citations looks like this:
Joe Bob, Jane Doe and George H. Smith (2017). A title of an interesting report: Part 2. Report Series no. 101, Place for Generating Reports, Department of Report Makers, City, Province, Country, 44 pages. ISBN: (print) 123-0-1234-1234-5; (online) 123-0-1234-1234-5.
All of the citations are formatted in the same way. The part I am stuck on right now is extracting the authors' full names. I read here about how to extract values from a comma-, space-, or semicolon-separated list by doing something like [\\s,;]+. How would I do something similar for a comma or the word 'and'?
I assume that 'and' needs to be treated as a group of characters, so I tried [^,|[and])]+ to match the text between either a comma or the character set [and], but this doesn't seem to work. This question is similar in that it deals with a comma or a space, but its solution relies on the spaces being stripped implicitly.
After getting this portion down I plan on building the rest of the expression to capture the other citation details. So assume that the string we are dealing with is simply:
Joe Bob, Jane Doe and George H. Smith
and each full name should be captured.
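To make the goal concrete, here is a minimal sketch of the behaviour I am after. It is written in Python purely for illustration: the language, the split_authors helper, and the SEPARATOR pattern are all assumptions rather than anything from my real code. It treats 'and' as a whole word inside an alternation instead of as a character class, since [and] only matches one of the single characters a, n, or d:

```python
import re

# Sample input: the author portion of the citation above.
authors = "Joe Bob, Jane Doe and George H. Smith"

# Separator: a comma OR the whole word "and", plus any surrounding whitespace.
# (?:...) groups the alternatives without capturing them, and \b stops "and"
# from matching inside a name such as "Anderson" or "Hand".
SEPARATOR = re.compile(r"\s*(?:,|\band\b)\s*")


def split_authors(text):
    """Return the full names found in a comma/'and' separated author list."""
    # Empty pieces (e.g. from ", and") are dropped.
    return [name for name in SEPARATOR.split(text.strip()) if name]


print(split_authors(authors))
# → ['Joe Bob', 'Jane Doe', 'George H. Smith']
```

Splitting on the separator, rather than trying to match everything that is not a separator, avoids having to negate a multi-character word, which a character class cannot express.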