I have a bibliography data frame with article titles, authors, journals, and DOIs (example below).

| noms_prenoms_des_auteurs | titre_de_larticle | reference_de_larticle_doi |
| --- | --- | --- |
| SOEWARTO J, CARRICONDE F, HUGOT N, BOCS S, HAMELIN C, ET MAGGIA L | Impact of Austropuccinia psidii in New Caledonia, a biodiversity hotspot | https://doi.org/10.1111/efp.12402 |
| THIBAULT M, VIDAL E, POTTER M, DYER E, ET BRESCIA F | The red-vented bulbul (Pycnonotus cafer): serious pest or understudied invader? | https://doi.org/10.1007/s10530-017-1521-2 |

I want to retrieve the corresponding author for each article.

My first plan was to scrape the journal websites (by extracting the text or the mail icon), but the HTML classes are not the same on each site, and some sites seem to forbid scraping.

Do you have any ideas for retrieving this information? Maybe with bibliography management packages (RefManageR, rcrossref...)?
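As a possible starting point for the package route, here is a minimal sketch in base R that turns the DOI URLs from the `reference_de_larticle_doi` column into bare DOIs, which is the form metadata lookups usually expect. The variable names `doi_urls` and `bare_dois` are only illustrative, and the lookup mentioned in the final comment is an untested assumption.

```r
# Example values copied from the reference_de_larticle_doi column.
doi_urls <- c(
  "https://doi.org/10.1111/efp.12402",
  "https://doi.org/10.1007/s10530-017-1521-2"
)

# Strip the resolver prefix so that only the bare DOI remains.
bare_dois <- sub("^https?://(dx\\.)?doi\\.org/", "", doi_urls)
print(bare_dois)

# Assumption, not run here: the bare DOIs could then be passed to a
# metadata lookup such as rcrossref::cr_works(dois = bare_dois).
# Whether the returned record identifies a corresponding author is
# not guaranteed and would need to be checked for each publisher.
```

Working from DOIs rather than page HTML avoids depending on each site's markup, but it only helps if the metadata source actually records who the corresponding author is.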

Thanks for your answers!

Rui Barradas
  • By corresponding author, do you mean the [person who corresponds with the journal during the revision/publication process](https://www.cambridge.org/core/services/authors/journals/corresponding-author)? – jrcalabrese Jan 09 '23 at 21:44
  • Hello, yes, this is what I mean. The information appears at the end of the articles, or on the journal websites as a mail icon next to the author's name. I wanted to use web scraping for that, but I don't have access to all sites ("Error in open.connection(x, "rb") : HTTP error 503."). – IACPouembout Jan 09 '23 at 21:51
  • Have you confirmed that all the journals in your dataframe list the corresponding author somewhere on the abstract webpage or in the pdf itself? Not all journals say which author is the corresponding author. – jrcalabrese Jan 10 '23 at 03:26
  • Hello, usually yes, the email address of the corresponding author is on the website or in the PDF, so with rvest and regex I found a way to extract email addresses from websites, but I have trouble dealing with CAPTCHA issues... this is why I'm trying to find another solution. – IACPouembout Jan 10 '23 at 04:18
  • It's hard to tell what's causing `("Error in open.connection(x, "rb") : HTTP error 503.")` because you haven't provided a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Alternatively, if you have all the papers saved as pdfs somewhere, you can try to scrape the data from the pdf itself (assuming that information is in the pdf). – jrcalabrese Jan 10 '23 at 14:13
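
The PDF route suggested in the last comment could be sketched as follows: once a paper's text is available as a character vector, a regular expression pulls out candidate email addresses. This is a minimal sketch in base R. The helper `extract_emails` and the sample text are hypothetical, the pattern is deliberately loose, and the step that reads text out of the PDF (for example with a package such as pdftools) is assumed to have happened already.

```r
# Hypothetical helper: return the unique email-like strings found in a
# character vector of page or PDF text. The pattern is a loose
# heuristic, not a full address validator.
extract_emails <- function(text) {
  pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"
  hits <- regmatches(text, gregexpr(pattern, text))
  unique(unlist(hits))
}

# Made-up sample text standing in for the first page of an article.
sample_text <- c(
  "Corresponding author: J. Doe (jane.doe@example.org)",
  "Received 3 March 2017; no address on this line"
)

emails <- extract_emails(sample_text)
print(emails)
```

An address found this way is only a candidate: some papers print several emails, and some print none, so each result would still need to be matched against the author list.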

0 Answers