
This is a follow-up to my earlier question.

I'm currently following a rather complex approach to do what I want.

But a simple solution proposed by Ben was this:

library(tidypmc)
library(tidyverse)
library(europepmc)

doc <- map("PMC7809753", epmc_ftxt)
tbls <- pmc_table(doc[[1]])
tbls[[1]]

My objective is to search Europe PMC for drugs, diseases, etc., restricted to open-access papers, and then fetch the data from each paper **in tabular form** and save it.

The following handles the first part:

library(europepmc)
b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 20)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]

This gives me pmcids, which is a character vector.

For the second part, Ben's suggestion works really well:

doc <- map("PMC7809753", epmc_ftxt)
tbls <- pmc_table(doc[[1]])
tbls[[1]]

To apply this to every PMCID, a generous Stack Overflow user helped me with this function:

b <- epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y', limit = 6)
pmcids <- b$pmcid[b$isOpenAccess == "Y"]
pub_tables <- lapply(pmcids, function(pmc_id) {
  message("-- Trying ", pmc_id, "...")
  doc <- tryCatch(pmc_xml(pmc_id),
                  error = function(e) {
                    message("------ Failed to recover PMCID")
                    return(NULL)
                  })
  if(!is.null(doc)) {
    #-- If succeed, try to get table
    tables <- pmc_table(doc)
    if(!is.null(tables)) {
      #-- If succeed, try to get table name
      table_caps <- pmc_caption(doc) %>%
        filter(tag == "table")
      names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ")
    }
    return(tables)
  } else {
    #-- If fail, return NA
    return(NA)
  }
})
names(pub_tables) <- pmcids

This works well, but I got this error:

Error in names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ") : 
  'names' attribute [3] must be the same length as the vector [2]
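
For reference, this kind of error is raised by R's names<- assignment whenever the vector of names is longer than the object being named. The sketch below reproduces it without any network access; the two mock tables and the three caption labels are made-up stand-ins, not data from the actual papers.

```r
# Mock stand-ins: two tables, but three caption rows (assumed data, not from Europe PMC)
tables <- list(data.frame(x = 1), data.frame(y = 2))
caption_labels <- c("Table 1", "Table 1", "Table 2")

# Assigning three names to a list of length two fails; capture the message instead of stopping
result <- tryCatch({
  names(tables) <- caption_labels
  "ok"
}, error = function(e) conditionMessage(e))

print(result)
```

Because the assignment fails, tables is left without names, and result holds the error message rather than "ok".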

These are the PMCIDs returned by my query with the limit set to 6:

"PMC7837979" "PMC7809753" "PMC7790830" "PMC7797573" "PMC7806552" "PMC7836575"

How do I skip the papers that produce this error (or return no information) and move on to the next one? In other words, how do I work around this error?

I have very little experience writing complicated functions, but as I understand the code, this chunk should handle that case, and I am not sure why it does not:

} else {
    #-- If fail, return NA
    return(NA)
  }


For example, when the limit is set to 4 it works well: pub_tables is returned as a list, and the last PMCID is returned as

$PMC7797573
NULL

But the problem occurs with "PMC7806552". So how do I get a NULL result when fetching a table raises an error, and then move on to the next PMCID?
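
One possible way to get that skip-and-continue behaviour, sketched here under assumptions, is to wrap all of the per-paper work in tryCatch rather than only the download step, so that any error yields NULL for that PMCID. In the sketch, process_one is a hypothetical stand-in for the fetch-and-name logic; it fails on purpose for one ID and touches no real API.

```r
# Hypothetical stand-in for the per-paper work (fetch XML, extract and name tables).
# It fails for one ID to mimic the caption/table length mismatch.
process_one <- function(pmc_id) {
  if (pmc_id == "PMC7806552") {
    stop("'names' attribute [3] must be the same length as the vector [2]")
  }
  list(data.frame(id = pmc_id))
}

ids <- c("PMC7797573", "PMC7806552", "PMC7836575")

results <- lapply(ids, function(pmc_id) {
  tryCatch(
    process_one(pmc_id),
    error = function(e) {
      # Report the failure, return NULL for this paper, and let lapply continue
      message("------ Skipping ", pmc_id, ": ", conditionMessage(e))
      NULL
    }
  )
})
names(results) <- ids
```

With this shape, a failure anywhere inside the wrapped call only affects that one entry; the remaining PMCIDs are still processed.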

Any help would be really appreciated.

Or is there a simpler way of doing it?

PesKchan

1 Answer

Here is the function modified slightly to work. The only edit is that I added these lines:

table_caps <- table_caps %>% group_by(label) %>% 
   summarise(text = paste(text, collapse=" "), 
             tag = "table")

after the initial definition of the table_caps object. The problem was that some table captions had multiple sentences, so there were more caption rows than tables. The added lines paste each table's sentences together into a single caption.
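
To illustrate the collapsing step in isolation, here is a base-R sketch of the same idea on mock data. The data frame only mimics the tag, label and text columns used by the code in this thread (the real pmc_caption output may contain more columns), and the caption text is invented. Unlike group_by plus summarise, this variant keeps labels in order of first appearance, which is a design choice of the sketch rather than part of the fix above.

```r
# Mock caption data: one row per caption sentence (assumed shape, invented text)
table_caps <- data.frame(
  tag   = "table",
  label = c("Table 1", "Table 1", "Table 2"),
  text  = c("Patient characteristics.", "Values are medians.", "Response rates."),
  stringsAsFactors = FALSE
)

# Keep labels in order of first appearance, then paste each label's sentences together
labels <- unique(table_caps$label)
collapsed <- vapply(
  labels,
  function(l) paste(table_caps$text[table_caps$label == l], collapse = " "),
  character(1)
)

# One name per table label, ready to be assigned with names(tables) <- table_names
table_names <- paste(labels, unname(collapsed), sep = " - ")
print(table_names)
```

Checking that length(table_names) equals length(tables) before assigning the names would guard against papers where the two counts still differ.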

b <-epmc_search(query = 'cytarabine aml OPEN_ACCESS:Y',limit = 10)
pmcids <- b$pmcid[b$isOpenAccess=="Y"]
pub_tables <- lapply(pmcids, function(pmc_id) {
  message("-- Trying ", pmc_id, "...")
  doc <- tryCatch(pmc_xml(pmc_id), 
                  error = function(e) {
                    message("------ Failed to recover PMCID")
                    return(NULL)
                  })
  if(!is.null(doc)) { 
    #-- If succeed, try to get table
    tables <- pmc_table(doc)
    if(!is.null(tables)) {
      #-- If succeed, try to get table name
      table_caps <- pmc_caption(doc) %>%
        filter(tag == "table")
      table_caps <- table_caps %>% group_by(label) %>% 
        summarise(text = paste(text, collapse=" "), 
                  tag = "table")
      names(tables) <- paste(table_caps$label, table_caps$text, sep = " - ")
    }
    return(tables) 
  } else {
    #-- If fail, return NA
    return(NA)
  }
})
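
Since the stated goal is to save the tables, the following is a minimal sketch of one way to write them out as CSV files. It assumes pub_tables has been named by PMCID (as with names(pub_tables) <- pmcids in the question) and that each entry is NULL, NA, or a list of data frames. The mock pub_tables, the placeholder IDs, the output folder and the file-name pattern are all hypothetical.

```r
# Mock result in the assumed shape of pub_tables (placeholder IDs, invented data)
pub_tables <- list(
  PMC0000001 = list("Table 1 - Demo" = data.frame(a = 1:2, b = c("x", "y"))),
  PMC0000002 = NULL,
  PMC0000003 = NA
)

out_dir <- file.path(tempdir(), "pmc_tables")  # hypothetical output location
dir.create(out_dir, showWarnings = FALSE)

# Keep only entries that actually hold a non-empty list of tables
has_tables <- vapply(pub_tables, function(x) is.list(x) && length(x) > 0, logical(1))

written <- character(0)
for (pmc_id in names(pub_tables)[has_tables]) {
  tbls <- pub_tables[[pmc_id]]
  for (i in seq_along(tbls)) {
    # File names use the PMCID and table position; captions can be too long for file names
    path <- file.path(out_dir, sprintf("%s_table%d.csv", pmc_id, i))
    write.csv(tbls[[i]], path, row.names = FALSE)
    written <- c(written, path)
  }
}
```

Entries that are NULL or NA are skipped, so papers without recoverable tables produce no files.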

DaveArmstrong