
I'm working on a meta-analysis and have exported my list of references, along with the inclusion/exclusion decisions and their labels, from Rayyan as a CSV. Some papers have only a few labels; others have a dozen. However, when Rayyan exports them, it puts all the labels together in a single column, separated by commas.

I'm afraid I already know the answer, but is there any way R can read those as separate labels from the single column? It's over 400 papers, and I don't love the idea of separating all of them manually...

Rayyan can also export to RefMan, BibTeX, and EndNote. I haven't really worked with those formats before, so maybe one of them would let R read the labels separately?

kat43
  • Welcome to SO, kat43! Questions on SO (especially in R) do much better if they are reproducible and self-contained. By that I mean including attempted code (please be explicit about non-base packages) and sample representative data (in this case, likely the first `n` lines of the raw text file, in a code block). Refs: https://stackoverflow.com/q/5963269, https://stackoverflow.com/help/minimal-reproducible-example, and https://stackoverflow.com/tags/r/info. – r2evans Jun 25 '23 at 01:13
  • Maybe use something like `df <- read.csv(file = "path/to/file", header = FALSE, sep = ",")`? Can you provide an example of what the input file looks like? – Harry Smith Jun 25 '23 at 02:07
  • There is an R package `bib2df` that will load a bibtex file into a data.frame. – G5W Jun 26 '23 at 00:29
  • @HarrySmith That is what I normally use to read it in; I believe `sep = ","` is what separates the columns. This is the text I get in a single column when I export from Rayyan: `RAYYAN-INCLUSION: {"Kira"=>"Included"} | RAYYAN-LABELS: 1993,ASFA,Iceland,Necropsy,Fin Whale,Pathology,Sei Whale,Sperm Whale,Fareo Islands,Harbor Porpoise,Long-finned Pilot Whales,Atlantic White-Sided Dolphin`. I already know I will have to delete the beginning, but would rather not have to separate all those labels into separate cells by hand. – kat43 Jun 26 '23 at 02:35
  • @G5W I did try that package; I may not have been using it correctly, but it also didn't seem to achieve what I needed. – kat43 Jun 26 '23 at 02:37
  • @kat43, please provide a reproducible example of the data. We can't do much else without that. – Harry Smith Jun 26 '23 at 04:08
  • Are you always trying to get the part labeled `RAYYAN-LABELS:`? Is there ever anything _after_ the RAYYAN-LABELS? – G5W Jun 26 '23 at 11:23

1 Answer


I do not see your offending column, but if I understand correctly, once you have read the file in normally, the column will be of this form:

coldat <- c("a1,a2", "b1,b2,b3")

where coldat is your column of labels, i.e. paper 1 has two labels (a1, a2), paper 2 has three (b1, b2, b3), and so on.
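If the labels still sit inside Rayyan's notes text, as in the example string kat43 posted in the comments, they need to be cut out before splitting. A minimal sketch, assuming the labels always follow `RAYYAN-LABELS: ` and run either to the end of the string or to the next ` | ` separator. The helper `extract_labels()` and the `notes` and `label_col` names are hypothetical, chosen for illustration only:

```r
# Hypothetical helper: pull the comma-separated labels out of a Rayyan notes string.
# Assumption: labels follow "RAYYAN-LABELS: " and end at the next " | " or at end of string.
extract_labels <- function(notes) {
  # Remember which rows actually contain a labels section
  has_labels <- grepl("RAYYAN-LABELS: ", notes, fixed = TRUE)
  # Drop everything up to and including the marker
  labels <- sub(".*RAYYAN-LABELS: ", "", notes)
  # Drop any later " | "-separated section, if one is present
  labels <- sub(" \\| .*$", "", labels)
  # Rows without a labels section become NA instead of keeping the raw text
  labels[!has_labels] <- NA_character_
  labels
}

# The first string is the example from the comments; the second is the same
# string with its labels section removed, to show the NA case.
notes <- c(
  'RAYYAN-INCLUSION: {"Kira"=>"Included"} | RAYYAN-LABELS: 1993,ASFA,Iceland,Necropsy,Fin Whale,Pathology,Sei Whale,Sperm Whale,Fareo Islands,Harbor Porpoise,Long-finned Pilot Whales,Atlantic White-Sided Dolphin',
  'RAYYAN-INCLUSION: {"Kira"=>"Included"}'
)
label_col <- extract_labels(notes)
print(label_col)
```

The result is a character vector shaped like coldat, so it can go straight into the splitting step.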

You can combine mapping (from purrr) with strsplit like so:

list_of_labels <- purrr::map(coldat, ~ strsplit(.x, split = ",")[[1]])

The result is a list with one element per entry of coldat (i.e. one per row of your data frame), where each element is a character vector of that paper's labels:

[[1]]
[1] "a1" "a2"

[[2]]
[1] "b1" "b2" "b3"

(Base R's strsplit() is already vectorised, so strsplit(coldat, split = ",") returns the same list without needing purrr.)

You can then manipulate your list of labels however you like to get the desired output.
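One common next step is to reshape the list into a table. Here is a minimal base-R sketch, assuming each paper is identified by its row position (the `paper_id`, `long` and `wide` names are illustrative). It builds a long table with one row per paper-label pair, which is easy to filter or join, and a wide 0/1 table with one column per label. `trimws()` guards against stray spaces after commas, in case an export contains them.

```r
# Example column from above: paper 1 has labels a1, a2; paper 2 has b1, b2, b3
coldat <- c("a1,a2", "b1,b2,b3")
list_of_labels <- strsplit(coldat, split = ",")

# Long format: repeat each paper's row number once per label it carries
long <- data.frame(
  paper_id = rep(seq_along(list_of_labels), lengths(list_of_labels)),
  label    = trimws(unlist(list_of_labels)),
  stringsAsFactors = FALSE
)

# Wide format: cross-tabulate papers against labels (1 = paper has that label)
wide <- as.data.frame.matrix(table(long$paper_id, long$label))

print(long)
print(wide)
```

Note that `table()` counts occurrences, so a label listed twice on the same paper would show up as 2. Wrapping the table as `(table(...) > 0) * 1` before converting gives strict 0/1 indicators.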

Is this at all what you are looking for?

Rorio