
I have a CSV file in which two columns contain one or more integers per cell.

df <- data.frame(x=c("a","b","a","b"), 
y=c("datatype 1","datatype 1","datatype 2", "datatype 2"), 
z=c("2,3", "1,2","1,2,3,4,5", "3"))

names(df) <- c("hypothesis", "type", "mass") 

> df
  hypothesis       type      mass
1          a datatype 1       2,3
2          b datatype 1       1,2
3          a datatype 2 1,2,3,4,5
4          b datatype 2         3

I want to extract those integers from the .csv as vectors and assign them to variables x (datatype 1, hypothesis a) and y (datatype 2, hypothesis a) in my code.

Right now, I'm using subset to filter the table by "datatype" (column 2) and which on "hypothesis" (column 1) to get the corresponding "mass" values I need. In the next step, I want to use intersect to find out which elements are shared by the x and y variables.

My question is: how can I turn the content of a .csv cell like "1,2,3" into a vector to which the intersect function can be applied?

When I just access the cell, typeof returns integer, and when intersect is applied, the result is character(0). When I manually assign x <- c(1,2,3,4,5); y <- c(2,3), the result is, as it should be, 2 3.

Ezra
  • Pictures are not code or data unless it's image processing-related. Please try to respect the folks who answer on the site and follow the guidelines http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example that were shown in the links presented to you when you posted a question in the R tag. The use of images for data or code is now at almost epidemic proportions in the R tag. – hrbrmstr Jan 03 '17 at 13:57
  • My apologies, I edited the post, I hope it is replicable and up to standards now. – Ezra Jan 03 '17 at 14:17

1 Answer


We can split 'mass' by 'type', split each string with strsplit, unlist the result, convert it to numeric, and keep the unique elements. Applying intersect across the resulting list then gives the elements common to all list elements.

# one sorted, de-duplicated numeric vector per 'type' (all hypotheses pooled)
lst <- setNames(lapply(split(df$mass, df$type), function(x) 
       sort(unique(as.numeric(unlist(strsplit(as.character(x), ",")))))), c("x", "y"))

Reduce(intersect, lst)
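
If only a single hypothesis/type pair is wanted, as in the question (x for datatype 1, hypothesis a; y for datatype 2, hypothesis a), the same strsplit idea could be applied per cell. This is a minimal sketch, not part of the original answer: get_mass is a hypothetical helper name, and stringsAsFactors = TRUE is an assumption meant to mimic how read.csv stored strings before R 4.0.

```r
# Rebuild the example data; stringsAsFactors = TRUE is an assumption that
# mimics the factor column the question appears to be dealing with
df <- data.frame(
  hypothesis = c("a", "b", "a", "b"),
  type       = c("datatype 1", "datatype 1", "datatype 2", "datatype 2"),
  mass       = c("2,3", "1,2", "1,2,3,4,5", "3"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn the 'mass' cell(s) of one hypothesis/type pair
# into a numeric vector
get_mass <- function(data, hyp, typ) {
  cells <- data$mass[data$hypothesis == hyp & data$type == typ]
  # as.character() turns factor labels into strings before splitting
  as.numeric(unlist(strsplit(as.character(cells), ",", fixed = TRUE)))
}

x <- get_mass(df, "a", "datatype 1")  # 2 3
y <- get_mass(df, "a", "datatype 2")  # 1 2 3 4 5
intersect(x, y)                       # 2 3
```

With four such vectors, collecting them in a list and calling Reduce(intersect, ...) avoids nesting intersect calls by hand.
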
akrun
  • Thank you akrun. I tried part of your suggested code. My data looks like this: `> x [[1]] [1] 2,3 Levels: 1 1,2 1,2,3,4,5 2 2,3 2,3,4,5 3 3,4,5 4 5`. I used your code: `a <- as.numeric(unlist(strsplit(as.character(x), ",")))` and the result is: `> a [1] 6`. Do I need to assign a different data type to the column in my data frame? I really just need a vector of the elements; I would like to apply intersect in a different step (in the function I am currently writing, I have 4 variables that need to be intersected with each other). Any idea what I am doing wrong? – Ezra Jan 03 '17 at 14:52
  • @Ezra If you need 4 variables to be intersected, it is better to keep them in a `list` (as I showed) and then use `Reduce`, instead of creating individual objects in the global environment – akrun Jan 03 '17 at 14:54
  • @Ezra Regarding your object `x`, is it a `list` or `vector` ? If it is a `list`, then `strsplit(as.character(unlist(x)), ",")` – akrun Jan 03 '17 at 14:56
  • Thank you. My mistake, apparently, was the `as.character`! I just wrote `as.numeric(unlist(strsplit(x, ",")))`. – Ezra Jan 03 '17 at 15:42
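
The list case raised in the comments can be reproduced with a small sketch. The object below is an assumption: a one-element list holding a factor, built to resemble the `x` printed above, with a shortened set of levels. Calling as.character on the list itself does not return the label "2,3", whereas unlisting first keeps the factor, so its label is recovered.

```r
# Assumed stand-in for the `x` from the comments: a list holding one factor
x <- list(factor("2,3", levels = c("1", "1,2", "2,3")))

# On the list itself, as.character() does not return the label "2,3"
as.character(x)

# Unlisting first yields a factor, whose label as.character() then returns
a <- as.numeric(unlist(strsplit(as.character(unlist(x)), ",", fixed = TRUE)))
a  # 2 3
```
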