
I'm quite new to R. I loaded a file with 13458 observations, each containing a time and a value, and ran it through a program that detects homologue series. The output is a large list with 6 elements, including the detected values, which are identified by their row number in the original file. I would like to export the original file with the detected values marked somehow, so I can easily identify them in Excel. Hopefully that makes sense.

My data frame looks like this; I'm using the m.z and RT values:

        m.z       dummy     RT
1     151.0092    255975.8 15.043
2     151.0092    110111.7 15.456
3     151.0092    108958.1 15.243
4     151.0093   3258343.0 14.620
5     151.0127    107255.9  6.336

My output contains a list of related series and looks like this:

[359] "3518,4779,5929,6975,8032,9051,9825"       
[360] "5927,6977,8036,9052,9824,10507,11043"    

I would like a data frame that tells me whether each value has been identified, like this:

         m.z         dummy     RT       homologue
3518    459.2006   255975.8 15.043    TRUE
3519    459.2120   110111.7 15.456    FALSE
3520    459.2159   108958.1 15.243    FALSE

Thanks!

acylam
NoTech
  • Can you add what your dataframe looks like, what your output with the IDs looks like, and what you want your final output to look like? See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?rq=1) – leeum Nov 06 '17 at 20:27
  • What is a homologue series? – leeum Nov 06 '17 at 20:48
  • It's not clear from your example how the output "list of related series" relates to your starting data frame. Nor how the homologue column is defined. Is there one "related series" for each row? Are the "related series" series of row numbers? Or `dummy` values? Something else? – Gregor Thomas Nov 06 '17 at 21:08
  • The program looks for m.z values that are separated by 44 and groups them together based on this. (For example, 144, 188, 232, 276 would be one series.) Each row of the output is a series and lists the row number of each m.z value from the starting data frame. – NoTech Nov 06 '17 at 21:31

1 Answer


Here is an attempt.

Your MS data:

DF <- read.table(text="m.z       dummy     RT
1     151.0092    255975.8 15.043
2     151.0092    110111.7 15.456
3     151.0092    108958.1 15.243
4     151.0093   3258343.0 14.620
5     151.0127    107255.9  6.336", header = TRUE)

The script output:

vec <- c("1,3,5", "3,5") # from your example, it looks like a vector of strings holding comma-separated numbers

As I understand it, you would like to label rows in DF with TRUE/FALSE depending on whether they appear anywhere in vec?

DF$homologue <- row.names(DF) %in% as.numeric(unlist(strsplit(unlist(vec), ",")))

Explanation:

unlist(vec) #in case it is a list and not a vector
strsplit(unlist(vec), ",") #split strings at "," returning a list
unlist(str... #convert that list into a vector
as.numeric(unlist(str... #convert to numeric

Row names of DF that appear anywhere in vec are labeled TRUE; all others are labeled FALSE.
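The steps in the explanation can be run one at a time to see each intermediate result. This is only a sketch: `vec` is the made-up stand-in from above, and the names `parts` and `ids` are introduced here for illustration. It uses `as.integer()` rather than `as.numeric()` as a precaution, on the assumption that the IDs are whole row numbers.

```r
# Stand-in for the program output: one comma-separated string per series.
vec <- c("1,3,5", "3,5")

# Split every string at the commas; the result is a list with one
# character vector per series.
parts <- strsplit(unlist(vec), ",")

# Flatten the list, convert to integers, and drop duplicates, since a row
# can belong to more than one series.
ids <- unique(as.integer(unlist(parts)))

print(parts)
print(ids)
```

With the stand-in data, `ids` holds the row numbers 1, 3 and 5, which are the rows that get marked.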

DF
       m.z     dummy     RT homologue
1 151.0092  255975.8 15.043      TRUE
2 151.0092  110111.7 15.456     FALSE
3 151.0092  108958.1 15.243      TRUE
4 151.0093 3258343.0 14.620     FALSE
5 151.0127  107255.9  6.336      TRUE
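Since the goal was to open the marked table in Excel, one possible last step is to write it to a CSV file. The sketch below rebuilds the small example so that it runs on its own. The file name `homologues_marked.csv` and the use of a temporary directory are assumptions for illustration, and matching on `seq_len(nrow(DF))` assumes the IDs in the program output are row positions in the original file.

```r
# Rebuild the small example data (first field of each row becomes the row name).
DF <- read.table(text = "m.z dummy RT
1 151.0092 255975.8 15.043
2 151.0092 110111.7 15.456
3 151.0092 108958.1 15.243
4 151.0093 3258343.0 14.620
5 151.0127 107255.9 6.336", header = TRUE)

# Stand-in for the real program output.
vec <- c("1,3,5", "3,5")

# Collect every row number mentioned in any series.
ids <- as.integer(unlist(strsplit(unlist(vec), ",")))

# Assumption: the IDs are row positions, so compare against 1..nrow(DF).
DF$homologue <- seq_len(nrow(DF)) %in% ids

# Hypothetical output location; replace with a real path as needed.
out_file <- file.path(tempdir(), "homologues_marked.csv")

# row.names = TRUE keeps the original row numbers as the first column.
write.csv(DF, out_file, row.names = TRUE)
```

The resulting file can be opened directly in Excel and filtered on the `homologue` column.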
missuse