
Hi everyone. I have a main data frame with two columns, "Ac" and "description", and several Excel files (my samples) with the same column headers, "Ac" and "description". I want to check whether the rows of each sample file exist in the main data frame, with a conditional output (TRUE, FALSE), and then combine each sample's output with the main data frame. My goal is something like this:

Ac            Description  Sample1  Sample2
UniprotP6666  ProteinA     True     False
UniprotP7777  ProteinB     False    True

I have 29 samples and I am looking for a smart way to do this instead of typing this 29 times:

sample1match <- ifelse(maindf$Ac %in% sample1.xls$Ac, "True", "False")

and then combining all the results (e.g. sample1match) into a single data frame.

I tried this:

temp = list.files(pattern="*.xls") 

for (i in 1:length(temp)) 
    assign(temp[i], read_xlsx(temp[i]))

temp is a list of dfs in which each element is my sample excel file.

for (i in 1:length(temp)) {
    ifelse(maindf["Accesssion"] %in% i["Accession"],"TRUE","FALSE")
}

maindf is actually named uniqueaggregatedaccession in my code. This gives:

Error in [.data.frame(uniqueaggregatedaccession, "Accesssion") : undefined columns selected
5. stop("undefined columns selected")
4. [.data.frame(uniqueaggregatedaccession, "Accesssion")
3. uniqueaggregatedaccession["Accesssion"]
2. uniqueaggregatedaccession["Accesssion"] %in% i["Accession"]
1. ifelse(uniqueaggregatedaccession["Accesssion"] %in% i["Accession"], "TRUE", "FALSE")

I also tried using lapply:

lapply(temp,function(x) ifelse(uniqueaggregatedaccession$Accession%in%temp(x),"TRUE"","FALSE")

Error: unexpected string constant in " lapply(temp,function(x) ifelse(uniqueaggregatedaccession$Accession%in%temp(x),"TRUE"","FALSE"

I'm quite a newbie at this and would appreciate any advice on where my code has gone wrong. Thanks!

Edward
Kei L
  • Hi, please provide some data via `dput()`; that will make it easier to help. – Tushar Lad Mar 07 '20 at 04:59
  • The first error tells you that there is no column or variable "Accesssion" in your maindf. The second error reflects a typo in `"TRUE"","FALSE"`; it should be `"TRUE", "FALSE"`, without the doubled double quote. And yes, as @TusharLad pointed out, please provide some example data. – stefan Mar 07 '20 at 07:33
  • I doubt very much that you should be using `ifelse`. The task sounds much more like a merge operation. The `merge` and `match` functions would seem more useful for situations where there might be differing numbers of rows in the two objects to be tested for membership. (I'm also wondering if the first error was because you have too many `s`'s in "Accesssion" and the second error was from too many double quotes in `"TRUE""`.) – IRTFM Mar 07 '20 at 07:48
  • If you add a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610), it's way easier for others to find a solution to your problem. By definition, an MRE is **not** putting your whole data set in the question but creating an example with as little code and data as needed to replicate the problem! The MRE will make it easier for others to find and test an answer to your question. That way you can help others to help you! – dario Mar 07 '20 at 08:15
  • I will add that I suspect you are using RStudio, because that IDE has the annoying habit of adding paired double quotes at the end of character strings on my machine. I think the developers should inhibit the paired insertions when the double-quote key is pressed immediately following an alphanumeric character. At this point I'm agreeing that this question should be closed as most probably a typo, but you could recover the effort of typing the beginning of your question if you added a reproducible example with two or three small data frames. Otherwise you should delete your question. – IRTFM Mar 07 '20 at 16:32
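
For what it's worth, here is a minimal sketch of one way to avoid repeating the `ifelse` line 29 times, assuming the samples are held in a named list of data frames that each have an `Ac` column. The data is made up for illustration, and the names `samples`, `matches`, and `result` are hypothetical, not from the question. Since `%in%` already returns TRUE/FALSE, `ifelse` is not needed:

```r
# Made-up stand-ins for the main data frame and two sample files.
maindf <- data.frame(
  Ac = c("P6666", "P7777"),
  description = c("ProteinA", "ProteinB"),
  stringsAsFactors = FALSE
)
samples <- list(
  Sample1 = data.frame(Ac = "P6666", description = "ProteinA",
                       stringsAsFactors = FALSE),
  Sample2 = data.frame(Ac = "P7777", description = "ProteinB",
                       stringsAsFactors = FALSE)
)

# With real files, the list might be built along these lines
# (an assumption; not run here, and the pattern may need adjusting):
# files   <- list.files(pattern = "\\.xlsx?$")
# samples <- lapply(files, readxl::read_excel)
# names(samples) <- tools::file_path_sans_ext(files)

# One logical column per sample: TRUE where the main Ac occurs in that sample.
matches <- as.data.frame(lapply(samples, function(s) maindf$Ac %in% s$Ac))

# Attach the per-sample columns to the main data frame.
result <- cbind(maindf, matches)
print(result)
```

This only compares the `Ac` column. If a row should count as present only when both `Ac` and `description` match, a `merge`-based approach, as suggested in the comments, may be a better fit.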

0 Answers