Problem with a dataframe in R

Question

I have a dataframe with 10 columns and 7,000 rows, and I want to create a new dataframe containing only the rows where one column has specific values. I tried `subset.data.frame`, but I get this error:

    Error in subset.default(peak.anno_4$ENTREZID == c("171832", "172856",  : 
      argument "subset" is missing, with no default
    In addition: Warning message:
    In peak.anno_4$ENTREZID == c("171832", "172856", "177870", "173051",  :
      longer object length is not a multiple of shorter object length

Can someone suggest a solution?

It helps reproduce the problem when the post includes a data set. An effective way to include one is `dput()`. Run dput, then paste the output into your question. [rdocumentation](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/dput). If your object is a vector, matrix, table, or data frame and is large, `object |> head() |> dput()` will help give manageably sized output. — Isaiah, Oct 13 '22 at 11:54
Hi there! It's hard to answer your question without more information. I would suggest trying to add a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and specify exactly what kind of output you want as an outcome. It's also better if you post the exact code you were running when you have found the error, instead of describing what you tried to do. — Giulio Centorame, Oct 13 '22 at 11:54
neuron_df <- subset.data.frame(peak.anno_4$ENTREZID == c("171832","172856", "177870", "173051", "179675", "183905", "172455", "172850", ...)) this was my command — Michela Francesconi, Oct 13 '22 at 12:00

score 0 · Answer 1 · answered Oct 13 '22 at 11:59

Hello Michela, and welcome to Stack Overflow.

It would be much easier for people to give you the answer you are looking for if you provided a reproducible example.

Luckily, I think I understand your problem, but please keep this in mind for your next question.

I assume you're trying to retrieve the rows that have certain ID values. If so, you can use the dplyr package. If your data set is huge (for example, a CSV file larger than 10 GB), I recommend the data.table package instead.

Here is the sample code.

    library(dplyr)

    sample <- data.frame("ID" = c("1", "2", "3", "4"),
                         "NAME" = c("A", "B", "C", "D"))

    # keep only the rows whose ID is 3 or 4
    search_value <- c("3", "4")
    # create a new data frame, sample2, holding the matching rows
    sample2 <- sample %>%
      filter(ID %in% search_value)

You can find additional material in the dplyr GitHub repository (https://github.com/tidyverse/dplyr).
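If you would rather stay in base R, the same filtering works without any package, and it also shows why the original call failed. `subset()` expects the data frame as its first argument and the condition as its second; passing only the comparison leaves the `subset` argument missing. The warning comes from `==`, which compares element by element and recycles the shorter vector, whereas `%in%` tests membership. Below is a minimal sketch, assuming a small made-up data frame (`peaks`) standing in for `peak.anno_4`; only the `ENTREZID` column name and the two IDs are taken from the question, everything else is illustrative.

```r
# Toy stand-in for peak.anno_4 (hypothetical data, not the asker's).
peaks <- data.frame(
  ENTREZID = c("171832", "999999", "172856", "888888"),
  score    = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)

# IDs to keep (two of the IDs quoted in the question).
wanted_ids <- c("171832", "172856")

# Data frame first, condition second; %in% tests membership in wanted_ids.
neuron_df <- subset(peaks, ENTREZID %in% wanted_ids)

# Equivalent selection with bracket indexing.
neuron_df2 <- peaks[peaks$ENTREZID %in% wanted_ids, ]
```

With your own data, replace `peaks` with `peak.anno_4` and put your full list of IDs in `wanted_ids`.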

I hope this helps.

It would be great if you could tick my answer to be the chosen answer for your question. @Michela Francesconi — B_Heidel, Oct 14 '22 at 12:58
