I have a matrix of data and I want to remove all rows that have values below a set threshold of 10. I've checked other posts on here, but they don't seem to work in my case in R for some reason. I'm still relatively new to R, so I'm getting to grips with it at the moment. What would you recommend I do to accomplish this?

For example, I would want the row "MIR6859-1" removed completely as it has count data below 10 across every condition.

Here is the code I have tried so far, but I keep getting the error "Error in data < 10 : comparison (3) is possible only for atomic and list types", or, with the subset method, an error that the object "KOA1" is not found.

fmcountdata1 <- mergecountdata1[!(mergecountdata1$KOA1<10),]
fmcountdata1 <- mergecountdata1[!apply(data<10,1,any,na.rm=TRUE),]
fmcountdata1 <- mergecountdata1
subset(fmcountdata1, KOA1<10)
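A minimal sketch of why each of these attempts can fail when the object is a matrix rather than a data frame. The small matrix `m` and the helper `err` are hypothetical stand-ins, not the real data, and the sketch assumes no object called `data` has been created in the session:

```r
# Hypothetical 2 x 2 numeric matrix standing in for mergecountdata1
m <- matrix(c(0, 16, 0, 28), nrow = 2,
            dimnames = list(c("DDX11L1", "WASH7P"), c("KOA1", "KOA2")))

# Helper: evaluate an expression and return its error message (NA if none)
err <- function(expr) tryCatch({ expr; NA_character_ },
                               error = function(e) conditionMessage(e))

# `$` works on lists and data frames, not on a matrix
err_dollar <- err(m$KOA1)

# With no object named `data` defined, `data` resolves to the function
# utils::data, and a function cannot be compared with a number
err_data <- err(data < 10)

# subset() on a matrix does not treat column names as variables
err_subset <- err(subset(m, KOA1 < 10))

# Matrix columns are selected by name inside [ , ]
m[m[, "KOA1"] >= 10, , drop = FALSE]
```

If this is what is happening, the "comparison ... atomic and list types" message comes from the leftover `data` in the second attempt, not from the matrix itself.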

Here is a snippet of the dataset:

            KOA1 KOA2 KOA3 KOA4 KOB1 KOB2 KOB3 KOB4 CON1 CON2 CON3 CON4
DDX11L1        0    0    0    0    0    0    0    0    0    0    0    0
WASH7P        16   28   25   54   28   26   21   40   17   30   19   39
MIR6859-1      4    1    1    3    1    0    0    0    0    1    0    1
MIR1302-2HG    0    1    0    1    1    0    0    1    0    0    0    0
MIR1302-2      0    0    0    0    0    0

str of my data set:

 chr [1:59412, 1:12] " 0" " 16" " 4" " 0" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:59412] "DDX11L1" "WASH7P" "MIR6859-1" "MIR1302-2HG" ...
  ..$ : chr [1:12] "KOA1" "KOA2" "KOA3" "KOA4" ...
 - attr(*, "names")= chr [1:712944] NA NA NA NA ...
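The `chr` at the start of that output means the matrix stores text, not numbers. A small sketch, using made-up values shaped like the ones shown, of how a comparison with 10 behaves on character data:

```r
# Hypothetical character values shaped like those in the str() output
x <- c(" 0", " 16", " 4", "9")

typeof(x)            # "character"

# The number 10 is coerced to the string "10", so this compares text:
"9" < 10             # FALSE, because "9" sorts after "1"

# Converting first gives a numeric comparison; leading spaces are tolerated
as.numeric(x) < 10   # TRUE FALSE TRUE TRUE
```

How strings with leading spaces sort against "10" depends on the locale, so a `< 10` filter applied directly to a character matrix may keep or drop rows unpredictably.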

  • We need more details, please: a [mcve] and an example of what "don't seem to work" means exactly ... – Ben Bolker Feb 12 '20 at 13:36
  • `subset(df, value < 10)` Replace 'df' with the name of your dataframe. Replace 'value' with the name of the column that contains the values you want to subset on. – maarvd Feb 12 '20 at 13:55
  • I've tried this but it keeps saying object not found even though I can clearly see the column labelled as such, how do I post a snippet of my dataset here and retain the tabulated format? – Yaseen Ahammed Feb 12 '20 at 14:39
  • Hi I've added the dataset as a screenshot snippet since I couldn't get the pasted data to format as a table correctly. I also added a bit more info on what I'm trying to do. Any ideas? thanks – Yaseen Ahammed Feb 12 '20 at 15:12
  • Several suggestions of how to post sample data [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Also, see the site help on [formatting](https://stackoverflow.com/editing-help#code) so the question can be more legible – camille Feb 12 '20 at 15:35
  • Ok thank you for the resource! – Yaseen Ahammed Feb 12 '20 at 15:54

1 Answer

Is this what you had in mind?

set.seed(12)
data <- data.frame(v1=sample(c(1:20,NA), 10),
                   v2=sample(c(1:20,NA), 10))
data
   v1 v2
1   2 18
2  16  6
3  14 12
4   5 10
5  18  7
6  12 16
7  NA 13
8   8  8
9  11  4
10 15 14

# Remove rows of data if *any* column in that row contains a value<10
data.any <- data[!apply(data<10,1,any,na.rm=TRUE),]
data.any # rows 3,6,7 and 10 remain
   v1 v2
3  14 12
6  12 16
7  NA 13
10 15 14

# Remove rows of data if *all* columns in that row contain a value<10
data.all <- data[!apply(data<10,1,all,na.rm=TRUE),]
data.all # all but row 8 remain
   v1 v2
1   2 18
2  16  6
3  14 12
4   5 10
5  18  7
6  12 16
7  NA 13
9  11  4
10 15 14
Edward
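An equivalent, vectorised variant of the two filters, offered as a sketch and not taken from the answer itself. It uses `rowSums()` in place of `apply()`; the data frame `df` is typed in by hand from the printed example so that it does not depend on the random sampling:

```r
# Same values as the printed example data, entered by hand
df <- data.frame(v1 = c(2, 16, 14, 5, 18, 12, NA, 8, 11, 15),
                 v2 = c(18, 6, 12, 10, 7, 16, 13, 8, 4, 14))

# "any" rule: keep rows that contain no value below 10 (NA is ignored)
df_any <- df[rowSums(df < 10, na.rm = TRUE) == 0, ]

# "all" rule: keep rows that contain at least one value of 10 or more
df_all <- df[rowSums(df >= 10, na.rm = TRUE) > 0, ]

rownames(df_any)   # "3" "6" "7" "10"
rownames(df_all)   # every row except "8"
```

Both forms assume the values are numeric; `rowSums()` tends to be faster than `apply()` on tables with many rows.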
  • Ah, I get an error when I try this. I just realised it's a matrix instead of a dataframe. What changes should I make? Thanks! – Yaseen Ahammed Feb 12 '20 at 14:15
  • I'm getting the error "Error in data < 10 : comparison (3) is possible only for atomic and list types" – Yaseen Ahammed Feb 12 '20 at 14:17
  • You'd better show what your data looks like, as Ben asked earlier. My commands above work if `data` is a matrix as well. – Edward Feb 12 '20 at 14:26
  • How do I attach it on here? I've copied it but it says it's too long when I try to paste. I'm really sorry, I'm so brand new to all of it. – Yaseen Ahammed Feb 12 '20 at 14:32
  • I'm not really sure either! Maybe just a glimpse of the data. A small part. – Edward Feb 12 '20 at 14:40
  • How did you manage to paste the above answer and retain tabulated format for that example data set? – Yaseen Ahammed Feb 12 '20 at 14:44
  • Ok thanks, I updated the main post, but it seems like it's just a bunch of jumbled text instead of the original table, even though I used that code tool. – Yaseen Ahammed Feb 12 '20 at 14:49
  • Umm. It's difficult to know from what you posted. Are you sure the data is a matrix with rows and columns? The data cannot contain any non-numeric values. What is the class? – Edward Feb 12 '20 at 15:04
  • I'll try and attach a screenshot now, there are columns and rows with gene names on each row descending and the condition of the experiment on the columns going across. – Yaseen Ahammed Feb 12 '20 at 15:06
  • Can you post the output of `str(data)` where data is the name of your data set? And are you replacing `data` with the name of your data set? – Edward Feb 12 '20 at 15:25
  • Yes I'm replacing the names with my own variable names, I posted the output in the main post. – Yaseen Ahammed Feb 12 '20 at 15:30
  • `data.all <- mergecountdata1[!apply(mergecountdata1<10,1, all, na.rm=TRUE),]` – Edward Feb 12 '20 at 15:37
  • It did something, but I seem to be getting a weird resulting table where the counts are in the hundreds of thousands, for a total of 5 rows. It should only be removing rows with counts below 10, so I'm not entirely sure what's happening. For example, the WASH7P row is absent even though the counts for each condition in that row are above 10. – Yaseen Ahammed Feb 12 '20 at 15:44
  • What is the output of `class(mergecountdata1)`? And how did you import the data into R? – Edward Feb 12 '20 at 16:00
  • The output is "matrix" and I imported the data from a .txt file via the read.table function which I then converted to matrix via the as.matrix function. – Yaseen Ahammed Feb 12 '20 at 16:04
  • OK. I'm not sure why you're only getting 5 rows remaining from hundreds of thousands, which you are sure have counts of all columns of 10 or more, including WASH7P. You can see that my code works for the sample data set above, even if you convert it into a matrix like you did for your data. There must be something else that you haven't mentioned. – Edward Feb 12 '20 at 16:16
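One possible explanation, given that the `str` output in the question reports a `chr` matrix: the counts are stored as text, so `< 10` compares strings and not numbers. A minimal sketch of converting to numeric before filtering, under that assumption. The matrix `chr_counts` is a hypothetical stand-in built from the first four rows of the posted snippet, and the variable names are illustrative:

```r
# Hypothetical character matrix built from the first four snippet rows
genes <- c("DDX11L1", "WASH7P", "MIR6859-1", "MIR1302-2HG")
conds <- c("KOA1", "KOA2", "KOA3", "KOA4", "KOB1", "KOB2",
           "KOB3", "KOB4", "CON1", "CON2", "CON3", "CON4")
vals <- c( 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          16, 28, 25, 54, 28, 26, 21, 40, 17, 30, 19, 39,
           4,  1,  1,  3,  1,  0,  0,  0,  0,  1,  0,  1,
           0,  1,  0,  1,  1,  0,  0,  1,  0,  0,  0,  0)
chr_counts <- matrix(format(vals), nrow = 4, byrow = TRUE,
                     dimnames = list(genes, conds))

# Convert the text to numbers, keeping the shape and the row/column names
num_counts <- matrix(as.numeric(chr_counts), nrow = nrow(chr_counts),
                     dimnames = dimnames(chr_counts))

# Keep a row if at least one condition has a count of 10 or more,
# i.e. drop rows that are below 10 across every condition
keep <- rowSums(num_counts >= 10, na.rm = TRUE) > 0
filtered <- num_counts[keep, , drop = FALSE]

rownames(filtered)   # "WASH7P"
```

If `as.numeric()` warns about NAs introduced by coercion on the real data, some cells hold non-numeric text, which would be worth checking at the `read.table` step.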