I have a data frame that looks like the one below, but it has hundreds of rows and columns, which makes conventional filtering in R a challenge.

A simplified image is shown below:

The rows represent values from a test and the columns represent different treatments.

How do I select all rows (i.e., tests) that have values between -0.5 and 1 for each "treatment" column and return them as output? Your thoughts are much appreciated!

veg2020
  • Can you please provide a reprex? https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jon Spring Sep 11 '18 at 00:10
  • Within the hyperlink image of a "dataframe" provided as an example, I would like to select the following entries (in row/column format) : test8/d1, tests15,16/d2, tests 4,6,16, 17, 18/d3, test18/d4, and test 3/d5. Hope this clarifies? – veg2020 Sep 11 '18 at 00:28
  • What do you mean by "select the entries"? What do you want the result to look like? Do you want a vector, a dataframe, or something else? Do you want to retain any information about which columns or rows these values came from? – DanY Sep 11 '18 at 00:43
  • I'd make this dataset a long file, collapsing `d1` to `d5` to a single column called `d`, adding a `time` column with the values `1` to `5`. Then you can simply select the rows you want. If you're using the tidyverse, use 'tidy' data. – thelatemail Sep 11 '18 at 01:02
  • This is a most elegant solution and a terrific example of the power of tidy data. Thank you!! – veg2020 Sep 11 '18 at 13:39

1 Answer

Create Example Data:

```r
df <- data.frame(
    test = paste0("test", 1:18),
    d1 = c(rep(-57, 7), 0, rep(-99, 10)),
    d2 = c(rep(-4, 14), 1, 0.1, -99, -99),
    d3 = c(rep(-89, 3), 0.99, -47, 0.8, rep(-55, 8), -1.56, 0.1, 1, 0),
    d4 = c(rep(-99, 6), rep(-57, 5), 0.7, -3, -13, -99, 0.98, -99, 0.99),
    d5 = c(rep(-57, 2), 0.4, rep(-99, 14), -57),
    stringsAsFactors = FALSE
)
```

If you just need to grab the elements:

```r
# get TRUE/FALSE matrix of whether each element meets your criteria
meets_criteria <- sapply(df[, -1], function(x) x >= -0.5 & x <= 1)

# "extract" elements that meet your criteria; result is a vector
df[, -1][meets_criteria]
```
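If the vector alone loses too much context, the same logical matrix can be turned into row/column positions in base R. The following is a minimal sketch, assuming the example data above; the names `vals`, `idx`, `hits`, and the `treatment` column are illustrative and do not come from the question.

```r
# Example data, copied from the block above so this runs on its own
df <- data.frame(
    test = paste0("test", 1:18),
    d1 = c(rep(-57, 7), 0, rep(-99, 10)),
    d2 = c(rep(-4, 14), 1, 0.1, -99, -99),
    d3 = c(rep(-89, 3), 0.99, -47, 0.8, rep(-55, 8), -1.56, 0.1, 1, 0),
    d4 = c(rep(-99, 6), rep(-57, 5), 0.7, -3, -13, -99, 0.98, -99, 0.99),
    d5 = c(rep(-57, 2), 0.4, rep(-99, 14), -57),
    stringsAsFactors = FALSE
)

# Numeric matrix of the treatment columns only (drops the "test" column)
vals <- as.matrix(df[, -1])

# Logical matrix: TRUE where a value lies in [-0.5, 1]
meets_criteria <- vals >= -0.5 & vals <= 1

# Row and column positions of every TRUE cell
idx <- which(meets_criteria, arr.ind = TRUE)

# One row per matching cell, keeping the test name and treatment name
hits <- data.frame(
    test = df$test[idx[, "row"]],
    treatment = colnames(vals)[idx[, "col"]],
    value = vals[idx],
    stringsAsFactors = FALSE
)
hits
```

`which(..., arr.ind = TRUE)` walks the matrix column by column, so the result is grouped by treatment rather than by test.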

If you also want to keep the row/col values associated with the element (this follows @thelatemail's approach in the comments above):

```r
# reshape to long
dflong <- tidyr::gather(df, dvar, dvalue, d1:d5)

# subset to meet your criteria
dflong[dflong$dvalue >= -0.5 & dflong$dvalue <= 1, ]
```
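The tidyr documentation marks `gather()` as superseded by `pivot_longer()`, so the same reshape can also be written with the newer function. This is a sketch under the assumption that tidyr 1.0.0 or later is installed; the name `in_range` is illustrative, and the column names `dvar` and `dvalue` are reused from the code above.

```r
library(tidyr)

# Example data, copied from the answer so this runs on its own
df <- data.frame(
    test = paste0("test", 1:18),
    d1 = c(rep(-57, 7), 0, rep(-99, 10)),
    d2 = c(rep(-4, 14), 1, 0.1, -99, -99),
    d3 = c(rep(-89, 3), 0.99, -47, 0.8, rep(-55, 8), -1.56, 0.1, 1, 0),
    d4 = c(rep(-99, 6), rep(-57, 5), 0.7, -3, -13, -99, 0.98, -99, 0.99),
    d5 = c(rep(-57, 2), 0.4, rep(-99, 14), -57),
    stringsAsFactors = FALSE
)

# Reshape to long: one row per test/treatment pair
dflong <- pivot_longer(df, cols = d1:d5,
                       names_to = "dvar", values_to = "dvalue")

# Keep only the pairs whose value lies in [-0.5, 1]
in_range <- dflong[dflong$dvalue >= -0.5 & dflong$dvalue <= 1, ]
in_range
```

Unlike `gather()`, `pivot_longer()` orders the long table by original row, so the matches come out grouped by test rather than by treatment.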
DanY
  • Thank you Dan - This works (although I should have mentioned I needed to retain the "test" row name also in the final result). The tidy solution by "thelatemail" solved the problem! – veg2020 Sep 11 '18 at 13:42
  • Thanks again Dan! – veg2020 Sep 11 '18 at 14:58