One of the columns (new) in the data frame below is a list column whose elements are tables.

#dput(head(df1))
structure(list(a = c(1, 2, 3, 4, 5, 7), b = c(2, 3, 3, 5, 5, 
7), c = c(1, 3, 2, 4, 5, 7), new = list(structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("1", "2")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(1:2, .Dim = 2L, .Dimnames = structure(list(
    c("2", "3")), .Names = ""), class = "table"), structure(2:1, .Dim = 2L, .Dimnames = structure(list(
    c("4", "5")), .Names = ""), class = "table"), structure(c(`5` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "5"), .Names = ""), class = "table"), structure(c(`7` = 3L), .Dim = 1L, .Dimnames = structure(list(
    "7"), .Names = ""), class = "table"))), row.names = c(NA, 
6L), class = "data.frame")

The new column is the result of apply(df1, 1, table). For example, subsetting a single element of the new column with df1[4, "new"][[1]] produces the following output.

df1[4, "new"][[1]]

#4 5 --> Vals
#2 1 --> Freq

I want to formulate a condition such as "give me all the Vals where Freq is greater than or equal to some threshold" and use it to subset the new column.

Here is an example of what I have done so far.

df1[4, "new"][[1]][]>=2
#    4     5 
# TRUE FALSE 

# Subsetting using the above logical
as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2]))
#[1] 4

The result is what I expect. However, it is verbose, and I would be happy to see a shorter version (that is not the pressing issue at the moment, though I would be grateful to learn how to write clearer and more concise code).

The pressing problem is how to modify the expression as.integer(names(df1[4, "new"][[1]][df1[4, "new"][[1]][]>=2])) so that it applies to the whole column. For example, for the condition Freq == 3 on the new column, the expected output is 5 and 7.

I've seen similar posts here and here, but they didn't help me figure out how to apply the subset condition to a column whose elements are tables.

Thank you.

deepseefan
  • Can you clarify exactly what you want to return? If the condition is >3, you want the last two rows to be returned? Or a vector containing just 5 and 7? What if the condition is >=2? Would you return all the rows? – Calum You Sep 18 '19 at 17:36
  • Thank you. The output I want is just `5` and `7`; when the condition is `>=2`, it should be the specific values (`names`) from the `new` column that satisfy the condition. – deepseefan Sep 19 '19 at 08:13

1 Answer

Investigating the class of the object (i.e. the column) yields "list".

class(df1$new)
# [1] "list"

To apply a function to each element of a list, we usually use lapply(), which always returns a list. To get a vector or matrix back instead, where possible, we can try sapply().
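As a small illustration of that difference, here is a minimal sketch on a made-up list (`toy` and the other names are hypothetical and not part of the question's data). It also shows that sapply() falls back to a list when the results do not all have the same length:

```r
# Hypothetical toy list, unrelated to df1
toy <- list(c(1, 2), c(3, 4), c(5, 6))

as_list <- lapply(toy, sum)  # always a list
as_vec  <- sapply(toy, sum)  # simplified to a numeric vector

# Results of different lengths cannot be simplified,
# so sapply() returns a list here
ragged <- sapply(list(1:2, integer(0)), function(x) x[x > 1])

is.list(as_list)  # TRUE
as_vec            # 3 7 11
is.list(ragged)   # TRUE
```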

So, define your condition,

COND <- 2

and use your function in a sapply:

sapply(df1$new, function(x) as.numeric(names(x[x >= COND])))
# [1] 1 3 3 4 5 7
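With COND <- 2 every row has exactly one matching value, so sapply() can simplify to a vector. With a condition that some rows do not satisfy (for example Freq == 3), those rows return a zero-length vector and sapply() returns a list, so the result needs to be flattened. The following is a minimal sketch of one way to handle both cases; it rebuilds df1 from the values in the question, and the helper name `vals_where` is hypothetical:

```r
# Rebuild the example data from the question
df1 <- data.frame(a = c(1, 2, 3, 4, 5, 7),
                  b = c(2, 3, 3, 5, 5, 7),
                  c = c(1, 3, 2, 4, 5, 7))
df1$new <- apply(df1, 1, table)  # list column of tables, as in the question

# Hypothetical helper: return the values whose frequency satisfies `cond`
vals_where <- function(tabs, cond) {
  hits <- lapply(tabs, function(x) as.numeric(names(x)[as.vector(cond(x))]))
  unlist(hits, use.names = FALSE)  # drops rows with no match
}

eq3 <- vals_where(df1$new, function(freq) freq == 3)
ge2 <- vals_where(df1$new, function(freq) freq >= 2)

eq3  # 5 7
ge2  # 1 3 3 4 5 7
```

Passing the condition as a function keeps the comparison operator (==, >=, and so on) out of the helper, which is a design choice rather than a requirement.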
jay.sf
  • Thanks. Just pointing out that I had to `unlist` the `sapply` result to get output similar to yours, which is exactly what I want. Thank you. – deepseefan Sep 19 '19 at 08:39