-4

So I have a list of values:

Value
AAA
BBB
CCC
.
.
.
ZZZ

Now I have a data frame where each row has 15 column that can contain these values as such:

ID V1   V2   V3   V4   V5   V6   V7   V8   V9   V10   V11   V12   V13   V14  V15
1  AAA
2  AAA  BBB
3  CCC  BBB

Basically, I'd like a count of rows from this data frame for each value in that list where it can show up in any of the 15 columns:

Desired output:

Value  Count
AAA     2
BBB     2
CCC     1
.
.
.
ZZZ     0

I've tried using sapply and apply like the following but this doesn't seem to work:

apply(mylist$values, 2, function(x) { length(which(df[,2:16] %in% x)) } )

or

sapply(mylist$values, function(x) { length(which(x %in% df[,2:16])) })

I'd appreciate any ideas!

Thanks,

ymyook
  • 23
  • 6
  • It's a bit unclear what you want and looks like you haven't made an attempt yourself. The standard on here is to provide a [reprex](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in which you give us 1) sample data, 2) desired output, 3) what you've tried already, and 4) an explanation of why this is different from the results you found through searching. – cparmstrong Dec 18 '17 at 20:38
  • sorry! editing now. – ymyook Dec 18 '17 at 20:43

2 Answers2

1

Something like this using table?

# Generate some sample data
set.seed(2017);
df <- as.data.frame(matrix(
    sapply(sample(LETTERS[1:5], 45, replace = T), function(x) paste(rep(x, 3), collapse = "")),
    ncol = 15));
df;
#   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 V11 V12 V13 V14 V15
#1 EEE BBB AAA BBB AAA BBB EEE BBB BBB BBB EEE BBB AAA CCC AAA
#2 CCC DDD CCC DDD CCC BBB EEE EEE EEE AAA AAA EEE AAA CCC AAA
#3 CCC DDD CCC AAA CCC DDD DDD DDD DDD BBB BBB DDD AAA CCC EEE

# Your list of values 
Values <- list(sapply(LETTERS[1:6], function(x) paste(rep(x, 3), collapse = "")))
Values;
#[[1]]
#    A     B     C     D     E     F
#"AAA" "BBB" "CCC" "DDD" "EEE" "FFF"

# Summarise counts as table
table(factor(unlist(df), levels = unique(unlist(Values))));

# As dataframe
df.table <- as.data.frame(table(factor(unlist(df), levels = unique(unlist(Values)))));
df.table[order(as.character(df.table$Var1)), ];
#  Var1 Freq
#1  AAA   10
#2  BBB   10
#3  CCC    9
#4  DDD    8
#5  EEE    8
#6  FFF    0

Notice the 0 count for FFF which is not part of df but shows up in Values.

Maurits Evers
  • 49,617
  • 4
  • 47
  • 68
  • oh wow ok. That's ridiculously simple. I was hoping to go off of my original list of values and just add the counts right next to it because the list has other information about the values. But then I could use your approach and join the counts to it. – ymyook Dec 18 '17 at 21:22
  • @ymyook if this answer solves your issue, consider accepting it as *the* answer by clicking on the check mark to the left of the answer. – CPak Dec 18 '17 at 21:26
  • @ymyook Not sure what you mean. Which *"list of values"* are you talking about? You have a `dataframe` (or `list`) with 15 columns. If you have a more complex `dataframe` perhaps a `dplyr::group_by` plus `summarise` approach is more straightforward. – Maurits Evers Dec 18 '17 at 21:26
  • @MauritsEvers (disclaimer: I like the simplicity of your answer), but there are cases where a value is not present in `df` but present in list of relevant `Values`, in which case `table` alone won't suffice. It is necessary to `left_join` or `inner_join` to make the output of counts consistent with the *relevant list* of values – CPak Dec 18 '17 at 21:32
  • @MauritsEvers I mean my original list of values at the very beginning of my question. That is actually another data frame that serves as a master list of all the values I'm interested in. And I was just hoping to add the counts next to them. Sorry I should've been more thorough in my question but wanted to pare down to the essence to avoid complicating everything :( – ymyook Dec 18 '17 at 21:35
  • @CPak I see; in that case I would specify all factor levels from `Values`, and then still use `table`. Entries not in `df` would get a `0` count. I don't think this is any more complex. See my updated answer. – Maurits Evers Dec 18 '17 at 21:36
  • Yes, what @CPak said. – ymyook Dec 18 '17 at 21:36
  • @MauritsEvers how do I "specify all factor levels"? – ymyook Dec 18 '17 at 21:37
  • @ymyook Please see my updated example. All you need to do is use `factor(..., levels = all_unique_entries_from_Values)`. – Maurits Evers Dec 18 '17 at 21:44
  • This is perfect! Thanks – ymyook Dec 18 '17 at 21:57
  • @ymyook Great, glad to help! – Maurits Evers Dec 18 '17 at 21:58
0

This might work for you

sapply(df1$Value, function(x) sum(df2 == x, na.rm=TRUE))
# AAA BBB CCC ZZZ 
  # 2   2   1   0

Data

df1 <- structure(list(Value = c("AAA", "BBB", "CCC", "ZZZ")), .Names = "Value", class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(ID = 1:3, V1 = c("AAA", "AAA", "CCC"), V2 = c(NA, 
"BBB", "BBB"), V3 = c(NA, NA, NA), V4 = c(NA, NA, NA), V5 = c(NA, 
NA, NA), V6 = c(NA, NA, NA), V7 = c(NA, NA, NA), V8 = c(NA, NA, 
NA), V9 = c(NA, NA, NA), V10 = c(NA, NA, NA), V11 = c(NA, NA, 
NA), V12 = c(NA, NA, NA), V13 = c(NA, NA, NA), V14 = c(NA, NA, 
NA), V15 = c(NA, NA, NA)), class = "data.frame", .Names = c("ID", 
"V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", 
"V11", "V12", "V13", "V14", "V15"), row.names = c(NA, -3L))
CPak
  • 13,260
  • 3
  • 30
  • 48