
I have 40 cognitive maps, and I want to use the list of variables from each map to create an accumulation curve, with the map number on the x-axis and the number of "new" variables identified on the y-axis. For the first map, all variables would be "new"; for the second map, only those not identified on map 1 would be "new"; for the third map, only those not identified on either of the first two maps would be "new"; and so on, cumulatively, for each of the 40 maps.

My dataframe is in wide format, with map number as rownames (1-40) and variable name as column names (F1-F144), and then a value of 1 if the variable is present in that map and a 0 if absent.

Any ideas would be helpful.

Gwyn
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 30 '22 at 14:53

1 Answer

Here is a way.
which.max returns the index of the first maximum of a numeric vector. Within each column, every value before the first map in which that variable occurs is 0, so the first 1 is the first maximum. If a column contains no 1 at all, which.max points at a 0, so that variable is marked NA instead. Then coerce the vector of indices to a factor with complete levels from 1 to the number of maps (rows) and tabulate this factor. The resulting table gives the count of new variables per map.
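To see why the check on x[i] matters, here is a small illustration on two made-up 0/1 columns. The vectors and the helper name first_one are hypothetical and only serve to show the behaviour of which.max; they are not part of the answer's data.

```r
# Hypothetical toy columns: 1 = variable present in that map, 0 = absent
present_late  <- c(0, 0, 1, 1, 0)   # first appears in map 3
never_present <- c(0, 0, 0, 0, 0)   # never appears in any map

which.max(present_late)    # 3: position of the first 1
which.max(never_present)   # 1: position of the first 0, since 0 is the maximum here

# Hypothetical helper: index of the first 1, or NA if there is none
first_one <- function(x) {
  i <- which.max(x)
  if (x[i] == 1) i else NA_integer_
}

first_one(present_late)    # 3
first_one(never_present)   # NA
```

Without the guard, a variable that never occurs would be counted as new in the first map.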

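# For each variable (column), find the first map (row) in which it appears;
# a variable that never appears gets NA
new_var <- apply(df1, 2, \(x) {
  i <- which.max(x)
  if(x[i] == 1) i else NA_integer_
})
# Keep every map as a level so that maps contributing nothing are counted as 0
new_var <- factor(new_var, labels = row.names(df1), levels = seq_len(nrow(df1)))
table(new_var)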
#> new_var
#> map01 map02 map03 map04 map05 map06 map07 map08 map09 map10 
#>    10     2     4     0     1     0     1     0     0     0

Created on 2022-08-30 by the reprex package (v2.0.1)
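The table holds the number of new variables per map. If the curve should show the running total instead, one possible sketch is to take the cumulative sum of those counts and plot it against the map number. The toy data frame and the names toy, first_map, new_per_map and accumulated are made up for illustration, and base graphics is just one plotting choice.

```r
# Hypothetical toy data: 4 maps (rows) x 5 variables (columns), 1 = present
toy <- data.frame(
  F1 = c(1, 0, 1, 0),
  F2 = c(0, 1, 1, 0),
  F3 = c(1, 1, 0, 0),
  F4 = c(0, 0, 0, 1),
  F5 = c(0, 0, 0, 0),
  row.names = sprintf("map%02d", 1:4)
)

# Row index of the map where each variable first appears (NA if it never does)
first_map <- apply(toy, 2, function(x) {
  i <- which.max(x)
  if (x[i] == 1) i else NA_integer_
})

# New variables per map; complete levels keep maps that add nothing (count 0)
new_per_map <- table(factor(first_map,
                            levels = seq_len(nrow(toy)),
                            labels = row.names(toy)))

# Running total of distinct variables seen so far
accumulated <- cumsum(as.integer(new_per_map))

# Accumulation curve: map number on the x-axis, cumulative count on the y-axis
plot(seq_along(accumulated), accumulated, type = "b",
     xlab = "Map number", ylab = "Cumulative number of variables")
```

To plot the per-map counts of new variables rather than the running total, pass as.integer(new_per_map) to plot in place of accumulated.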


Test data

set.seed(2022)
m <- 10L
n <- 20L
probs <- seq(0.1, 0.9, length = n)
df1 <- matrix(nrow = m, ncol = n)
for(i in 1:n) {
  df1[, i] <- rbinom(m, 1, prob = probs[i])
}
df1 <- as.data.frame(df1)
row.names(df1) <- sprintf("map%02d", as.integer(row.names(df1)))


Rui Barradas
  • Thanks for this and for the reproducible example. Much appreciated. However, if I understand correctly, the which.max output isn't showing us the new variables by map # (df1 has 10 maps in row names) but the table is showing a count of new variables by variable (V1 to V20) not by map (1 - 10). – Gwyn Aug 30 '22 at 16:29
  • @Gwyn You are right. Corrected, see now. – Rui Barradas Aug 30 '22 at 18:08