
I have a dataframe like so:

df<- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                loc = c("104","104","104","105","105","106","107", "108", "109"), 
                interact = c("A_B","A_B", "B_C", "C_D", "A_B", "E_F", "E_F", "F_G", "A_B"))

I would like to make a dataframe that:

1) Counts the incidence frequency of each interaction among the loc levels of each region. In the example above, region 1 has two loc levels (104 and 105) that both contain the interact A_B, so the incidence frequency of A_B for region 1 is 2. Duplicate interact levels within the same loc are not counted: although A_B occurs 3 times in region 1, it occurs in only two unique loc levels. The incidence frequency is the number of unique loc levels in which an interact level occurs.

2) Lists every interact level found across all regions and counts its incidence in each region, so a 0 appears for every interact level that does not occur in a given region.

3) Has as its first row the number of unique loc levels in each region: region 1 has 2 loc levels (104, 105), region 2 has 1 (106), and region 3 has 3 (107-109).
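Points 1 and 3 can be checked by hand on the example data. The sketch below is a minimal base-R illustration (the names a_b_region1 and n_loc are only illustrative): it counts the distinct loc values in which A_B occurs in region 1, and the distinct loc values in each region.

```r
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# Point 1: A_B has three rows in region 1 (loc 104 twice, loc 105 once);
# counting distinct loc values gives the incidence frequency
a_b_region1 <- length(unique(df$loc[df$region == "1" & df$interact == "A_B"]))
a_b_region1  # 2

# Point 3: number of distinct loc values per region
n_loc <- tapply(df$loc, df$region, function(x) length(unique(x)))
n_loc  # region 1: 2, region 2: 1, region 3: 3
```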

The final output should look like:

output<- data.frame(interact = c("","A_B","B_C","C_D","E_F","F_G"),
                    region1 = c("2","2","1","1","0","0"),
                    region2 = c("1","0","0","0","1","0"),
                    region3 = c("3","1","0","0","1","1"))

I do not know where to start with this. Here is what I adapted from @akrun's answer to a similar question (Convert from long to wide format counting frequency of eliminated factor level (Prepping dataframe for input into iNEXT Online)), but it gives errors:

library(tidyverse)
df %>%
 group_by(region = paste0('region', region)) %>% 
        summarise(interact = "", V1 = n_distinct(loc)) %>% 
        spread(region, V1),
      df %>% 
        group_by(region = paste0('region', region) & loc),
                interact = as.character(interact)) %>%
        summarise(V1 = length(unique((interact)) %>% 
        spread(region, V1, fill = 0))
www
Danielle

1 Answer


In light of the clarifying comment (and after re-reading the question), I'm amending my advice; it still uses base-R methods. Try this instead:

 my_table <- with(df, table(interact, loc, region))
 # MARGIN c(1, 3) = interact and region; for each cell, count the loc values with a positive count
 collapse_tbl <- apply(my_table, c(1, 3), function(x) sum(x > 0))
 collapse_tbl

will give you:

        region
interact 1 2 3
     A_B 2 0 1
     B_C 1 0 0
     C_D 1 0 0
     E_F 0 1 1
     F_G 0 0 1
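To see why sum(x > 0) counts loc levels rather than rows, it may help to look at one slice of the three-way table. This sketch (the name ab is illustrative) pulls out the A_B counts for region 1 across all loc levels and compares the plain sum with the number of positive cells.

```r
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))
my_table <- with(df, table(interact, loc, region))

# A_B counts in each loc of region 1: 2 in 104, 1 in 105, 0 elsewhere
ab <- my_table["A_B", , "1"]
sum(ab)      # 3: total A_B rows in region 1 (abundance)
sum(ab > 0)  # 2: number of loc levels containing A_B (incidence)
```

The plain sum reproduces the abundance total of 3; counting the positive cells gives the incidence of 2.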

If you really need to relabel the region dimension, that's not particularly difficult. This is how I would proceed (assuming you assigned that value to collapse_tbl):

colnames(collapse_tbl) <- paste0("region", colnames(collapse_tbl))
collapse_tbl
        region
interact region1 region2 region3
     A_B       2       0       1
     B_C       1       0       0
     C_D       1       0       0
     E_F       0       1       1
     F_G       0       0       1

That is a matrix object rather than a data frame. Unlike a 'table' object, which as.data.frame turns into the long format (one row per cell plus a Freq column), a matrix keeps its wide layout when passed to as.data.frame, with the interact values as row names. The "natural" way to handle data is the "long" arrangement, but you can still use the usual indexing with matrix (or table) objects:
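If the exact layout from the question is needed (an interact column, one column per region, and a first row holding the number of distinct loc levels), one way to get it is to bind the loc counts on top of the matrix and then convert. This is a sketch, assuming the counts may stay numeric rather than the character strings used in the question; n_loc, res and output are illustrative names.

```r
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# Incidence: number of distinct loc per interact x region
collapse_tbl <- apply(with(df, table(interact, loc, region)), c(1, 3),
                      function(x) sum(x > 0))

# Number of distinct loc per region, in the same region order as collapse_tbl
n_loc <- colSums(with(df, table(loc, region)) > 0)

# Put the loc counts on top, rename the columns, and convert to a data frame
res <- rbind(n_loc, collapse_tbl)
colnames(res) <- paste0("region", colnames(collapse_tbl))
output <- data.frame(interact = c("", rownames(collapse_tbl)), res,
                     row.names = NULL, stringsAsFactors = FALSE)
output
```

Passing row.names = NULL explicitly keeps data.frame from reusing the matrix row names, so the interact labels live only in the interact column.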

> collapse_tbl["F_G", "region3"]
[1] 1

The xtabs function is often used for this purpose. Both table and xtabs are in the "original-R-verse".
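As a sketch of the xtabs route (an alternative to the apply approach above, not taken from the answer itself): removing duplicate region/loc/interact rows first means each loc contributes at most once per interaction, so a plain cross-tabulation gives the incidence counts directly. The name incidence is illustrative.

```r
df <- data.frame(region = c("1","1","1","1","1","2","3","3","3"),
                 loc = c("104","104","104","105","105","106","107","108","109"),
                 interact = c("A_B","A_B","B_C","C_D","A_B","E_F","E_F","F_G","A_B"))

# Drop repeated (region, loc, interact) rows so each loc counts once,
# then cross-tabulate interact by region
incidence <- xtabs(~ interact + region,
                   data = unique(df[c("region", "loc", "interact")]))
incidence
```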

IRTFM
I appreciate your help. However, your output sums the A_B interactions occurring within region 1 to give a total of 3. I need a count of the number of loc levels it was present in (which would be 2). To add some context, instead of getting the total abundance of interactions in a region, I need the number of plots within a region in which each interaction type occurred. – Danielle Jul 08 '17 at 04:10