I wish to count the number of distinct 'Type' values for each 'Name' in the following data frame.

So far, I use a loop, which is supposedly a bad R coding habit. Do you have any idea how to improve the code?

library(data.table) # for function 'as.data.table'
library(dplyr) # for function 'n_distinct'

original = data.frame(Name = c(rep(1,10),rep(2,10),rep(3,10)),
                      Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

I need a data frame with one row per name, into which I can put all the relevant information derived from the data.

# creates a data table containing only one row per Name
onerow <- as.data.table(original) # from library 'data.table'
onerow <- unique(onerow, by = "Name")

# now transform 'onerow' to data frame and retain the column of interest ("Name")
onerow <- as.data.frame(onerow)
onerow <- as.data.frame(onerow[, 1])
names(onerow) <- "Name"
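
For comparison, the one-row-per-name data frame can probably be built in base R without the round trip through 'data.table'. This is only a minimal sketch on the example data above; it assumes that nothing else from the 'data.table' object is needed later.

```r
# Example data from the question
original <- data.frame(Name = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                       Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

# Single-bracket subsetting by column name keeps the data frame structure,
# and unique() on a data frame drops the duplicated rows
onerow <- unique(original["Name"])

# unique() keeps the original row names (1, 11, 21); reset them to 1, 2, 3
rownames(onerow) <- NULL
```

The result is a data frame with a single column "Name" and one row per distinct name, so the later renaming step is not needed.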

The loop aims at counting the number of distinct types for each name. My real data set will have over 60 individuals (with about 300 rows for each individual, each row being a recorded Type), and the counts of different Types would range between 5 and 13.

# ugly loop to determine for each "Name" the count of different "Type"
for (i in 1:max(original$Name)){
  ssp <- assign(paste("SSP_", i, sep = ""), original[original$Name == i, ])
  # 'n_distinct' is from library 'dplyr', equivalent to length(unique(ssp$Type)), but faster
  cou <- assign(paste("count_", i, sep = ""), n_distinct(ssp$Type))
  onerow[i, 2] <- cou
}

names(onerow) <- c("Name","Count")

Additional question: how can I avoid creating the 'SSP_i', 'count_i', 'cou', and 'ssp' variables in the Global Environment?
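
One loop-free direction that seems to fit is a grouped summary. The sketch below uses base R's 'aggregate' on the example data; the names 'counts' and 'count_distinct' are only illustrative. Because the per-group work happens inside a function, no intermediate per-name objects end up in the Global Environment.

```r
# Example data from the question
original <- data.frame(Name = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                       Type = c(1,2,1,3,1,2,1,2,3,1,4,5,4,5,4,5,4,5,4,5,6,7,8,9,6,7,8,9,6,9))

# Hypothetical helper: number of distinct values in a vector
count_distinct <- function(x) length(unique(x))

# Apply the helper to 'Type' within each 'Name' group;
# the result has one row per Name
counts <- aggregate(Type ~ Name, data = original, FUN = count_distinct)

# The summarised column keeps the name 'Type'; rename it
names(counts)[2] <- "Count"

print(counts)
#   Name Count
# 1    1     3
# 2    2     2
# 3    3     4
```

With 'dplyr' already loaded, grouping by 'Name' with 'group_by' and calling 'summarise' with 'n_distinct(Type)' should give the same table, and 'data.table' offers 'uniqueN' with 'by = Name' for the same purpose.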
