
I have a list with ~1000 entries with the following structure (small example):

example <- list(
"1" = c("car","house"), 
"2" = c("family","work","car"), 
"3" = c("house","Work","car"),
"4" = "school", 
"5" = c("Car","school"))

Most entries in the list contain only 1 string. Some contain 2, 3, 4, 5 or even more strings. I don't know the maximum number of strings per entry, because I don't know how to get this information without scrolling through all ~1000 entries.

I want to get a summary of the strings in my list. I want to know:

  • How many different strings there are (e.g. 5 in the small example)
  • How often the different strings occur (e.g. family:1, work:2, .... in the small example). I would like to visualize this in a plot later.
  • I want the analysis to be case-insensitive (e.g. family and Family should be treated the same)
  • I want to exclude duplicates (e.g. if one entry contains c("family","car","family"), family should be counted only 1 time)
Scasto

1 Answer

all_strings <- tolower(unlist(example, use.names = FALSE))
#How many different strings
length(unique(all_strings))
#[1] 5

#How often the different strings occur
#(lower-case before unique() so that e.g. c("family","Family") is counted once per entry)
all_string_listwise <- unlist(lapply(example, function(x) unique(tolower(x))), use.names = FALSE)
table(all_string_listwise)

#all_string_listwise
#   car family  house school   work 
#     4      1      2      2      2 
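
The question also asks for the maximum number of strings per entry and for a plot of the frequencies. A minimal base R sketch of both follows. The names `n_per_entry` and `counts` are made up here for illustration, and the bar plot is only one possible way to visualize the table.

```r
# Same small example as in the question
example <- list(
  "1" = c("car", "house"),
  "2" = c("family", "work", "car"),
  "3" = c("house", "Work", "car"),
  "4" = "school",
  "5" = c("Car", "school"))

# Number of strings in each entry (lengths() is base R, available since R 3.2.0)
n_per_entry <- lengths(example)
max(n_per_entry)     # largest number of strings in a single entry
table(n_per_entry)   # how many entries contain 1, 2, 3, ... strings

# Lower-case first, then drop duplicates within each entry, then count
counts <- table(unlist(lapply(example, function(x) unique(tolower(x))),
                       use.names = FALSE))

# Bar plot of the frequencies, most frequent string first
barplot(sort(counts, decreasing = TRUE), las = 2, ylab = "Number of entries")
```

With ~1000 entries and many distinct strings, it may be more readable to plot only the most frequent ones, e.g. `head(sort(counts, decreasing = TRUE), 20)`.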
Ronak Shah