1

I'd like to tabulate the frequency of each individual character in a character vector. This vector contains the answers to a set of survey items, with the structure "ADCDAB...", where "A" is the answer to the first item, "D" to the second, and so on. I'd like to process the data with purrr::map combined with base string functions.

library(purrr)
p1 <- strsplit(test$answer, "")  # split each string into single characters
map(p1, table)                   # one frequency table per respondent

However, when I run the same code inside a dplyr pipeline:

 test %>% 
 mutate(p1=strsplit(answer,"")) %>% 
 map(p1,table) 

the system returns the following error message:

Error: Index 1 must have length 1, not 10

What's wrong with the second syntax?

A dummy dataset

test <- structure(list(answer = c(".BBCBD.A.D", "...DB..AA.", "B......AB.", 
"BDDDBACADD", "BB.ABC.AAD"), d.n.i = c(1, 2, 3, 4, 5)), row.names = c(NA, 
5L), class = "data.frame")
  • Sure! The structure of the dataset is like this example:

    id  answers
    1   .BBCBD.A.DDD.BCAAD...CC.ADD.BC
    2   ...DB..AA..D.BDACD.A.C..CDDBBC
    3   B......AB.BD..........C..DDBB
    4   BDDDBACADDDDBDDCC.ADACCACDCB.C
    5   BB.ABC.AADDDBBCDDD...CB..DDB.C

    I want to tabulate the answers for each position. For example, for the first position, the table is:

    .  A  B  C  D
    2  0  3  0  0

    For the second item, the table is:

    .  A  B  C  D
    2  0  2  0  1

    And so on, up to the 30th item. – Juan_Ramon Lacalle Mar 14 '20 at 10:44
  • Sorry, I'm not allowed to edit the previous comment, using markdown – Juan_Ramon Lacalle Mar 14 '20 at 10:55

2 Answers

4

Here is a base R option

x <- "ADCDAB"

out <- table(utf8ToInt(x))
names(out) <- intToUtf8(names(out), multiple = TRUE)
out
#A B C D 
#2 1 1 2

With multiple elements use lapply

x <- c("ADCDAB", "EFG")

f <- function(i) {
  out <- table(utf8ToInt(i))
  names(out) <- intToUtf8(names(out), multiple = TRUE)
  out
}

lapply(x, f)

Returns

#[[1]]
#A B C D 
#2 1 1 2 

#[[2]]
#E F G 
#1 1 1 

If you need the output as a single table, try

x <- c("ADCDAB", "EFGAA")
f(paste(x, collapse = ""))
#A B C D E F G 
#4 1 1 2 1 1 1

... or as a data frame

as.data.frame(f(paste(x, collapse = "")))
#  Var1 Freq
#1    A    4
#2    B    1
#3    C    1
#4    D    2
#5    E    1
#6    F    1
#7    G    1
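
The same helper can be applied to a character column of a data frame by passing the column to `lapply`. A minimal sketch, assuming the dummy `test` data from the question (`per_row` is just an illustrative name):

```r
# Dummy data from the question
test <- data.frame(
  answer = c(".BBCBD.A.D", "...DB..AA.", "B......AB.", "BDDDBACADD", "BB.ABC.AAD"),
  d.n.i = 1:5,
  stringsAsFactors = FALSE
)

# Same helper as above: tabulate the characters of one string
f <- function(i) {
  out <- table(utf8ToInt(i))
  names(out) <- intToUtf8(names(out), multiple = TRUE)
  out
}

# One table per row, labelled by the respondent id
per_row <- lapply(test$answer, f)
names(per_row) <- test$d.n.i

per_row[["4"]]  # table for the fourth respondent
```

Only the answers that actually occur in a row show up in that row's table, so the tables can differ in length.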
markus
  • Thanks!!! However, your solution can't be applied when the character vector is a variable in a data frame. How to proceed then? – Juan_Ramon Lacalle Mar 14 '20 at 11:04
  • @Juan_RamonLacalle Please share dummy data in your question so we can test. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – markus Mar 14 '20 at 11:06
  • | id | answers |
    | --- | --- |
    | 1 | .BBCBD.A.DDD.BCAAD...CC.ADD.BC |
    | 2 | ...DB..AA..D.BDACD.A.C..CDDBBC |
    | 3 | B......AB.BD..........C..DDBB |
    | 4 | BDDDBACADDDDBDDCC.ADACCACDCB.C |
    | 5 | BB.ABC.AADDDBBCDDD...CB..DDB.C |
    – Juan_Ramon Lacalle Mar 14 '20 at 11:11
  • I've just added a dummy dataset at the end of my post – Juan_Ramon Lacalle Mar 14 '20 at 12:50
1

You could do:

library(tidyverse)
test %>% mutate(p1 = strsplit(answer,""), p2 = map(p1, table))
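
As for why the pipeline in the question fails: `%>%` passes its left-hand side as the first argument of the next call, so the final step is evaluated as `map(<data frame>, p1, table)`. The data frame becomes `.x` and `p1` lands in the `.f` slot, where purrr appears to treat a list as an extractor, which would be consistent with the "Index 1 must have length 1, not 10" message. A minimal sketch of an alternative that keeps the pipe but maps over the list column itself, assuming dplyr and purrr are installed (`tabs` is just an illustrative name):

```r
library(dplyr)
library(purrr)

test <- data.frame(answer = c("ADCDAB", "ABCD"), stringsAsFactors = FALSE)

tabs <- test %>%
  mutate(p1 = strsplit(answer, "")) %>%  # list column of single characters
  pull(p1) %>%                           # extract the list from the data frame
  map(table)                             # now the list itself is .x

tabs[[1]]  # table for the first answer string
```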

However, I would suggest something like the following:

test %>% 
   mutate(p1 = strsplit(answer,"")) %>%
   unnest(p1) %>%
   count(answer, p1)

#  answer p1        n
#  <chr>  <chr> <int>
#1 ABCD   A         1
#2 ABCD   B         1
#3 ABCD   C         1
#4 ABCD   D         1
#5 ADCDAB A         2
#6 ADCDAB B         1
#7 ADCDAB C         1
#8 ADCDAB D         2

data

test <- data.frame(answer = c("ADCDAB", "ABCD"), stringsAsFactors = FALSE)
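
The comments under the question ask for a table per item position rather than per respondent. A minimal base R sketch, assuming every answer string has the same length and using the dummy data from the question (`chars`, `lv`, and `tab` are illustrative names):

```r
test <- data.frame(
  answer = c(".BBCBD.A.D", "...DB..AA.", "B......AB.", "BDDDBACADD", "BB.ABC.AAD"),
  stringsAsFactors = FALSE
)

# One row per respondent, one column per item position
chars <- do.call(rbind, strsplit(test$answer, ""))

# Fixed levels so that answers with zero counts still appear
lv <- c(".", "A", "B", "C", "D")

# Cross-tabulate item position against answer
tab <- table(item = col(chars), answer = factor(chars, levels = lv))

tab["1", ]  # counts for the first item
```

Each row of `tab` is one item. For the first item this gives 2 for "." and 3 for "B", matching the example in the comments.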
Ronak Shah