Counting characters and subsetting specific patterns

Question

I have a list of data.frames (d) that looks like this:

 $ 1:'data.frame':   1 obs. of  2 variables:
  ..$ index: int 2
  ..$ V1   : Factor w/ 125 levels "cgtsloqasmlkjybjlo,..:"
 $ 2:'data.frame':   1 obs. of  2 variables:
  ..$ index: int 2
  ..$ V1   : Factor w/ 125 levels "ponlohlofdctlo,..:"

and so on for 1000 data.frames. I need to count the number of unique letters occurring in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in each of the other data.frames. I am not an expert, and the function I wrote does not work.

I also tried splitting the strings into characters, but that does not work either:

 chars = sapply(d, function(x) strsplit(as.character(d),""))

In addition, I need to count the number of occurrences of "lo" in "cgtsloqasmlkjybjlo,..:", in "ponlohlofdctlo,..:", and in all the other data.frames.

Edit: the desired output will be a data.frame:

        Seq           length(unique_letters)   lo_occurrences
 cgtsloqasmlkjybjlo           13                       2      
   ponlohlofdctlo             9                        3     
   ..............           ............         ............    


 dput output: 
  dput(d[1:3])
structure(list(1 = structure(1000L, .Label = c("jhgfilsouilohgucaksfiaaknajdauloadbayrzjdhad", "fjkhqurtglowqgbdahhmolovdethabvfdalo", "....", "V1"), class = "factor")), .Names = c("1", "2", "3"))

It is hard (for me, at least) to understand exactly what you are looking for. Could you show an abbreviated form of your list of data frames using `dput()` etc, as described in [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Simon, Nov 04 '13 at 22:07
I tried, but it seems to be impossible due to the dimension of the list. R crashed.. — Fuv8, Nov 04 '13 at 22:10
Can you make up a simple & contrived example, perhaps w/ just 2 data frames, & the output you would want to get from the example? — gung - Reinstate Monica, Nov 04 '13 at 22:20
Just the first (say) 3 items of the list would be enough. Try `dput(d[1:3])`. If the data frames themselves are big, you might need to create a sample list that contains shortened versions of those data frames. — Simon, Nov 04 '13 at 22:23

score 1 · Accepted Answer · answered Nov 04 '13 at 22:58

One way to do it:

#simulating your list; I got an error trying to use your dput
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"), 
      data.frame(index = 2, V1 = "ponlohlofdctlo"))
d
#[[1]]
#  index                 V1
#1     2 cgtsloqasmlkjybjlo

#[[2]]
#  index             V1
#1     2 ponlohlofdctlo

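res <- do.call(rbind, lapply(d, function(x) {
  s <- as.character(x$V1)  # V1 may be a factor, so convert it to character first
  data.frame(seq = s,
             length_uniques = length(unique(unlist(strsplit(s, "")))),
             # gregexpr() returns -1 when there is no match, so count only real match positions
             lo_counts = sum(unlist(gregexpr("lo", s, fixed = TRUE)) > 0))
}))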
res
#                 seq length_uniques lo_counts
#1 cgtsloqasmlkjybjlo             13         2
#2     ponlohlofdctlo              9         3
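
If you prefer to work on a plain character vector instead of building one small data.frame per list element, the same two counts can be sketched with `vapply`. This is only a sketch under assumptions: the helper names `count_unique_chars` and `count_pattern` are made up for illustration, and the third sequence ("abc") is invented to show what happens when "lo" does not occur at all. `gregexpr` reports -1 for "no match", so the helper counts only positive positions.

```r
# Simulated list, as above, plus one invented sequence without "lo"
d <- list(data.frame(index = 2, V1 = "cgtsloqasmlkjybjlo"),
          data.frame(index = 2, V1 = "ponlohlofdctlo"),
          data.frame(index = 2, V1 = "abc"))

# Hypothetical helper: number of distinct characters in each string
count_unique_chars <- function(s) {
  vapply(strsplit(s, ""), function(ch) length(unique(ch)), integer(1))
}

# Hypothetical helper: number of literal occurrences of `pattern` in each string
count_pattern <- function(s, pattern) {
  # gregexpr() returns -1 when nothing matches, so keep positive positions only
  vapply(gregexpr(pattern, s, fixed = TRUE), function(m) sum(m > 0), integer(1))
}

# Pull the single sequence out of every data.frame (works for factor or character V1)
seqs <- vapply(d, function(x) as.character(x$V1), character(1))

res2 <- data.frame(seq = seqs,
                   length_uniques = count_unique_chars(seqs),
                   lo_counts = count_pattern(seqs, "lo"))
res2
```

For the two sequences from the question this should give the same counts as `res` above, and the invented "abc" row gets a `lo_counts` of 0 rather than 1.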
