
The following code in R uses a for-loop. What is a way I could solve the same problem without a for-loop (maybe by vectorizing it)?

I am looking at an unfamiliar dataset with many columns (243), and am trying to figure out which columns hold unstructured text. As a first check, I was going to flag columns that 1) are of class 'character' and 2) have at least ten unique values.

```r
# Flag columns that are character AND have at least 10 distinct values
openEnded <- rep(x = NA, times = ncol(scaryData))
for (i in seq_len(ncol(scaryData))) {
  openEnded[i] <- is.character(scaryData[[i]]) && length(unique(scaryData[[i]])) >= 10
}
```
You may well be able to avoid loops, and vectorise this, but really need to see a small example of your data. Can you share, for example, `dput(scaryData[1:5])` please – user20650 Jun 03 '16 at 22:30
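For a reproducible run of the loop, here is a small stand-in for `scaryData`. The column names (`id`, `group`, `comment`) and values are hypothetical, invented for illustration rather than taken from the real dataset:

```r
# Hypothetical stand-in for scaryData (invented columns, 12 rows)
scaryData <- data.frame(
  id      = 1:12,                              # integer: not text
  group   = rep(c("a", "b", "c"), times = 4),  # character, only 3 distinct values
  comment = paste("free-text answer", 1:12),   # character, 12 distinct values
  stringsAsFactors = FALSE                     # keep text as character in R < 4.0
)

# The loop from the question, run on the toy data
openEnded <- rep(x = NA, times = ncol(scaryData))
for (i in seq_len(ncol(scaryData))) {
  openEnded[i] <- is.character(scaryData[[i]]) && length(unique(scaryData[[i]])) >= 10
}
print(openEnded)  # FALSE FALSE TRUE: only `comment` is flagged
```

If the real data were read with `stringsAsFactors = TRUE` (the default before R 4.0.0), text columns arrive as factors, and `is.character()` returns `FALSE` for them, so they would not be flagged.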

1 Answer


This would probably do the job:

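```r
openEnded <- sapply(scaryData, function(x) is.character(x) && length(unique(x)) >= 10)
```

Like the loop, this iterates over the columns and applies to each one an anonymous function that combines your two conditions (`function(x) cond1 && cond2`). Because a data.frame is a list of columns, `sapply()` walks over the columns directly; it takes no margin argument, so `sapply(scaryData, 2, ...)` would fail.

Avoid `apply(scaryData, 2, ...)` here: `apply()` first converts the data frame to a matrix with `as.matrix()`, and a matrix holds a single type. As soon as one column is character, every column becomes character and `is.character()` returns `TRUE` for all of them.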
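A minimal sketch contrasting the column-wise approach with the `apply()` pitfall, on hypothetical toy data. It uses `vapply()`, a stricter variant of `sapply()` that checks each call returns exactly one logical value; the helper `isOpenEnded` and its `minUnique` parameter are invented names for illustration:

```r
# Hypothetical toy data: one integer column, two character columns
scaryData <- data.frame(
  id      = 1:12,
  group   = rep(c("a", "b", "c"), times = 4),
  comment = paste("free-text answer", 1:12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if a column is character with >= minUnique distinct values
isOpenEnded <- function(x, minUnique = 10) {
  is.character(x) && length(unique(x)) >= minUnique
}

# vapply() iterates over the columns (a data.frame is a list) and
# requires each result to be a single logical; names come from the columns
openEnded <- vapply(scaryData, isOpenEnded, logical(1))
print(openEnded)  # id = FALSE, group = FALSE, comment = TRUE

# The apply() pitfall: as.matrix() makes a character matrix from the
# mixed data frame, so even the integer `id` column now looks like text
viaApply <- apply(scaryData, 2, is.character)
print(viaApply)   # id = TRUE, group = TRUE, comment = TRUE
```

`names(openEnded)[openEnded]` then gives the names of the flagged columns.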

A nice post about the `*apply` family can be found there.

Vincent Bonhomme
Thank you! This would not be considered "vectorizing" the operation, since it does not explicitly use matrix algebra, correct? – Measure Theory Penguin Jun 03 '16 at 21:44

@JohnFogg No, it is not vectorized. [This post](http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar/2276001#2276001) could be of interest. – RHertel Jun 04 '16 at 03:19
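In R, "vectorized" usually means the iteration over a whole vector happens inside compiled code (as with `>=` or `&`), not that matrix algebra is involved. The sketch below, on hypothetical toy data, splits the check into two per-column summaries plus one element-wise combination. The summaries still loop over columns internally via `vapply()` and `lapply()`; only the final comparison and `&` operate on all columns at once:

```r
# Hypothetical toy data: one integer column, two character columns
scaryData <- data.frame(
  id      = 1:12,
  group   = rep(c("a", "b", "c"), times = 4),
  comment = paste("free-text answer", 1:12),
  stringsAsFactors = FALSE
)

# Per-column summaries: each is still an R-level loop over the columns
isText    <- vapply(scaryData, is.character, logical(1))
nDistinct <- lengths(lapply(scaryData, unique))  # distinct values per column

# Vectorized step: one comparison and one `&` across all columns at once
openEnded <- isText & nDistinct >= 10
print(openEnded)  # id = FALSE, group = FALSE, comment = TRUE
```

With only 243 columns, the choice between this, `sapply()`, and the original loop is mostly about readability rather than speed.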