
I have a dataframe with many factors and want to create statistical tables that show the distribution for each factor, including factor levels with zero observations. For instance, these data:

structure(list(engag11 = structure(c(5L, 4L, 4L), .Label = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"), class = "factor"), encor11 = structure(c(1L, 1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor"), know11 = structure(c(3L, 
1L, 1L), .Label = c("Agree", "Neither Agree or Disagree", "Strongly Agree"), class = "factor")), .Names = c("engag11", "encor11", "know11"), row.names = c(NA, 3L), class = "data.frame")

show 3 rows, but only some of the factor levels are observed in each column. When I produce a table, I'd like to display counts not only for the observed levels but also for levels that were NOT observed (such as "Strongly Disagree"). For example:

# define the factor and levels
library(dplyr)
library(pander)
library(forcats)
eLevels <- factor(c(1, 2, 3, 4, 5), levels = 1:5, labels = c("Strongly    Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"), ordered = TRUE)

# apply the factor to one variable
csc2$engag11<-factor(csc2$engag11,eLevels)

t1<-table(csc2$engag11)
pander(t1)

This produces a frequency table that shows a count for every level, including zeroes for levels that were not observed.

But I have dozens of variables to convert. A simple lapply call of the kind recommended on Stack Overflow doesn't seem to work:

csc2[1:3]<-lapply(csc[1:3],eLevels)

I also tried writing a simple function (where n is a list of columns), but it failed:

facConv <- function(df, n) {
  df$n <- factor(c(1, 2, 3, 4, 5), levels = 1:5,
                 labels = c("Strongly Disagree", "Disagree", "Neither A or D", "Agree", "Strongly Agree"))
  return(result)
}

Can someone offer a solution?

Ben

2 Answers


lapply works fine here, but its second argument has to be a function. `eLevels` is a vector of levels, not a function, so wrap the call to factor() in an anonymous function:

csc2[1:3] <- lapply(csc2[1:3], function(x) factor(x, eLevels))

Then you can call table like:

table(csc2[1])

#Strongly    Disagree             Disagree       Neither A or D                Agree       Strongly Agree 
#                   0                    0                    0                    2                    1 
table(csc2[2])

#Strongly    Disagree             Disagree       Neither A or D                Agree       Strongly Agree 
#                   0                    0                    0                    3                    0 
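As a self-contained sketch of the approach above, the following rebuilds the question's three-row sample and applies one shared level set to every column. The names `csc2_demo` and `likert_levels` are made up for illustration. The levels are given as a plain character vector rather than the factor `eLevels`, and the sketch assumes every column is meant to share the same five-point scale.

```r
# Hypothetical stand-in for the asker's data (same values as the dput output)
csc2_demo <- data.frame(
  engag11 = c("Strongly Agree", "Agree", "Agree"),
  encor11 = c("Agree", "Agree", "Agree"),
  know11  = c("Strongly Agree", "Agree", "Agree"),
  stringsAsFactors = TRUE
)

# Assumed common scale; any value not listed here would become NA
likert_levels <- c("Strongly Disagree", "Disagree", "Neither A or D",
                   "Agree", "Strongly Agree")

# Re-level every column; assigning into csc2_demo[] keeps it a data frame
csc2_demo[] <- lapply(csc2_demo, function(x) factor(x, levels = likert_levels))

# One frequency table per column, zero counts included
tables <- lapply(csc2_demo, table)
tables$engag11
```

Each element of `tables` can then be passed to `pander()` in the same way as `t1` in the question.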
Mike H.
  • Thanks Mike! So, in your solution, is it true that you created a function that takes the columns specified as [1:3] as its input and then passes each column through the factor command? – Ben Jan 11 '18 at 14:37
  • Yup! It basically loops through each column and applies the `factor()` function – Mike H. Jan 11 '18 at 14:43
  • OK, actually, I have an error. My command is: `csc[1:4]<-lapply(csc[1:4,10:17,20:44], function(x) factor(x, eLevels))`, and I get this warning and error: "the condition has length > 1 and only the first element will be used" and "Error in `[<-.data.frame`(`*tmp*`, 1:4, value = list(CMNT1 = c(NA_integer_, : replacement element 1 has 4 rows, need 15" – Ben Jan 11 '18 at 14:56
  • What are you trying to do? Select columns `1:4, 10:17, 20:44` and factor them? – Mike H. Jan 11 '18 at 14:58
  • try: `csc[c(1:4,10:17,20:44)]<-lapply(csc[c(1:4,10:17,20:44)], function(x) factor(x, eLevels))` – Mike H. Jan 11 '18 at 17:56
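The fix in the last comment comes down to how `[` reads its arguments on a data frame: non-contiguous column ranges have to be combined into one index vector with `c()`. Written with bare commas, `csc[1:4,10:17,20:44]` appears to be read as rows 1:4, columns 10:17, with the third range taken as the `drop` argument, which would explain both messages. A small sketch with a made-up data frame `demo` and level set `lvl` (both hypothetical):

```r
# Hypothetical six-column data frame of character codes
demo <- as.data.frame(matrix("A", nrow = 3, ncol = 6), stringsAsFactors = FALSE)
lvl  <- c("A", "B", "C")

# Combine the non-contiguous ranges into a single column index
cols <- c(1:2, 5:6)

# Use the same index on both sides of the assignment
demo[cols] <- lapply(demo[cols], function(x) factor(x, levels = lvl))

# Columns 1, 2, 5 and 6 are now factors; 3 and 4 are untouched
sapply(demo, is.factor)
```

Storing the index in one variable also avoids the mismatch in the failing call, where the left-hand side selected four columns and the right-hand side something else.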

An inelegant, quick-and-dirty way is to use a for loop:

df <- data.frame(A = c("A", "A", "B"),
                 B = c("A", "C", "A"),
                 C = c("A", "A", "D"))
lvl <- c("A", "B", "C", "D", "E")

for (i in seq_along(df)) {
  df[[i]] <- factor(df[[i]], levels = lvl)
}

table(df$A)

If your original data are numeric codes, map them to labels at the same time:

df <- data.frame(A = c(1,1,2),
                 B = c(1,3,1),
                 C = c(1,1,4))
lvl <- c("A", "B", "C", "D", "E")
for (i in seq_along(df)) {
  df[[i]] <- factor(df[[i]], levels = 1:5, labels = lvl)
}
df
table(df$A)
Ben Toh