7

I want to create a frequency table from a data frame and save it in excel. Using table() function i can only create frequency of a particular column. But I want to create frequency table for all the columns altogether, and for each column the levels or type of variables may differ too. Like kind of summary of a data frame but there will not be mean or other measures, only frequencies. I was trying something like this

 for(i in 1:230){
 rm(tb)
 tb<-data.frame(table(mydata[i]))
 tb2<-cbind(tb2,tb)
 }

But it's showing the following Error

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 15, 12

In place of cbind() I also used data.frame() but the Error didn't changed.

Vishwa
  • 1,112
  • 1
  • 11
  • 23
  • 1
    what about `lapply(mydata,table)`? – scoa Aug 26 '15 at 07:17
  • The error means that tb2 contains 15 rows and tb contains 12 rows. – Verena Haunschmid Aug 26 '15 at 07:54
  • I just tried `lapply(mydata,table)` while trying to save it in a csv file using write.csv, it produces an Error `Error in data.frame(s_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, : arguments imply differing number of rows: 24066, 27558, 17779, 18466, 3, 15, 5, 12, 13, 10, 4, 9, 7, 2, 21, 20, 51, 14, 23, 24, 31, 6, 26, 8, 55, 576, 1027, 2459, 363, 1973, 765`. Same error while I was trying to use `data.frame()` as well over the result too. – Sankar Narayan Misra Aug 26 '15 at 08:46
  • As you say, levels of variables might differ. How do you expect to create a data frame with frequencies for all variables? You might be able to do something like min, max, mean, frequency as all your columns will produce 3 values no matter what. – AntoniosK Aug 26 '15 at 09:28

3 Answers3

8

You are getting an error because you are trying to combine the data frames that have different dimensions. From what I understand, your problem is two-fold: (1) you want to get the frequency distribution of each column regardless of type; and, (2) you want to save all of the results in a single Excel sheet.

For the first problem, you can use the mapply() function.

set.seed(1)

dat <- data.frame(
  x = sample(LETTERS[1:5], 15, replace = TRUE),
  y = rbinom(5, 15, prob = 0.4)
)

mylist <- mapply(table, dat); mylist

# $x
# 
# A B C D E 
# 2 5 1 4 3 
# 
# $y
# 
# 5  6  7 11 
# 3  3  6  3 

You can also use purrr::map().

library(purrr)
dat %>% map(table)

The second problem has several solutions in this question: Export a list into a CSV or TXT file in R. In particular, LyzandeR's answer will enable you to do just what you intended. If you prefer to save the outputs in separate files, you can do:

mapply(write.csv, mylist, file=paste0(names(mylist), '.csv'))
hpesoj626
  • 3,529
  • 1
  • 17
  • 25
6

Maybe an rbind solution is better as it allows you to handle variables with different levels:

dt = data.frame(x = c("A","A","B","C"),
                y = c(1,1,2,1))

dt

#   x y
# 1 A 1
# 2 A 1
# 3 B 2
# 4 C 1

dt_res = data.frame()

for (i in 1:ncol(dt)){

dt_temp = data.frame(t(table(dt[,i])))
dt_temp$Var1 = names(dt)[i]

dt_res = rbind(dt_res, dt_temp)

}

names(dt_res) = c("Variable","Levels","Freq")

dt_res

#   Variable Levels Freq
# 1        x      A    2
# 2        x      B    1
# 3        x      C    1
# 4        y      1    3
# 5        y      2    1

And an alternative (probably faster) process using apply:

dt = data.frame(x = c("A","A","B","C"),
                y = c(1,1,2,1))

dt

ff = function(x){

  y = data.frame(t(table(x)))
  y$Var1 = NULL
  names(y) = c("Levels","Freq")
  return(y)
}

dd = do.call(rbind, apply(dt, 2, ff)) 

dd

#     Levels Freq
# x.1      A    2
# x.2      B    1
# x.3      C    1
# y.1      1    3
# y.2      2    1


# extract variable names from row names
dd$Variable = sapply(row.names(dd), function(x) unlist(strsplit(x,"[.]"))[1])

dd

#     Levels Freq Variable
# x.1      A    2        x
# x.2      B    1        x
# x.3      C    1        x
# y.1      1    3        y
# y.2      2    1        y
AntoniosK
  • 15,991
  • 2
  • 19
  • 32
2

Edit (2021-03-29): tidyverse Principles

Here is some updated code that utilizes tidyverse, specifically functions from dplyr, tibble, and purrr. The code is a bit more readable and easier to carry out as well. Example data set is provided.

tibble(
    a = rep(c(1:3), 2),
    b = factor(rep(c("Jan", "Feb", "Mar"), 2)),
    c = factor(rep(LETTERS[1:3], 2))
    ) -> 
    dat

dat #print df

# A tibble: 6 x 3
    a    b    c    
  <int> <fct> <fct>
1     1 Jan   A    
2     2 Feb   B    
3     3 Mar   C    
4     1 Jan   A    
5     2 Feb   B    
6     3 Mar   C

Get counts and proportions across columns.

library(purrr)
library(dplyr)
library(tibble)
#library(tidyverse) #to load assortment of pkgs

#output tables - I like to use parentheses & specifying my funs
purrr::map(
    dat, function(.x) {
        count(tibble(x = .x), x) %>% 
            mutate(pct = (n / sum(n) * 100)) 
        })

#here is the same code but more concise (tidy eval)
purrr::map(dat, ~ count(tibble(x = .x), x) %>% 
               mutate(pct = (n / sum(n) * 100)))

$a
# A tibble: 6 x 3
      x     n   pct
  <int> <int> <dbl>
1     1     1  16.7
2     2     1  16.7
3     3     1  16.7
4     4     1  16.7
5     5     1  16.7
6     6     1  16.7

$b
# A tibble: 3 x 3
  x         n   pct
  <fct> <int> <dbl>
1 Feb       2  33.3
2 Jan       2  33.3
3 Mar       2  33.3

$c
# A tibble: 2 x 3
  x         n   pct
  <fct> <int> <dbl>
1 A         3    50
2 B         3    50

Old code...

The table() function returns a "table" object, which is nigh impossible to manipulate using R in my experience. I tend to just write my own function to circumvent this issue. Let's first create a data frame with some categorical variables/features (wide formatted data).

We can use lapply() in conjunction with the table() function found in base R to create a list of frequency counts for each feature.

freqList = lapply(select_if(dat, is.factor), 
              function(x) {
                  df = data.frame(table(x))
                  names(df) = c("x", "y")
                  
                  return(df) 
                    }
                )

This approach allows each list object to be easily indexed and further manipulated if necessary, which can be really handy with data frames containing a lot of features. Use print(freqList) to view all of the frequency tables.

DataSci-IOPsy
  • 308
  • 2
  • 11
  • If proportion tables (i.e., percent of total observations) is the goal for multiple columns then try my post [here](https://stackoverflow.com/questions/32220202/how-to-get-a-frequency-table-of-all-columns-of-complete-data-frame-in-r/59169446#59169446). – DataSci-IOPsy Dec 04 '19 at 05:53