R create topcount of data frame based on one variable

Question

I have a data frame with a number of columns, one of which is an error code. Along with the error code there is a severity code (A to E). I want to create a matrix of the 10 most frequent error codes, alongside the severity code (and possibly other variables). How can I do this?

Input:

| Error code | Severity code | Description |
| 1          | A             |             |
| 2          | A             |             |
| 1          | A             |             |
| 3          | B             |             |
| 3          | B             |             |
| 1          | A             |             |

Expected output:

| Error code | Severity code | Description | Frequency |
| 1          | A             |             | 3         |
| 3          | B             |             | 2         |
| 2          | A             |             | 1         |
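As a reproducible version of the input (what the first comment asks for), here is a minimal base-R sketch using aggregate() and order(); it is not taken from either answer below. It assumes the column names Error_Code and Severity_Code (underscores instead of spaces) and leaves out Description, whose contents the question does not show.

```r
# Reproducible version of the example input (column names are assumed)
df <- data.frame(
  Error_Code    = c(1, 2, 1, 3, 3, 1),
  Severity_Code = c("A", "A", "A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Add a column of 1s and sum it per (Error_Code, Severity_Code) combination
freq <- aggregate(Frequency ~ Error_Code + Severity_Code,
                  data = transform(df, Frequency = 1), FUN = sum)

# Sort by descending frequency and keep at most the top 10 rows
freq <- freq[order(-freq$Frequency), ]
top10 <- head(freq, 10)
print(top10, row.names = FALSE)
```

The sort with order(-Frequency) followed by head(..., 10) is what turns a plain frequency table into the "top 10" the question asks for; with the data above it reproduces the expected output.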

Please show a small reproducible example with expected output — akrun, Apr 18 '16 at 13:08
See [this](http://stackoverflow.com/questions/10879551/frequency-count-of-two-column-in-r) for instance... — David Arenburg, Apr 18 '16 at 13:52
Please provide more information as suggested above. For example, it is not clear if the Description field will be the same across every instance of error code == 1. — lmo, Apr 18 '16 at 13:56

Kunal Puri · Answer 1 · score 2 · 2016-04-18T14:21:46.250

This takes a single line with data.table.

Assumption: the data.frame is stored in a variable df with columns Error_Code and Severity_Code.

library(data.table)

## converts data.frame to data.table
setDT(df)

## The only line you have to write
df[,.N,by=c('Error_Code','Severity_Code')]

##   Error_Code Severity_Code N
##1:          1             A 3
##2:          2             A 1
##3:          3             B 2
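The line above counts each combination but keeps the groups in order of first appearance, so it is not yet a top 10. A minimal sketch of the remaining step, assuming the same df and column names as this answer (the example data is illustrative): chain order(-N) onto the result and keep at most 10 rows with head().

```r
library(data.table)

# Illustrative data using the column names assumed in the answer
df <- data.frame(
  Error_Code    = c(1, 2, 1, 3, 3, 1),
  Severity_Code = c("A", "A", "A", "B", "B", "A")
)
setDT(df)  # convert to data.table in place

# Count per group, sort by descending count, keep at most 10 rows
top10 <- head(df[, .N, by = .(Error_Code, Severity_Code)][order(-N)], 10)
print(top10)
```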

edited Apr 18 '16 at 14:21

answered Apr 18 '16 at 14:19


`setDT` updates its input; you don't need to assign it to another variable with `<-`. – jangorecki Apr 18 '16 at 14:21
@jangorecki Thanks a lot for that enlightenment! – Kunal Puri Apr 18 '16 at 14:25

This also applies to the other `set*` functions in data.table. None of them need to be assigned to a new variable, because they update their input by reference. This is done for efficiency, which is easier to observe when working with bigger data. The [Reference semantics vignette](https://rawgit.com/wiki/Rdatatable/data.table/vignettes/datatable-reference-semantics.html) is a good resource on that. – jangorecki Apr 18 '16 at 14:29
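A short sketch of the reference semantics described in these comments, using setDT() and setorder() from data.table on a small illustrative data frame: neither call is assigned with `<-`, yet both change the object they are given. setorder() also happens to give the descending sort needed for a top 10.

```r
library(data.table)

df <- data.frame(Error_Code = c(1, 2, 1, 3, 3, 1))

# setDT() converts df to a data.table in place; no `<-` needed
setDT(df)
stopifnot(is.data.table(df))

counts <- df[, .N, by = Error_Code]

# setorder() also works by reference: it reorders the rows of counts in place
setorder(counts, -N)
print(counts)
```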

score 1 · Answer 2 · edited Jun 20 '20 at 09:12

Your data:

Error_code <- c(1,2,1,3,3,1)
Severity_code <- c("A","A","A","B","B","A")
LL <- data.frame(Error_code,Severity_code,stringsAsFactors=FALSE)

Solution: install the "plyr" package, then use its count() function:

install.packages("plyr") 
library(plyr)
Freq_table  <- count(LL,vars=c("Error_code","Severity_code"))
colnames(Freq_table) <- c("Error code","Severity code","Frequency")

The result:

 Freq_table
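plyr's count() returns the combinations ordered by the grouping variables, not by frequency. A minimal sketch of getting the top 10, assuming plyr is installed and reusing the data and renaming from this answer: sort on the Frequency column in decreasing order and keep at most 10 rows with head().

```r
library(plyr)

Error_code <- c(1, 2, 1, 3, 3, 1)
Severity_code <- c("A", "A", "A", "B", "B", "A")
LL <- data.frame(Error_code, Severity_code, stringsAsFactors = FALSE)

Freq_table <- count(LL, vars = c("Error_code", "Severity_code"))
colnames(Freq_table) <- c("Error code", "Severity code", "Frequency")

# count() does not sort by frequency; order descending and keep at most 10 rows
Freq_table <- Freq_table[order(-Freq_table$Frequency), ]
top10 <- head(Freq_table, 10)
print(top10, row.names = FALSE)
```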
