Process_Table = Process_Table[order(-Process_Table$Freq),]

#output
                             Process Freq Percent
17            Other Airport Services   45   15.46
5                           Check-in   35   12.03
23 Ticket sales and support channels   35   12.03
11               Flight and inflight   33   11.34
19                      Pegasus Plus   23    7.90
24                       Time Delays   16    5.50
7                              Other   13    4.47
14                             Other   13    4.47
22                             Other   13    4.47
25                             Other   13    4.47
16                             Other   11    3.78
20                             Other    6    2.06
26                             Other    6    2.06
3                              Other    5    1.72
13                             Other    5    1.72
18                             Other    5    1.72
21                             Other    4    1.37
1                              Other    2    0.69
2                              Other    1    0.34
4                              Other    1    0.34
6                              Other    1    0.34
8                              Other    1    0.34
9                              Other    1    0.34
10                             Other    1    0.34
12                             Other    1    0.34
15                             Other    1    0.34

As you can see, it shows separate frequencies for the same level, whereas printing the levels of that feature gives the following output:

levels(Process_Table$Process)

[1] "Check-in"                          "Flight and inflight"              
[3] "Other"                             "Other Airport Services"           
[5] "Pegasus Plus"                      "Ticket sales and support channels"
[7] "Time Delays"             

What I want is the combined frequency of the "Other" category. Can anyone help me out with this?


Edit: the code used to derive the first set of output:

Process_Table$Percent = round(Process_Table$Freq/sum(Process_Table$Freq) * 100, 2)

Process_Table$Process = as.character(Process_Table$Process)
low_list = Process_Table %>%
  filter(Percent < 5.50) %>%
  select(Process)

Process_Table$Process = ifelse(Process_Table$Process %in% low_list$Process, 'Other', Process_Table$Process)

as.data.frame(Process_Table)

Process_Table$Process = as.factor(Process_Table$Process)
zx8754
Deb
  • 2
    Where is your `group_by` in the input code? – amrrs Sep 20 '17 at 14:04
  • 2
    How did you create this data in the first place? Please edit your question to include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data and the desired output. – MrFlick Sep 20 '17 at 14:08
  • It is an airline complaints dataset. The feature Process is a factor/categorical feature with around 26 levels. I aggregated it by frequency percentage and applied a condition that reassigns every category/level with a frequency percentage below 5.5% to a single level, "Other". That worked, but the frequencies of the reassigned levels were still shown separately. It is resolved now after using @Troy's code. Thank you for your concern. – Deb Sep 21 '17 at 04:42

1 Answer


Your Process_Table needs one more aggregation step: relabelling the rows as 'Other' changes their labels but does not merge them. Add the following as the final step of your data aggregation.

    Process_Table <- Process_Table %>% group_by(Process) %>% summarize(Freq = sum(Freq), Percent = sum(Percent))
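
For anyone who wants to try this end to end, here is a minimal sketch of the whole flow on made-up data. The category names "Baggage", "Refunds" and "Lounge", all the counts, and the result name `Combined` are assumptions for illustration only, not values from the question's dataset.

```r
library(dplyr)

# Hypothetical toy frequency table standing in for the question's data
Process_Table <- data.frame(
  Process = c("Check-in", "Flight and inflight", "Time Delays",
              "Baggage", "Refunds", "Lounge"),
  Freq    = c(50, 28, 12, 4, 3, 3),
  stringsAsFactors = FALSE
)
Process_Table$Percent <- round(Process_Table$Freq / sum(Process_Table$Freq) * 100, 2)

# Relabel low-share categories; this alone still leaves one row per original category
Process_Table$Process <- ifelse(Process_Table$Percent < 5.5, "Other", Process_Table$Process)

# Collapse rows that now share a label, then sort by frequency
Combined <- Process_Table %>%
  group_by(Process) %>%
  summarise(Freq = sum(Freq), Percent = sum(Percent)) %>%
  arrange(desc(Freq))

print(Combined)
```

Summing the already rounded `Percent` values can drift slightly from the true share, so it may be safer to recompute `Percent` from the summed `Freq` after the `summarise` step.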
Troy