I am trying to plot a data frame with two variables in descending order. Both variables are factors. I want the plot to account for the frequency of both variables, just like a pivot table in Excel.

I tried to use the tidyverse to group, count, and sort the variables in descending order.

library(tidyverse)

# Create a data frame that simulates the data that needs to be modeled
df1 = as.data.frame(replicate(2, 
                              sample(c("A", "B", "C", "D", "E","F","G","H","I","J"), 
                                     50, 
                                     replace = TRUE)))

#Replace V2 column with System Nomenclature (Simulated)
df1$V2 <- sample(1:4, replace = TRUE, nrow(df1))

#Make V2 into a Factor
df1$V2 = as.factor(df1$V2)

#Create frequency table
df2 <- df1 %>% 
  group_by(V1, V2) %>%
  summarise(counts = n()) %>%
  ungroup() %>%
  arrange(desc(counts))

#Plot the 2 variable data
ggplot(df2, 
       aes(reorder(x = V1, -counts) , 
           y = counts, 
           fill = V2)) +
 geom_bar(stat = "identity")

I expect the graph to plot the data in descending order by the frequency of V1, with the fill given by V2, just like the pivot table feature in Excel. I also want to display only the top 5 values of V1 by frequency, filled by V2.

Z.Lin
  • @zx8754 This is not a duplicate, since I wish to limit the x-axis to only the top 5 V1 values with the highest counts while still filling by V2, whereas the linked question includes all of the data. – Andrew Perez Jan 28 '19 at 23:08

1 Answer

You can use fct_reorder and fct_rev to achieve what you want:

library(tidyverse)

#Create data frame that will hold data for simulation
df1 = as.data.frame(replicate(2, sample(c("A", "B", "C", "D", "E","F","G","H","I","J"), 50, replace = TRUE)))

#Replace V2 column with System Nomenclature (Simulated)
df1$V2 <- sample(1:4, replace = TRUE, nrow(df1))

#Make V2 into a Factor
df1$V2 = as.factor(df1$V2)

#Create frequency table
df2 <- df1 %>% group_by(V1, V2) %>%
    summarise(counts = n()) %>%
    ungroup() %>%
    arrange(desc(counts))

#Plot the 2 variable data.
##fct_reorder rearranges the factor levels, and fct_rev reverses the order, so it is descending
##The summary function is passed as .fun (with a leading dot); .fun = sum orders V1 by its total count
ggplot(df2, aes(x = fct_rev(fct_reorder(V1, counts, .fun = sum)), y = counts, fill = V2)) +
    geom_bar(stat = "identity")
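
To make the ordering concrete, here is a minimal sketch on a small hand-written table. The `toy` data frame and its values are hypothetical and chosen only so that the totals are easy to check by hand; it shows how `fct_reorder` with `.fun = sum` ranks the levels of V1 by their total count, and how `fct_rev` flips that ranking to descending.

```r
library(forcats)

# Hypothetical toy counts, one row per (V1, V2) pair
toy <- data.frame(
  V1     = factor(c("A", "A", "A", "B", "B", "C")),
  V2     = factor(c(1, 2, 3, 1, 2, 1)),
  counts = c(1, 1, 1, 4, 4, 5)
)

# Totals per V1: A = 3, B = 8, C = 5
asc  <- fct_reorder(toy$V1, toy$counts, .fun = sum)  # ascending by total
desc <- fct_rev(asc)                                 # descending by total

levels(asc)   # "A" "C" "B"
levels(desc)  # "B" "C" "A"
```

Only the level order changes; the values stored in the factor stay the same, which is why the fill by V2 is unaffected.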


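##Keeping only the V1 levels whose total count is greater than 5
##(this is a threshold on the total, so it can keep more or fewer than five levels)
df2 %>% group_by(V1) %>%
filter(sum(counts) > 5) %>%
ggplot(aes(x = fct_rev(fct_reorder(V1,
                    counts, .fun = sum)),
            y = counts, fill = V2)) +
geom_bar(stat = "identity")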
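
If the goal is a fixed number of bars rather than a count threshold, one possible approach is to rank V1 by its total first and filter on the resulting names. This is a sketch, not the only way to do it: the seed, the helper names `top_v1` and `top5`, and the value `n_top = 5` are assumptions made for illustration, and ties at the cutoff are broken arbitrarily by `head()`.

```r
library(dplyr)
library(forcats)
library(ggplot2)

set.seed(42)  # assumption: fixed seed so the simulated data is reproducible
df1 <- data.frame(
  V1 = sample(LETTERS[1:10], 50, replace = TRUE),
  V2 = factor(sample(1:4, 50, replace = TRUE))
)

# Frequency table, one row per (V1, V2) pair
df2 <- df1 %>%
  group_by(V1, V2) %>%
  summarise(counts = n()) %>%
  ungroup()

n_top <- 5  # hypothetical parameter: number of V1 levels to keep

# Names of the n_top V1 levels with the largest totals
top_v1 <- df2 %>%
  group_by(V1) %>%
  summarise(total = sum(counts)) %>%
  arrange(desc(total)) %>%
  head(n_top) %>%
  pull(V1)

# Keep only those levels, then order them by total, descending
top5 <- df2 %>%
  filter(V1 %in% top_v1) %>%
  mutate(V1 = fct_rev(fct_reorder(factor(V1), counts, .fun = sum)))

p <- ggplot(top5, aes(x = V1, y = counts, fill = V2)) +
  geom_bar(stat = "identity")
p
```

Changing `n_top` to 6 would keep the six most frequent V1 levels instead. Filtering before building the factor means the dropped levels do not appear as empty positions on the x-axis.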


Henry Cyranka
  • Thank you for the quick response. Is it possible to display only the top 5? That is, display just H, I, G, F, and B while preserving the fill. – Andrew Perez Jan 28 '19 at 05:41
  • Yes, but it is easier to filter them before plotting. Will edit my response. – Henry Cyranka Jan 28 '19 at 10:36
  • I believe your edited code filters for counts that are greater than 5. How would you filter the data to include only the top 6 V1 variables (e.g. a plot of H, I, G, F, B, A)? – Andrew Perez Jan 28 '19 at 20:46