
I have a fairly simple data frame: a skills matrix for employees that contains the user and 35-36 columns of IT skills, each ranked from 0 to 5. I summed each column and sorted the columns in descending order by total. Now I want to create a bar graph, but I am not sure what to put for the x value.

I have tried using colSums and colnames.

Read CSV into R

skillsMatrix <- read.csv(file="skillsmatrix.csv", header=TRUE, sep=",")

Use colSums to find the skills with the highest values, sorted DESC

skills <- skillsMatrix[,names(sort(colSums(skillsMatrix[-1:-2]), decreasing = T))]
skills
library(ggplot2)
g <- ggplot(skills, aes(x= colSums(skills)), y=(colnames(skills))) + 
  geom_bar(stat = "identity", colour = "black")
g

The expected result is a bar graph showing each skill with its total value, in descending order.

The actual result is this error:

Error: Aesthetics must be either length 1 or the same as the data (55): x

Here is some output from str(skills) to give you an idea of the data:

> str(skills)
'data.frame':   55 obs. of  35 variables:
 $ SQL                                         : int  4 3 2 3 3 2 3 3 3 4 ...
 $ IIS                                         : int  4 3 2 4 2 1 4 0 2 4 ...
 $ SQL.Server..SSIS..SSAS..SSRS.               : int  3 3 2 3 3 1 3 3 2 3 ...
 $ C.                                          : int  4 4 2 3 2 1 0 0 2 4 ...
 $ .Net..WCF..WPF.                             : int  4 2 1 2 2 2 0 0 2 4 ...
 $ VB..Net                                     : int  4 2 1 3 2 1 0 0 1 4 ...
 $ HTML.5                                      : int  3 4 3 2 1 1 0 2 1 2 ...
 $ Java.Script                                 : int  3 3 2 1 3 1 0 2 1 3 ...
 $ AppInsights                                 : int  1 1 1 3 2 0 3 0 0 3 ...
 $ Angular.JS                                  : int  2 3 2 2 2 0 0 2 2 2 ...
Ronak Shah
Vegas588
    Please edit your question as shown [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – NelsonGon Apr 04 '19 at 01:26

2 Answers

Each aesthetic must be either length 1 or the same length as the data. The skills data frame has 55 rows, but colSums(skills) returns only one value per column (35 here), so it cannot be mapped against the data. Instead, we can create a new data frame holding the sum of each skill column, sorted in descending order, and use that for plotting.

library(ggplot2)

new_df <- stack(sort(colSums(skills), decreasing = TRUE))

ggplot(new_df) + 
      aes(ind, values) + 
      geom_bar(stat = "identity")

data

skills <- data.frame(SQL = c(4, 3, 2, 4, 2, 3),IIS = c(5, 1, 2, 4, 5, 5), 
                     Javascript = c(1, 2,3, 4, 5, 5))
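
As a supplementary sketch (not part of the original answer), the snippet below uses the same toy data to show the length mismatch behind the error and one way to make the descending order explicit. The name `totals` and the axis labels are assumptions chosen for illustration; `reorder()` is used so the bar order does not depend on how `stack()` happens to order the factor levels.

```r
library(ggplot2)

# Toy data from this answer; the real `skills` has 55 rows and 35 columns.
skills <- data.frame(SQL = c(4, 3, 2, 4, 2, 3),
                     IIS = c(5, 1, 2, 4, 5, 5),
                     Javascript = c(1, 2, 3, 4, 5, 5))

# The mismatch behind the error: one row per employee, one sum per skill.
nrow(skills)             # number of rows in the data
length(colSums(skills))  # number of column sums

# One row per skill, so every aesthetic has the same length as the data.
# stack() returns the columns `values` (the sums) and `ind` (the skill names).
totals <- stack(colSums(skills))

# reorder() sorts the factor levels by -values, i.e. largest total first.
totals$ind <- reorder(totals$ind, -totals$values)

p <- ggplot(totals, aes(x = ind, y = values)) +
  geom_col(colour = "black") +   # geom_col() is geom_bar(stat = "identity")
  labs(x = "Skill", y = "Total score")
p
```

ggplot2 draws a discrete x axis in factor-level order, which is why the levels are reordered rather than the rows of the data frame.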
Ronak Shah

Here is an option with the tidyverse. We summarise the columns to get the sum of each, gather the result into 'long' format, and then draw the bar plot with geom_bar from ggplot2.

library(tidyverse)
library(ggplot2)
skillsMatrix %>% 
   summarise_all(sum) %>%
   gather %>% 
   ggplot(., aes(key, value)) + 
      geom_bar(stat = "identity")

data

skillsMatrix <- structure(list(SQL = c(4, 3, 2, 4, 2, 3), IIS = c(5, 1, 2, 4, 
 5, 5), Javascript = c(1, 2, 3, 4, 5, 5)), class = "data.frame", row.names = c(NA, 
  -6L))
akrun
  • Hi akrun, this is interesting too. skillsMatrix is slightly different to skills because in skillsMatrix there are two additional columns (ResourceName, Type). Both are currently listed as Factors. Would I change those columns to 'character' ? Error in Summary.factor(c(1L, 2L, 3L, 4L, 5L, 6L, 8L, 7L, 9L, 10L, 11L, : ‘sum’ not meaningful for factors – Vegas588 Apr 05 '19 at 10:57
  • @Vegas588 In that case, change the `summarise_all` to `summarise_if(is.numeric, sum)` and it should work fine – akrun Apr 05 '19 at 12:27
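
The suggestion in the last comment can be sketched as follows. This is only an illustration under stated assumptions: the ResourceName and Type values are made up (the thread only says both columns are factors), and `totals` is a hypothetical name. The `reorder()` call is an addition not mentioned in the comments, included so that the bars appear in descending order.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: the ResourceName and Type values are invented for
# illustration; only the skill columns mirror the toy data in the answer.
skillsMatrix <- data.frame(
  ResourceName = factor(c("A", "B", "C")),
  Type         = factor(c("Dev", "Dev", "Ops")),
  SQL          = c(4, 3, 2),
  IIS          = c(5, 1, 2),
  Javascript   = c(1, 2, 3)
)

totals <- skillsMatrix %>%
  summarise_if(is.numeric, sum) %>%      # sums only the numeric columns
  gather(key = "key", value = "value")   # one row per skill

# reorder(key, -value) puts the largest total first on the x axis.
p <- ggplot(totals, aes(x = reorder(key, -value), y = value)) +
  geom_bar(stat = "identity") +
  labs(x = "Skill", y = "Total score")
p
```

Because the factor columns are skipped rather than converted, there should be no need to change them to character first.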