Given the following dataframe (see below), taken from a questionnaire that asked people from different neighborhoods about perceived security, I have managed to create a bar plot that displays perceived security, grouped by neighborhood:

library(ggplot2)

questionnaire_raw = read.csv("https://www.dropbox.com/s/l647q2omffnwyrg/local.data.csv?dl=0")

ggplot(data = questionnaire_raw, 
       aes(x = factor(Seguridad.de.tu.barrio..de.día.), # We have to convert x values to categorical data
           y = (..count..)/sum(..count..)*100,
           fill = neighborhoods)) + 
  geom_bar(position="dodge") + 
  ggtitle("Seguridad de día") + 
  labs(x="Grado de seguridad", y="% encuestados", fill="Barrios")

I would like to overlay these results with a line graph representing the mean of each security category (1, 2, 3 or 4) across all neighborhoods (that is, without grouping the results), so that it is easy to see whether a specific neighborhood is above or below the average of all neighborhoods. However, since this is my first project with R, I do not know how to calculate that mean from a dataframe and then overlay it on the previous bar plot.

ccamara
  • What about adding something like `+ stat_summary(fun.data="mean_cl_normal", geom = "line", mapping = aes(group = 1))` (untested)? – lukeA Feb 12 '15 at 11:56
  • results in `Error: stat_summary requires the following missing aesthetics: y` – Rentrop Feb 12 '15 at 12:00

1 Answer

Using data.table for the data manipulation, and building on lukeA's comment:

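library(ggplot2)
library(data.table)

setDT(questionnaire_raw)  # convert the data.frame to a data.table in place
# Rename the columns; this assumes the data has exactly these three, in this order
setnames(questionnaire_raw, c("Timestamp", "Barrios", "Grado"))

# Count respondents for each combination of neighborhood and security grade
plot_data <- questionnaire_raw[, .N, by = .(Barrios, Grado)]

ggplot(plot_data, aes(x = factor(Grado), y = N, fill = Barrios)) +
  geom_bar(position = "dodge", stat = "identity") +
  # Mean count per grade across neighborhoods, drawn as one line;
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom = "line", mapping = aes(group = 1)) +
  ggtitle("Seguridad de día") +
  # y is a raw count (N), so the axis label says number, not percentage
  labs(x = "Grado de seguridad", y = "Número de encuestados", fill = "Barrios")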

Result: [image: grouped bar plot with the mean line overlaid]
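
The question plots percentages while the code above plots raw counts, so the following is a minimal sketch of one way to put both the bars and the line on a percentage scale. It is not part of the original answer. The `toy` table and its values are made up for illustration, and `pct`, `by_barrio` and `overall` are hypothetical names. Note that the line here is the pooled share of each grade over all respondents, which is a different quantity from the mean count that `stat_summary` draws above.

```r
library(data.table)
library(ggplot2)

# Toy stand-in for the questionnaire; the values are made up for illustration
toy <- data.table(
  Barrios = c("A", "A", "A", "A", "B", "B"),
  Grado   = c(1, 2, 2, 3, 2, 3)
)

# Share of each grade within each neighborhood, in percent
by_barrio <- toy[, .N, by = .(Barrios, Grado)]
by_barrio[, pct := 100 * N / sum(N), by = Barrios]

# Share of each grade over all respondents, ignoring neighborhood
overall <- toy[, .N, by = Grado]
overall[, pct := 100 * N / sum(N)]

# Bars per neighborhood, with the overall share drawn as a single line
p <- ggplot(by_barrio, aes(x = factor(Grado), y = pct)) +
  geom_col(aes(fill = Barrios), position = "dodge") +
  geom_line(data = overall, aes(group = 1)) +
  labs(x = "Grado de seguridad", y = "% encuestados", fill = "Barrios")
print(p)
```

With the real data, `toy` would be replaced by the renamed `questionnaire_raw`; a bar that rises above the line then marks a neighborhood where that grade is more common than in the sample as a whole.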

Rentrop
  • Thank you very much for your answer. It works fine, although I still need to understand what you are doing: the original dataframe is far bigger (we have 72 variables, not 3), so it seems I can't reproduce the setnames line. I think I need to create a vector with all 72 variable names, but since I had never heard of that function I am not sure. I will try creating a new dataframe with just the variables I need. – ccamara Feb 12 '15 at 15:29
  • The `setnames` line just alters the column names of the data. Have a look at the data before and after. It is not difficult. – Rentrop Feb 12 '15 at 15:41
  • I am re-reading your code, and honestly (and to my shame) I understand almost nothing of what it does. I still have a lot to learn about R... – ccamara Feb 12 '15 at 15:41
  • And the line with `by` counts the occurrences – Rentrop Feb 12 '15 at 15:43
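
To make the two points from the comments concrete, here is a small sketch on a made-up table. The column names in `wide` are hypothetical and are not the real questionnaire columns. It assumes only the documented `setnames(x, old, new)` form, which renames selected columns by name, so a table with many variables does not need a vector of all of its names.

```r
library(data.table)

# Hypothetical wide table: column names and values are made up
wide <- data.table(
  Marca.temporal = c("t1", "t2", "t3", "t4"),
  barrio         = c("A", "A", "B", "B"),
  seguridad.dia  = c(2, 2, 3, 2),
  otra.pregunta  = c("x", "y", "x", "y")
)

# Keep only the columns needed for the plot (this returns a new data.table)
small <- wide[, .(barrio, seguridad.dia)]

# Rename selected columns by name; setnames() modifies `small` in place
setnames(small, old = c("barrio", "seguridad.dia"), new = c("Barrios", "Grado"))

# .N is the number of rows in each group defined by `by`,
# so this gives one row per (Barrios, Grado) pair with its count in column N
counts <- small[, .N, by = .(Barrios, Grado)]
print(counts)
```

Printing `names(small)` before and after the `setnames` call shows exactly what changed, which is the check suggested in the comment above.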