I have two datasets (possibly three or four later) containing survey data, with some identical items that were put to respondents 30 years apart. I now want to produce graphs to compare the findings. I browsed some articles on Pew Research for inspiration, and I think the most suitable way to present the data is with pairs of bars of standardized length (one for time = t1 and, below it, one for t2), with differently colored segments representing proportions. I would make them horizontal so that I have ample space for labeling each pair of bars. So it would be, or at least could look like, a common geom_bar(position="fill") + coord_flip() graph.
Here is some sample data:
country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA)
wght <- c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
df <- cbind.data.frame(country, var, wght)
df2 <- cbind.data.frame(country, var, wght)
'country' is a country code, 'var' is the variable I'm interested in, and 'wght' is a post-stratification weight supplied with the dataset. In this example the two datasets are of course identical, so there is no real point in comparing them visually, but that should not make a difference for my question.
The simplest graph I would want to make is a country-specific one that contains two horizontal bars, one for the weighted responses at t1 and the other at t2. Later, I'd also want to make more complex ones, such as having in one graph pairs of bars for all countries, or within one country the responses separated by gender, age categories, education level, etc. For the most basic one, if there were no weights, I would do the following:
df$time <- 1
df2$time <- 2
varfull <- c(df$var, df2$var)
timefull <- c(df$time, df2$time)
newdf <- cbind.data.frame(timefull, varfull)
newdf$varfull <- as.factor(newdf$varfull)
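library(ggplot2)
# newdf has no column called 'time'; the stacked time variable is 'timefull'
ggplot(newdf, aes(factor(timefull), fill=varfull)) + geom_bar(position="fill") + coord_flip()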
The graph would still need to be formatted, but the general structure is there. However, it uses the unweighted data, and I can only think of very tedious ways to get to a graph that uses the weights (sum the individual weights within each response value at each time point, then divide each sum by the total weight for that time point to get the proportions).
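To make the manual route concrete, here is a minimal sketch of it in base R on the toy data above. The names long, wsum, wprop and wprop_long are just placeholders I made up, and I am assuming that rows with a missing var should simply be dropped, which is what xtabs() does under R's default na.action setting.

```r
# Toy data from above, stacked into one long data frame with a time marker.
country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA)
wght <- c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
df <- data.frame(country, var, wght, time = 1)
df2 <- data.frame(country, var, wght, time = 2)
long <- rbind(df, df2)

# Sum the weights per time point and response value.
# Assumption: rows with NA in var are dropped (default na.action = na.omit).
wsum <- xtabs(wght ~ time + var, data = long)

# Turn the sums into row proportions, so each time point adds up to 1.
wprop <- prop.table(wsum, margin = 1)

# Long format with columns time, var and Freq (the weighted proportion).
wprop_long <- as.data.frame(wprop)
print(wprop_long)
```

The wprop_long table could then be drawn with pre-computed bar heights, but this has to be repeated for every country or subgroup, which is the tedious part.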
If anyone can help in adding the weights in an easier fashion, I'd be grateful!
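One candidate I am aware of is the weight aesthetic: according to the ggplot2 documentation, geom_bar() sums the weights instead of counting rows when weight is supplied. Below is a sketch of how that might look on the toy data. Dropping the NA responses before plotting is my own assumption here, and I have not checked the result against a proper survey-weighting package.

```r
library(ggplot2)

# Toy data from above, stacked into one long data frame with a time marker.
country <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
var <- c(1, 3, 2, 1, 2, 2, 3, 3, 1, 3, NA, NA)
wght <- c(0.8, 0.9, 1.2, 1.5, 0.5, 1, 0.7, 0.9, 0.8, 1.1, 1, 0.8)
long <- rbind(data.frame(country, var, wght, time = 1),
              data.frame(country, var, wght, time = 2))

# Assumption: missing responses are excluded rather than shown as a segment.
long <- long[!is.na(long$var), ]
long$var <- factor(long$var)
long$time <- factor(long$time)

# weight = wght makes each bar segment the sum of weights, not the row count;
# position = "fill" then rescales every bar to a total length of 1.
p <- ggplot(long, aes(x = time, fill = var, weight = wght)) +
  geom_bar(position = "fill") +
  coord_flip()
print(p)
```

If this is right, subgroup comparisons might only need an extra facet or a different x variable instead of a separate manual calculation for each group.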