I hope you can help me.
I want to plot the number of publications per year, broken down by discipline.
How can I draw this bar chart in ggplot2 without counting duplicated rows?
That is, how can I count each publication ID (column x) only once?
I can't remove the duplicated rows from the data frame itself, because other columns need the data in this form for other plots.
Thank you very much.
structure(list(x = c(1240L, 1251L, 1214L, 1222L, 1234L, 1235L,
1183L, 1197L, 1198L, 1162L, 1167L, 1169L, 1170L, 1171L, 1176L,
1104L, 1104L, 1113L, 1117L, 1119L, 1119L, 1063L, 1064L, 1065L,
1066L, 1072L, 1081L), year = c(1997L, 1997L, 1998L, 1998L, 1998L,
1998L, 1999L, 1999L, 1999L, 2000L, 2000L, 2000L, 2000L, 2000L,
2000L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L,
2003L, 2003L, 2003L, 2003L), discipline = structure(c(11L, 2L,
7L, 2L, 2L, 2L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 2L,
4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "Biogeochemistry",
"Conservation", "Ecology", "Environmental sciences (interdisciplines)",
"Geochemical", "Geochemistry", "Geography", "Limnology", "Management",
"Oceanography", "Socioecology"), class = "factor"), es.type = c("no",
"no", "no", "Supporting", "no", "no", "no", "no", "no", "no",
"Regulating", "no", "no", "Supporting", "Supporting", "Supporting",
"Regulating", "Supporting", "Supporting", "Supporting", "Regulating",
"no", "no", "no", "Supporting", "Supporting", "Supporting")), row.names = c(NA,
-27L), class = "data.frame")
For example, in this plot the Ecology publications for 2002 are counted more than once. [Plot]
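To make the goal concrete, here is a minimal sketch of the kind of result I am after, assuming it is acceptable to filter into a separate copy that is used only for this plot. The names `pubs` and `pubs_unique` are made up for illustration, and the small data frame is just the six 2002 rows from the sample above. It uses base R's `duplicated()` to keep the first row per ID; I am not sure this is the best approach.

```r
library(ggplot2)

# Hypothetical excerpt: the six 2002 rows from the sample data above
pubs <- data.frame(
  x = c(1104L, 1104L, 1113L, 1117L, 1119L, 1119L),
  year = rep(2002L, 6),
  discipline = c("Ecology", "Ecology", "Biogeochemistry",
                 "Ecology", "Ecology", "Ecology"),
  es.type = c("Supporting", "Regulating", "Supporting",
              "Supporting", "Supporting", "Regulating")
)

# Keep only the first row for each ID; `pubs` itself is left unchanged,
# so the other plots can still use every row
pubs_unique <- pubs[!duplicated(pubs$x), ]

# Each ID now contributes exactly one count to its year/discipline bar
p <- ggplot(pubs_unique, aes(x = factor(year), fill = discipline)) +
  geom_bar()
p
```

Because the filtering happens in a copy, the original data frame keeps its duplicated rows for the plots that need them.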
Question 2:
What if I want to remove duplicates based on two columns together? For example:
library(ggplot2)

ID <- c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6)
Year <- c(1990, 1990, 1990, 1990, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995, 1996)
Discipline <- c("Ecology", "Ecology", "Oceanography", "Oceanography", "Oceanography", "Oceanography", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Microbiology", "Ecology")
df <- data.frame(ID, Year, Discipline)

# Build plot
p <- ggplot(data = df, aes(x = factor(Year), fill = Discipline)) + geom_bar()
p
In this case, ID 1 should contribute two entries to the plot: one Ecology and one Oceanography. In other words, I want to remove disciplines that are duplicated within the same ID (df$x in my real data). For ID 1, that means removing one Ecology row and one Oceanography row. How can I do this?
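As a sketch of what I mean, and assuming a filtered copy is acceptable here too, something like the following keeps one row per ID/Discipline pair. It calls base R's `duplicated()` on the two columns together. The name `df_unique` is hypothetical, and there may be a more idiomatic way to do this.

```r
library(ggplot2)

df <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6),
  Year = c(1990, 1990, 1990, 1990, 1994, 1994, 1994,
           1995, 1995, 1995, 1995, 1995, 1996),
  Discipline = c("Ecology", "Ecology", "Oceanography", "Oceanography",
                 "Oceanography", "Oceanography", "Oceanography",
                 "Oceanography", "Oceanography", "Oceanography",
                 "Oceanography", "Microbiology", "Ecology")
)

# Drop rows whose (ID, Discipline) combination has already appeared;
# the original `df` is not modified
df_unique <- df[!duplicated(df[, c("ID", "Discipline")]), ]

# ID 1 now contributes one Ecology and one Oceanography count to 1990
p <- ggplot(df_unique, aes(x = factor(Year), fill = Discipline)) +
  geom_bar()
p
```

With this filter, ID 5 would still contribute both Oceanography and Microbiology, since those are different disciplines within the same ID.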