Avoid duplicate data in ggplot2

Question

I hope you can help me

I want to graph the number of publications per year (and categorize by discipline).

How can I make a bar graph in ggplot2 without counting duplicated data?

How can I plot a single value per ID (x)?

I can't remove the rows because my DF has other columns where the data needs to be like this for other plots.

Thank you very much.

structure(list(x = c(1240L, 1251L, 1214L, 1222L, 1234L, 1235L, 
1183L, 1197L, 1198L, 1162L, 1167L, 1169L, 1170L, 1171L, 1176L, 
1104L, 1104L, 1113L, 1117L, 1119L, 1119L, 1063L, 1064L, 1065L, 
1066L, 1072L, 1081L), year = c(1997L, 1997L, 1998L, 1998L, 1998L, 
1998L, 1999L, 1999L, 1999L, 2000L, 2000L, 2000L, 2000L, 2000L, 
2000L, 2002L, 2002L, 2002L, 2002L, 2002L, 2002L, 2003L, 2003L, 
2003L, 2003L, 2003L, 2003L), discipline = structure(c(11L, 2L, 
7L, 2L, 2L, 2L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 2L, 
4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("", "Biogeochemistry", 
"Conservation", "Ecology", "Environmental sciences (interdisciplines)", 
"Geochemical", "Geochemistry", "Geography", "Limnology", "Management", 
"Oceanography", "Socioecology"), class = "factor"), es.type = c("no", 
"no", "no", "Supporting", "no", "no", "no", "no", "no", "no", 
"Regulating", "no", "no", "Supporting", "Supporting", "Supporting", 
"Regulating", "Supporting", "Supporting", "Supporting", "Regulating", 
"no", "no", "no", "Supporting", "Supporting", "Supporting")), row.names = c(NA, 
-27L), class = "data.frame")

For example, in this plot, the Ecology data for 2002 are duplicated.

Question 2:

What if I want to remove the duplicated data but considering two columns? For example:

library(ggplot2)

ID <- c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6)
Year <- c(1990, 1990, 1990, 1990, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995, 1996)
Discipline <- c("Ecology", "Ecology", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Microbiology", "Ecology")
df <- data.frame(ID, Year, Discipline)

# Build plot
p <- ggplot(data = df, aes(x = factor(Year), fill = Discipline)) + geom_bar()
p

In this case I would like to plot two data points for ID 1: one Ecology and one Oceanography. That is, I want to remove the duplicated disciplines within each ID (df$x in my real data). For ID 1, I want to remove one row of Ecology and one row of Oceanography. What can I do in this case?

Hi Matias, can you share your dataframe? Try `dput(your_data_frame)` and copy your results from the console and paste them under your question. — TheSciGuy, Oct 19 '20 at 17:26
Hi, we don't have enough information to answer your question. Please share what code you have tried so far, and include enough of your data (or other example data) to reproduce your problem. See also here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Otherwise, it is very unlikely you will receive a useful answer. — Axeman, Oct 19 '20 at 17:37

Qwethm · Answer 1 · 2020-10-19T19:23:47.380

You might be looking for something like this:

library(ggplot2)

# Define data:
df <- structure(
  list(
    x = c(1240L, 1251L, 1214L, 1222L, 1234L, 1235L, 1183L, 1197L, 1198L,
          1162L, 1167L, 1169L, 1170L, 1171L, 1176L, 1104L, 1104L, 1113L,
          1117L, 1119L, 1119L, 1063L, 1064L, 1065L, 1066L, 1072L, 1081L),
    year = c(1997L, 1997L, 1998L, 1998L, 1998L, 1998L, 1999L, 1999L, 1999L,
             2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2002L, 2002L, 2002L,
             2002L, 2002L, 2002L, 2003L, 2003L, 2003L, 2003L, 2003L, 2003L),
    discipline = structure(
      c(11L, 2L, 7L, 2L, 2L, 2L, 7L, 7L, 7L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L,
        2L, 4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 2L),
      .Label = c("", "Biogeochemistry", "Conservation", "Ecology",
                 "Environmental sciences (interdisciplines)", "Geochemical",
                 "Geochemistry", "Geography", "Limnology", "Management",
                 "Oceanography", "Socioecology"),
      class = "factor"),
    es.type = c("no", "no", "no", "Supporting", "no", "no", "no", "no", "no",
                "no", "Regulating", "no", "no", "Supporting", "Supporting",
                "Supporting", "Regulating", "Supporting", "Supporting",
                "Supporting", "Regulating", "no", "no", "no", "Supporting",
                "Supporting", "Supporting")),
  row.names = c(NA, -27L), class = "data.frame")


# Build plot:
p <- ggplot(data = df[!duplicated(df$x), ],
            aes(x = factor(year), fill = discipline)) +
  geom_bar(position = position_dodge())
p

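The essential part is df[!duplicated(df$x), ], which keeps only the first row of df for each distinct value in the x column. The subset is built on the fly for the plot, so df itself is left unchanged and the full rows remain available for your other plots.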
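To see what duplicated() does on its own, here is a minimal sketch on a small toy data frame. The name toy and its values are made up for illustration (loosely modelled on the 2002 rows above); they are not your real data.

```r
# Toy data (hypothetical): two IDs appear twice, one appears once
toy <- data.frame(
  x = c(1104L, 1104L, 1113L, 1119L, 1119L),
  discipline = c("Ecology", "Ecology", "Biogeochemistry", "Ecology", "Ecology"),
  es.type = c("Supporting", "Regulating", "Supporting", "Supporting", "Regulating")
)

# duplicated() returns TRUE for every value that has already appeared earlier
is_repeat <- duplicated(toy$x)
print(is_repeat)  # FALSE TRUE FALSE FALSE TRUE

# Negating it keeps the first row per x; toy itself is not modified
kept <- toy[!is_repeat, ]
print(kept)
```

Note that the es.type values on the dropped rows are discarded in kept, which is fine for a plot of publications per year but is also why the filtering is done only inside the ggplot() call.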

Regarding your second question, you can do:

p <- ggplot(data = df[!duplicated(df[, c("ID", "Discipline")]), ],
            aes(x = factor(Year), fill = Discipline)) +
  geom_bar(position = position_dodge())
p

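Effectively, this calls duplicated() on only the ID and Discipline columns, so a row is dropped only when both its ID and its Discipline match an earlier row.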
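As a quick check, the same filter can be run without the plot to see which rows survive. This is only a sketch using the example data from your second question; the names df2, keep and dedup are mine, chosen so they do not clash with df above.

```r
# Example data from the second question
ID <- c(1, 1, 1, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6)
Year <- c(1990, 1990, 1990, 1990, 1994, 1994, 1994, 1995, 1995, 1995, 1995, 1995, 1996)
Discipline <- c("Ecology", "Ecology", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Oceanography", "Oceanography",
                "Oceanography", "Oceanography", "Microbiology", "Ecology")
df2 <- data.frame(ID, Year, Discipline)

# A row counts as a duplicate only if the (ID, Discipline) pair was seen before
keep <- !duplicated(df2[, c("ID", "Discipline")])
dedup <- df2[keep, ]

print(dedup)              # ID 1 keeps one Ecology row and one Oceanography row
print(nrow(dedup))        # 8
print(table(dedup$Year))  # rows per year that the bar chart would count
```

ID 5 keeps both an Oceanography and a Microbiology row, because those are different (ID, Discipline) pairs.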

Yes! Thanks very much! I was looking for a function related to unique or stat_unique, but this looks easier. — Matías, Oct 19 '20 at 18:14
@Matias You're welcome. The `duplicated` function is often quite useful for this type of application. For the future, you should add the code for your plot too. I expect it looks a lot like my code, and I would then have been able to quickly rewrite it, instead of writing my answer from scratch. :-) — Qwethm, Oct 19 '20 at 18:18
I have an additional question, but I posted it as another answer (it was too long for a comment). What can I do in this case? Thanks! — Matías, Oct 19 '20 at 19:00
@Matías I've edited this answer with an answer to your second question. In my opinion, you should delete your "answer" and copy it into your original question instead. It is confusing to have a question as an answer :-) — Qwethm, Oct 19 '20 at 19:25
Thanks for everything! I deleted it and edited the original post to avoid confusion. c: — Matías, Oct 19 '20 at 19:44
@Matías Super. If you found my answer helpful, remember to mark it as "answered" by accepting my answer. — Qwethm, Oct 20 '20 at 09:11
