
Sorry for asking a very simple question. I am new to R, and I am trying to make a histogram from a set of data, as shown in the picture. However, when I tried to make the histogram, it showed this instead. Is it possible to change the x-axis labels to the ranges from my data set? And is it possible to make a histogram based on the ranges?

I would really appreciate any help.

Asked by xd3262nd (question edited by dc37)

  • Welcome to SO! Please, ***PLEASE*** don't post *pictures* of your data. Making a [reproducible example](https://stackoverflow.com/q/5963269/6478701) makes it much easier to help you. – RoB Dec 12 '19 at 08:03
  • Oh, thanks for the advice. I will keep that in mind. – xd3262nd Dec 12 '19 at 15:42

1 Answer

Your vector `age` is essentially a factor: a categorical variable whose values are the age ranges. So instead of trying to plot its density, you can count the observations in each level of the factor and plot those counts as a histogram.

To do that, you can use dplyr and ggplot2:

library(dplyr)
data.frame(age) %>% group_by(age) %>% count(age) 

# A tibble: 3 x 2
# Groups:   age [3]
  age       n
  <fct> <int>
1 19-25     9
2 26-32    15
3 33-39     4

Now, if you pipe this output into ggplot2, you get:

library(dplyr)
library(ggplot2)
data.frame(age) %>% group_by(age) %>% count(age) %>%
  ggplot(aes(x = age, y = n)) + geom_bar(stat = "identity")

And you get a histogram with one bar per age range.
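ggplot2 can also do the counting itself, which skips the dplyr step. This is a minimal sketch using the same `age` vector as in the Data section; the axis titles passed to `labs()` are illustrative choices, not part of the original answer. `geom_bar()` uses `stat = "count"` by default, so it tallies each range on its own.

```r
library(ggplot2)

age <- c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32",
         "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32",
         "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32")

# geom_bar() defaults to stat = "count", so ggplot2 counts the rows in each
# age range itself; no pre-aggregation with dplyr is needed
p <- ggplot(data.frame(age = age), aes(x = age)) +
  geom_bar() +
  labs(x = "Age range", y = "Number of people")
print(p)
```

When the counts are already computed, as in the dplyr version, `geom_col()` is the shorthand for `geom_bar(stat = "identity")`.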


Using base R

Following a good suggestion from @RoB, it may also help to see how to do this with base R graphics.

You can do it like this:

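library(dplyr)
df <- data.frame(age) %>% group_by(age) %>% count(age)
# barplot() returns the x positions of the bar midpoints; use them to place the labels
bp <- barplot(df$n)
axis(side = 1, at = bp, labels = as.character(df$age))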

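Here is why the label positions should come from `barplot()` itself. In this minimal base R sketch, using the same `age` vector, `barplot()` returns the bar midpoints, and they are not 1, 2, 3 with the default `width` and `space`. The sketch also shows `names.arg` as a way to label the bars without calling `axis()` at all. The axis titles are illustrative.

```r
age <- c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32",
         "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32",
         "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32")
counts <- table(age)

# barplot() invisibly returns the x coordinates of the bar midpoints.
# With the defaults (width = 1, space = 0.2) these are 0.7, 1.9 and 3.1,
# so labels placed at 1:3 would drift away from the bars.
mids <- barplot(as.vector(counts))
axis(side = 1, at = mids, labels = names(counts))

# Alternatively, names.arg labels each bar directly, with no axis() call
barplot(as.vector(counts), names.arg = names(counts),
        xlab = "Age range", ylab = "Number of people")
```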

EDIT: Alternative with base graphics

Actually, you can plot the counts for each level of `age` even faster, without the `axis()` function or the dplyr package:

barplot(table(age))

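If the ranges come out in the wrong order, or a range has no observations, you can set the factor levels explicitly before calling `table()`. This is a sketch. The `"40-46"` level is a hypothetical empty range, added only to show that unused levels are still counted (as 0) and get a slot in the plot. The titles are illustrative.

```r
age <- c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32",
         "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32",
         "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32")

# Fix the order of the ranges explicitly; "40-46" is a hypothetical extra
# range with no observations, kept so it still appears in the plot
age_f <- factor(age, levels = c("19-25", "26-32", "33-39", "40-46"))
counts <- table(age_f)

barplot(counts, xlab = "Age range", ylab = "Number of people",
        main = "Age distribution")
```

Any value not listed in `levels` becomes `NA` and is silently dropped by `table()`, so it is worth double-checking the spelling of the level names.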

Does this answer your question?

Data

age = c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32",
        "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32",
        "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32")
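The counting step can also be done without dplyr. This is a minimal base R sketch using the `age` vector above. `factor()` turns each distinct range into a level, `table()` counts the observations per level, and `as.data.frame()` with `responseName = "n"` gives a two-column summary. It resembles the dplyr output, but as a plain data frame rather than a grouped tibble.

```r
age <- c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32",
         "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32",
         "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32")

# Each distinct range becomes a factor level (sorted alphabetically)
age_f <- factor(age)
levels(age_f)

# Count observations per level; the result has columns `age` and `n`
summary_df <- as.data.frame(table(age = age_f), responseName = "n")
summary_df
```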
Answered by dc37

  • Good answer, I admire you for taking the time to type out that data! Since the OP is a bit new to R, maybe you could add a version with base graphics so they can get a good understanding of the answer? – RoB Dec 12 '19 at 08:07
  • Haha, basically I typed a few of them, and then it was just Ctrl+C / Ctrl+V. Good suggestion for the base plot; I edited my answer to add it (I barely remembered the correct command for it :( ) Thanks! – dc37 Dec 12 '19 at 08:16
  • Thanks for the edit. I believe you made a typo in `library(df$age)`. – RoB Dec 12 '19 at 08:17
  • Thanks!! I guess I'm starting to get tired :D – dc37 Dec 12 '19 at 08:17
  • Yes, that helps a lot. It is really helpful, and thanks for taking the time to teach me all of this! I will try to look up how the syntax works from there. Another question: how can I input the data into R? I have imported the data from a CSV, but how can I put it in an array like you have shown above? – xd3262nd Dec 12 '19 at 15:52
  • You're welcome! `ggplot2` can be scary at first, but it is a very powerful graphics tool. What do you mean by array? I don't understand what you are asking. – dc37 Dec 12 '19 at 15:56
  • Yeah. I am trying to learn, and R is very different from other languages, with so many packages available too. – xd3262nd Dec 12 '19 at 16:03
  • So, this is not an array but a vector (one dimension). When you import your `csv`, there is a good chance it was read in as a data.frame (two dimensions, with rows and columns). If you want `age` as a vector, you just have to write `age <- as.vector(df$Age)`, where `df` is the name you assigned to the result of reading the csv. – dc37 Dec 12 '19 at 16:06
  • I meant this --> ` age = c("19-25","19-25","26-32","26-32","26-32","26-32","26-32","26-32","26-32", "33-39","19-25","19-25","26-32","19-25","19-25","26-32","19-25","26-32", "26-32","19-25","26-32","33-39","26-32","19-25","26-32","33-39","33-39","26-32") `, or a vector like you said. But mine shows up as a data.frame. Sorry for the confusion. – xd3262nd Dec 12 '19 at 16:08
  • It does not make a huge difference: your data frame is probably one column with n rows. You can access that column by adding `$Age` to the end of your data frame's name. There are plenty of resources for learning R on the web (I started with the book R for Dummies). – dc37 Dec 12 '19 at 16:17
  • Gotcha. I did some googling and found out how to convert them to character and put them into the form you have shown above. Thanks again for explaining the concept and suggesting the book. I will read up on things online. – xd3262nd Dec 12 '19 at 16:24
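Here is a minimal sketch of the steps described in the comments above. It assumes your CSV has a header row and a column named `Age` (as in `df$Age`). A temporary file with made-up contents stands in for your real CSV so the example runs on its own. With real data you would pass your own file path to `read.csv()`.

```r
# Stand-in CSV with made-up values; replace csv_path with your own file path
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Age", "19-25", "26-32", "26-32", "33-39"), csv_path)

# read.csv() returns a data.frame (rows and columns), not a vector
df <- read.csv(csv_path, stringsAsFactors = FALSE)
str(df)

# Extract the Age column as a plain character vector, like the `age` above
age <- as.character(df$Age)

# From here the plotting code in the answer works unchanged
barplot(table(age))
```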