
I have a course assignment that requires making a ggplot with the txhousing data set, but I can't get it to work: I keep getting errors or no output. This is the exercise:

This is a scatterplot of sales and month

  1. Insert a new r chunk that makes this plot

  2. Use the function ggplot() (check the help file for this function)

  3. As data argument use na.omit(txhousing)

  4. In the aes argument put month on the x-axis and log(sales) on the y-axis

  5. Use geom_point to produce a line

  6. Once the r chunk runs fine, copy it and

  7. Add aes(color=year) to the geom.

  8. Copy the latest r chunk, and add the geom_smooth to the plot

I've tried changing the ggplot code multiple times, but I don't get any further than a single dot in the middle of the graph. Because the plot doesn't work even with geom_point, I haven't added geom_smooth yet.

library(tidyverse)
summary(txhousing)
na.omit(txhousing)
txhousing<- as.data.frame(txhousing)
txhousing %>% mutate(logsales= log(txhousing$sales))
ggplot(na.omit(txhousing), aes("month", "logsales")) +
  geom_point(aes(color=year))

I expect to get a scatterplot of logsales against month from the txhousing data. What I get instead is a graph with the variable names on the axes, a single blue dot at the origin, and a legend showing which color stands for which year.

markus
  • Welcome to SO. Please [see here and learn](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) how to ask a question, thereafter revise your post to reflect the same. – mnm Jun 11 '19 at 12:49
  • When you make logsales you need to assign the new data frame to something, otherwise it's not stored anywhere, just output to the console – rg255 Jun 11 '19 at 13:05

2 Answers


You have several issues at work here. Starting from the bottom

ggplot(na.omit(txhousing), aes("month", "logsales")) + geom_point(aes(color=year))

The variable names in aes must be unquoted. As written, ggplot is literally plotting the strings "month" and "logsales" against each other, i.e. a single point on two categorical scales. So remove the quotation marks.
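A minimal sketch of the difference, assuming only that ggplot2 is installed (the names `p_quoted`, `p_bare`, `n_quoted` and `n_bare` are made up for illustration). It counts how many distinct x positions each version of the plot actually draws:

```r
library(ggplot2)  # also provides the txhousing data set

df <- na.omit(txhousing)

# Quoted: each string is a constant, so every row lands on the same position.
p_quoted <- ggplot(df, aes("month", "logsales")) + geom_point()

# Unquoted: the names are looked up as columns of df.
p_bare <- ggplot(df, aes(month, log(sales))) + geom_point()

# layer_data() returns the data a layer draws after the aesthetics are mapped.
n_quoted <- length(unique(as.numeric(layer_data(p_quoted)$x)))
n_bare   <- length(unique(layer_data(p_bare)$x))

n_quoted  # one position shared by all rows
n_bare    # one position per distinct month
```

The quoted version does not raise an error even though no logsales column exists, because the strings are never looked up in the data.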

Secondly, when ggplot fails, examine your input. What does na.omit(txhousing) look like? This leads to the next point:

txhousing %>% mutate(logsales= log(txhousing$sales)) 

does not do what you expect. Sure, you calculate the logarithm of sales, but you aren't saving the result. You should be doing:

txhousing <- txhousing %>% mutate(logsales = log(sales))

or using the assignment pipe from the magrittr package (library(tidyverse) makes %>% available but does not attach magrittr, so call library(magrittr) first):

txhousing %<>% mutate(logsales= log(sales))

See how I'm leaving out txhousing$ from the functions? That's because mutate will look for the variables in its input data.frame, i.e. the data.frame piped into mutate.
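The effect of the missing assignment can be checked directly. This is a small sketch assuming dplyr and ggplot2 are installed; `housing`, `before` and `after` are illustrative names used so that the packaged txhousing object is not overwritten:

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data set

housing <- txhousing  # local copy, illustrative name

# Without assignment the new data frame is computed and then thrown away.
invisible(housing %>% mutate(logsales = log(sales)))
before <- "logsales" %in% names(housing)

# With assignment the new column is kept.
housing <- housing %>% mutate(logsales = log(sales))
after <- "logsales" %in% names(housing)

before  # FALSE
after   # TRUE
```

The same applies to the bare na.omit(txhousing) call in the question: it returns a cleaned copy and leaves txhousing itself unchanged.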

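Lastly, you can instruct ggplot to use a logarithmic scale without pre-calculating the logarithms. Note that scale_y_log10() uses base 10 rather than the natural logarithm that log() computes, and that the axis is labelled in the original sales units: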

ggplot(na.omit(txhousing), aes(month, sales)) +
  geom_point(aes(color=year)) +
  scale_y_log10()
OTStats
MrGumble

After cleaning up the code a bit, you can see the key problem is that you make a data frame with the logsales column, but don't assign it to anything:

library(tidyverse)

txhousing <- txhousing %>% 
  mutate(logsales = log(sales))

ggplot(data = na.omit(txhousing)) +
  geom_point(mapping = aes(x = month, y = logsales, color = year))

You also needed to remove the quotes around month and logsales when providing the aes.
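For the remaining step of the exercise (adding geom_smooth), one possible sketch follows. It assumes the log is computed inside aes, as the exercise text suggests, and leaves geom_smooth at its defaults; `p` is an illustrative name:

```r
library(ggplot2)

p <- ggplot(data = na.omit(txhousing),
            mapping = aes(x = month, y = log(sales))) +
  geom_point(aes(color = year)) +  # color is mapped only in the point layer
  geom_smooth()                    # smoother chosen automatically by ggplot2

p
```

Because color is mapped inside geom_point rather than in ggplot(), the smoother is fitted once to all points instead of once per year.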

rg255