Why doesn't R read my columns as numeric?

Question

I have a data frame grouped by hours and days, with three attributes: calories, steps and intensity. All of them are integers; I've checked this many times with glimpse(). I want to build a plot from this data frame, but the plot doesn't change when I change the attributes. I added a fill based on intensity and found the problem: ggplot2 seems to count the number of rows, which is why the plot never changes.

Here is a dput() of the data frame; only the tidyverse package is needed:

week_hourly <- structure(list(id = c(7007744171, 2347167796, 8053475328, 8877689391, 
8877689391, 7007744171, 8053475328, 7086361926, 8053475328, 8877689391, 
7007744171, 8053475328, 8053475328, 8253242879, 7086361926, 8053475328, 
8877689391, 2022484408, 8053475328, 8053475328), hour = c(8, 
8, 19, 17, 18, 8, 19, 17, 19, 16, 8, 19, 21, 10, 13, 14, 12, 
9, 19, 14), day = structure(c(1L, 6L, 4L, 3L, 3L, 3L, 6L, 3L, 
3L, 4L, 4L, 2L, 7L, 7L, 2L, 4L, 2L, 1L, 5L, 7L), .Label = c("Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"
), class = "factor"), calories = c(353L, 317L, 413L, 505L, 497L, 
336L, 379L, 512L, 357L, 397L, 293L, 335L, 334L, 251L, 279L, 330L, 
353L, 338L, 323L, 321L), steps = c(4904L, 4752L, 4706L, 4606L, 
4328L, 4247L, 4127L, 4089L, 3794L, 3705L, 3660L, 3553L, 3451L, 
3440L, 3401L, 3396L, 3387L, 3322L, 3302L, 3280L), intensity = c(138L, 
117L, 121L, 107L, 107L, 123L, 101L, 143L, 91L, 72L, 105L, 87L, 
87L, 71L, 81L, 79L, 82L, 99L, 83L, 86L), status = structure(c(2L, 
2L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L), .Label = c("Sedentary User", "Light User", "Heavy User"
), class = "factor")), row.names = c(NA, 20L), class = "data.frame")

Here is the code of the plot:

ggplot(data=week_hourly,
   mapping=aes(x=hour, y=intensity, fill = intensity, alpha=hour)) +
  geom_col() + 
  coord_flip() +  
  scale_fill_gradient(low = "#8a2380", high = "#f27121") + 
  scale_alpha(range=c(0.7,1), guide="none") + 
  labs(title="Intensity per Hour", 
  subtitle="Through the week", x="Hour", y= "Intensity") +
  theme(legend.position = "top") +
  scale_x_continuous(breaks=seq(0,23,4)) + 
  facet_grid(status~day)

And here is the result:
[Image: the resulting plot]

As you can see, the fill doesn't reflect the individual intensity values, and the scale on the x-axis goes up to 400 even though the maximum intensity value is 165. I've already tried converting the columns with as.integer, as.numeric and other methods, but nothing helps.

It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. Pictures of data are not helpful because we don't want to have to retype everything just to test. The problem looks like you have not summarized your data before plotting and you have multiple values per day/hour and values are being stacked. This doesn't really look like it has to do with numeric vs non-numeric values. — MrFlick, Jul 16 '21 at 23:44
You might *try* something like `library(tidyverse); wh_sum <- week_hourly %>% group_by(status, day, hour) %>% summarise(across(intensity, mean))` (might also need to throw in an `na.rm=TRUE` as appropriate if you have `NA` values) — Ben Bolker, Jul 17 '21 at 00:32
Thanks for the information, @MrFlick. I've updated the post with the dput() output. — Salvador Marquez, Jul 17 '21 at 02:05
@BenBolker you are a genius, man. You solved the problem with only one line of code, thank you so much. Please post that comment as an answer so I can accept it. And again, thank you very much. — Salvador Marquez, Jul 17 '21 at 02:20

score 1 · Accepted Answer · answered Jul 17 '21 at 15:44

@MrFlick diagnosed the problem correctly:

The problem looks like you have not summarized your data before plotting and you have multiple values per day/hour and values are being stacked.
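
To make the stacking concrete, here is a small sketch using three rows copied from the question's dput() output (hours 17 and 18, Wednesday, Heavy User). The names `demo` and `per_bar` are made up for this illustration. Two of the rows share the same status/day/hour combination, so geom_col() stacks them and the bar reaches their sum rather than either single value:

```r
library(dplyr)

# Three rows copied from the question's dput() output; `demo` is a hypothetical name.
demo <- data.frame(
  hour      = c(17, 18, 17),
  day       = factor(rep("Wednesday", 3)),
  status    = factor(rep("Heavy User", 3)),
  intensity = c(107L, 107L, 143L)
)

# For each bar (status/day/hour), count the rows and compare the
# stacked height that geom_col() would draw with the mean.
per_bar <- demo %>%
  group_by(status, day, hour) %>%
  summarise(n              = n(),
            stacked_height = sum(intensity),
            mean_intensity = mean(intensity),
            .groups = "drop")

print(per_bar)
```

Any combination with n greater than 1 gives a bar taller than its individual values: here hour 17 stacks to 250 while its mean is 125. Running the same count on the full data frame should show which bars are affected.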

One sensible way to summarize your data (i.e., collapse all the intensity measurements for a particular status/day/hour combination to a single value) would be

library(tidyverse)
# Collapse to one mean intensity per status/day/hour combination
wh_sum <- week_hourly %>% 
          group_by(status, day, hour) %>% 
          summarise(across(intensity, mean), .groups = "drop")
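
Once the data are summarized, the plot code from the question can be pointed at wh_sum instead of week_hourly. The following is a minimal sketch, assuming dplyr and ggplot2 are installed; the small week_hourly built here is only a stand-in made from four rows of the question's data, not the full data frame:

```r
library(dplyr)
library(ggplot2)

# Stand-in for week_hourly: four rows copied from the question's dput() output.
week_hourly <- data.frame(
  hour      = c(17, 18, 17, 8),
  day       = factor(rep("Wednesday", 4)),
  status    = factor(c("Heavy User", "Heavy User", "Heavy User", "Light User")),
  intensity = c(107L, 107L, 143L, 123L)
)

# One mean intensity per status/day/hour combination
wh_sum <- week_hourly %>%
  group_by(status, day, hour) %>%
  summarise(across(intensity, mean), .groups = "drop")

# Same layers as in the question, but fed with the summarized data
p <- ggplot(wh_sum,
            aes(x = hour, y = intensity, fill = intensity, alpha = hour)) +
  geom_col() +
  coord_flip() +
  scale_fill_gradient(low = "#8a2380", high = "#f27121") +
  scale_alpha(range = c(0.7, 1), guide = "none") +
  scale_x_continuous(breaks = seq(0, 23, 4)) +
  facet_grid(status ~ day) +
  theme(legend.position = "top")

print(p)
```

Because each status/day/hour combination now has exactly one row, nothing is stacked and no bar can exceed the largest intensity in the data.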

You could probably also do this on the fly with stat_summary():

ggplot(data=week_hourly,
   mapping=aes(x=hour, y=intensity, fill = intensity, alpha=hour)) +
  # ggplot2 3.3.0 and later: the argument is `fun` (`fun.y` is deprecated)
  stat_summary(fun = mean, geom = "col") + ...
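
A runnable sketch of the stat_summary() route, again on a small stand-in for week_hourly built from three rows of the question's data. Two choices here are assumptions rather than part of the answer: the sketch uses `fun`, the name that replaced `fun.y` in ggplot2 3.3.0, and it maps fill with after_stat(y) so that the colour follows the computed mean instead of the raw values.

```r
library(ggplot2)

# Stand-in for week_hourly: three rows copied from the question's dput() output.
week_hourly <- data.frame(
  hour      = c(17, 18, 17),
  day       = factor(rep("Wednesday", 3)),
  status    = factor(rep("Heavy User", 3)),
  intensity = c(107L, 107L, 143L)
)

# Let ggplot2 compute the mean per hour within each facet panel
p <- ggplot(week_hourly, aes(x = hour, y = intensity)) +
  stat_summary(aes(fill = after_stat(y)), fun = mean, geom = "col") +
  facet_grid(status ~ day)

# Inspect the values ggplot2 actually draws for this layer
bars <- layer_data(p)
print(bars[, c("x", "y")])
```

This avoids creating a separate summarized data frame, at the cost of hiding the aggregation inside the plot; summarizing first makes the plotted values easier to check.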
