0

I can't find a solution by checking previous questions so I thought I'd throw this here.

I have an excel file with columns titled 'Date' and 'Tensiometer'. Might be important to note that the dates are saved as d/mm/yyyy.

I'm trying to use ggplot to make a graph with dates on the x axis and the tensiometer reading on the y axis. When I make it, the values for the dates are 1) really large whole numbers (43275, 43300, etc.) and 2) not on every tick of the x axis. The values are in chronological order, it's just not displaying the right numbers.

Here's the code that I have so far

library(openxlsx) 
library(ggplot2)

read.xlsx(file.choose("file that I'm using"))
df <-read.xlsx(file.choose("file that I'm using"))
ggplot(data = df) + geom_point(mapping = aes(x = Date, y = Tensiometer))

Here's an example of what the data looks like:

structure(list(Plot = c(1046, 1013, 1082, 1095, 1163, 1046), Treatment = c(5, 3, 2, 4, 1, 5), Date = c(43258, 43258, 43258, 43258, 43258, 43264), Time = c(0.425694444444444, 0.425694444444444, 0.425694444444444, 0.425694444444444, 0.425694444444444, 0.394444444444444), Tensiometer = c(19, 13, 20, 12, 20, 34 )), row.names = c(NA, 6L), class = "data.frame")

I'm currently only interested in plotting 'Date' and 'Tensiometer', though. Any help would be greatly appreciated. Thanks!!

Tom
  • 377
  • 3
  • 13
  • 1
    Welcome to StackOverflow. If you could provide some data to make your [example reproducible](https://stackoverflow.com/a/5963610/2359523) that would be helpful. – Anonymous coward Dec 18 '18 at 17:40
  • Thanks! I'm reading from an excel file, but here's what a few rows would look like. Does this help? EDIT: Oh wow, looks like that formatting didn't work at all. I'll try to add it into the original post – Tom Dec 18 '18 at 17:56
  • 1
    You can paste the output of `dput(head(df))` into your question. – Anonymous coward Dec 18 '18 at 18:09
  • Excel stores dates as the number of days since the start of 1900 (or 1904 for Excel on OSX) https://stackoverflow.com/questions/43230470/how-to-convert-excel-date-format-to-proper-date-with-lubridate – Jon Spring Dec 18 '18 at 18:21
  • If you're only interested in the dates, you can use the `detectDates = T` flag within `read.xlsx`. `df <- read.xlsx("Book1.xlsx", detectDates = T)` . Or convert them following the link Jon posted. – Anonymous coward Dec 18 '18 at 18:22
  • Thank you for the help! When I use the `detectDates = T` function like you mention, I get the error code that this is an unused argument. Am I missing a step? In addition, when I use the `as.date` function that Jon provided, it will give me the correct date, but I'm not sure how to apply that date to the x-axis of the graph. You guys have already been super helpful, but do you have any idea what I'm missing here? – Tom Dec 18 '18 at 18:33

1 Answers1

0

If you know whether the date represents days from 1900 (or 1904), you can simply add the Date field to the start date to get what day it is. For example, using your sample data saved as df:

df<-structure(list(Plot = c(1046, 1013, 1082, 1095, 1163, 1046), 
               Treatment = c(5, 3, 2, 4, 1, 5), 
               Date = c(43258, 43258, 43258, 43258, 43258, 43264), 
               Time = c(0.425694444444444, 0.425694444444444, 
                        0.425694444444444, 0.425694444444444, 0.425694444444444, 
                        0.394444444444444), 
               Tensiometer = c(19, 13, 20, 12, 20, 34)), 
              row.names = c(NA, 6L), class = "data.frame")

df <- df %>% mutate(Date_Structured = as.Date("1900-01-01")+Date) 

Gives:

##   Plot Treatment  Date      Time Tensiometer Date_Structured
## 1 1046         5 43258 0.4256944          19      2018-06-09
## 2 1013         3 43258 0.4256944          13      2018-06-09
## 3 1082         2 43258 0.4256944          20      2018-06-09
## 4 1095         4 43258 0.4256944          12      2018-06-09
## 5 1163         1 43258 0.4256944          20      2018-06-09
## 6 1046         5 43264 0.3944444          34      2018-06-15

Additionally, we can create a observation ID for each occurrence on the same date:

df$Date_Obs <- 
ave(as.character(df$Date_Structured), 
    as.character(df$Date_Structured), 
    FUN=seq_along)

This gives the following:

##   Plot Treatment  Date      Time Tensiometer Date_Structured Date_Obs
## 1 1046         5 43258 0.4256944          19      2018-06-09        1
## 2 1013         3 43258 0.4256944          13      2018-06-09        2
## 3 1082         2 43258 0.4256944          20      2018-06-09        3
## 4 1095         4 43258 0.4256944          12      2018-06-09        4
## 5 1163         1 43258 0.4256944          20      2018-06-09        5
## 6 1046         5 43264 0.3944444          34      2018-06-15        1

From there, we can plot each observation on the same bar for the same date, but colored different as to be able to tell them apart using fill = Date_Obs. I use width = 1 to reduce overlap of the bar onto other dates, and then scale_x_date() to show each day between occurrences. Finally, I rotate the labels on the x-axis to be easier to read:

df %>%
  ggplot(aes(x = Date_Structured, y = Tensiometer, fill = Date_Obs))+
  geom_bar(stat="Identity",width = 1) + # Making the bars fall only cover the date they correspond to
  scale_x_date(date_labels="%m-%d-%Y", date_breaks  ="1 day") + # Adding each date to x-axis
  theme(axis.text.x = element_text(angle = 65, hjust = 1))

enter image description here

jon
  • 370
  • 1
  • 11
  • Thank you! This worked on the date changes, however, it changed the tensiometer levels to be much higher (from ~80 to ~1,250). Any idea why this change could have happened? Additionally, do you know how I would be able to get the x-axis to read the specific date for each entry? Thanks again! – Tom Dec 18 '18 at 21:06
  • Worth noting that when I changed `geom_bar` to `geom_point`, this tensiometer error went away, so I'm all good here! Would still appreciate any help with the x-axis. Currently it just states the name of the month at the beginning of the month, and it would be great to show each date. Thanks! This has been a big help. – Tom Dec 18 '18 at 21:13
  • @Trev The example data only has values for two unique dates. The bars are being aggregated for the common dates. Are there multiple rows with the same date in your data? – jon Dec 18 '18 at 21:22
  • Yes, I guess that I should have showed that. There’s a pretty large range of dates and generally 5 observations per date – Tom Dec 19 '18 at 18:57
  • Please see the edits I made to the answer -- I hope that helps @Trev – jon Dec 20 '18 at 14:22
  • Just got into the office, really happy to see more help! This seems like it's really close, but when I try to create the observation IDs, I get the error code ``$<-.data.frame`(`*tmp*`, Date_Obs, value = character(0)) : replacement has 0 rows, data has 155`. Does this have something to do with the type of factor of my dates? – Tom Dec 20 '18 at 15:30
  • Hmm, it shouldn't be giving an error. The step where you create the date observations evaluates on the `Date_Structured` field, which is of class `Date`. It converts it to a `character` field, and then does the running total within that group. Can you verify that `class(df$Date_Structured)` gives `[1] "Date"` as a response, and that running everything in the `ave` function does not throw an error? – jon Dec 20 '18 at 16:48
  • 1
    Hey Jon, the issue was that I hadn't assigned the updated plot containing `Date_Structured` to `df`, I had just run the code. After assigning that, I was able to get it to work. Thanks a ton for all of the help! – Tom Dec 20 '18 at 19:47