
I am trying to make a ggplot of solar irradiance (from a weather file) on the y-axis against time in months on the x-axis.

My data consists of hourly values for 12 months, so there are 8760 rows in total.

Now, I want to make the plot in such a way that each day contributes a single point, obtained by summing all the values for that day (rather than plotting every hourly value). I believe geom_freqpoly() can plot this type of data, but I have not found enough examples of the kind I need. I am also open to any other approach, as I am not sure how to sum the values for each day; writing code for 365 days by hand would be crazy.

I want the following kind of plot: [image]

My plot shows all the readings for a year and looks like this: [image]

My code for this plot is:

library(ggplot2)
library(lubridate)  # month(), mday()
library(tidyr)      # unite()
cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv",skip=16, header=T)

time<- strptime(cmsaf_data[,2], format = "%m/%d/%Y %H:%M")

data <- cbind(time,cmsaf_data[5])

#data %>% select(time)
data <- data.frame(data, months = month(time),days = mday(time))
data <- unite(data, date_month, c(months, days), remove=FALSE, sep="-")

data <- subset(data, data[,2]>0)

GHI <- data[,2]
date_month <- data[,3]
ggplot(data, aes(date_month, GHI))+geom_line()

My data looks like this:

head(data)

                 time Global.horizontal.irradiance..W.m2.
1 2007-01-01 00:00:00                                   0
2 2007-01-01 01:00:00                                   0
3 2007-01-01 02:00:00                                   0
4 2007-01-01 03:00:00                                   0
5 2007-01-01 04:00:00                                   0
6 2007-01-01 05:00:00                                 159

As I want one point per day, how can I sum the values so that I get the output I require, and show month names on the x-axis? (Maybe there is a date-time function that can do this addition per day and give 365 values for a year.)

I have no idea at all of any such function or approach.

Your help will be appreciated!

Jawairia
  • Can you provide sample data? https://stackoverflow.com/q/5963269/786542 – Tung Feb 12 '18 at 08:18
  • @Tung I have edited my question. Please have a look on my data format – Jawairia Feb 12 '18 at 08:44
  • Using `head` is not quite useful. You should either post the output of `dput(data)` or upload your `csv` file on a sharing site (e.g. Google Drive or Dropbox) then share the link here – Tung Feb 12 '18 at 16:41
  • @Uwe: I believe OP was trying to calculate the summation for Julian days from 1 to 366 for all the years then plot those 366 values but replace the days with corresponding month names. It's relatively similar to the question you linked but not an exact duplicate imo. – Tung Feb 12 '18 at 18:30
  • @Tung According to the OP, there are 12 months of hourly data (8760 rows) which should be aggregated (summed) for each day (*I want to make plot in such a way that for a single day, I only get a point on plot by adding values for a complete day*). – Uwe Feb 12 '18 at 18:56
  • @Uwe: Ok I guessed it based on the name of the csv file in OP's code `tmy_era_25.796_45.547_2005_2014.csv` – Tung Feb 12 '18 at 19:16
  • @Tung, perhaps you are right. There is a [comment](https://stackoverflow.com/questions/48739654/plotting-daily-summed-values-of-data-against-months?noredirect=1#comment84492136_48744289) in which the OP discloses that *it is a collection of 12 months (Jan to Dec) from different years*. Unfortunately, the OP seems to have told only a small part of the underlying problem in the question. It would have been much clearer if the OP had supplied a representative sample dataset. – Uwe Feb 12 '18 at 19:24
  • It is not about Julian years. My data is a collection of months from different years and there is no order of these years. I tried making one common column containing the days and the months. But in the end I am getting similar output from all the solutions mentioned in answers and that is the summation of the complete data (Global horizontal Irradiance) and not just the sum based upon grouping of days. Please find link to my dataset here: [link](https://drive.google.com/file/d/10ej_hGc6nL8VFNrd3-pLVUYJMWbx9wwm/view?usp=sharing) – Jawairia Feb 13 '18 at 04:23
  • @Gregrs has helped me solve this issue of summation. The only thing left is the order of dates, which I want to run from 1st Jan to 31st Dec. The order obtained follows the order of the years, which I don't want. Here is his code for this: [link](https://drive.google.com/file/d/1fL8AcQWl7tzw-l0gNMS8xkZgycdsvIDg/view?usp=sharing) – Jawairia Feb 13 '18 at 04:59
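
For a TMY file in which each month comes from a different year, one possible way to get the 1 Jan to 31 Dec order mentioned in the last comment is to keep only the month and day and attach a single arbitrary year before summing. The sketch below uses base R only. The column names `time` and `ghi`, the dummy values, the source years and the reference year 2001 are all assumptions for illustration and are not taken from the OP's file.

```r
# Dummy TMY-like data: January taken from 2010, February from 2007 (assumed years)
time <- c(
  seq(as.POSIXct("2010-01-01 00:00", tz = "UTC"),
      as.POSIXct("2010-01-31 23:00", tz = "UTC"), by = "hour"),
  seq(as.POSIXct("2007-02-01 00:00", tz = "UTC"),
      as.POSIXct("2007-02-28 23:00", tz = "UTC"), by = "hour")
)
tmy <- data.frame(time = time, ghi = 1)  # constant 1, so every daily sum is 24

# Drop the real year: keep month and day, attach an arbitrary non-leap year
tmy$plot_date <- as.Date(format(tmy$time, "2001-%m-%d", tz = "UTC"))

# One row per calendar day, sorted from 1 Jan onwards regardless of source year
daily <- aggregate(ghi ~ plot_date, data = tmy, FUN = sum)
head(daily)
```

Because 2001 is not a leap year, a 29 February in the input would become `NA` here; an 8760-row file has no such day, but other inputs might.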

2 Answers


Here is a solution using the tidyverse and lubridate packages. As you haven't provided complete sample data, I've generated some random data.

library(tidyverse)
library(lubridate)

data <- tibble(
  time = seq(ymd_hms('2007-01-01 00:00:00'),
             ymd_hms('2007-12-31 23:00:00'),
             by='hour'),
  variable = sample(0:400, 8760, replace = TRUE)
)

head(data)
#> # A tibble: 6 x 2
#>   time                variable
#>   <dttm>                 <int>
#> 1 2007-01-01 00:00:00      220
#> 2 2007-01-01 01:00:00      348
#> 3 2007-01-01 02:00:00      360
#> 4 2007-01-01 03:00:00       10
#> 5 2007-01-01 04:00:00       18
#> 6 2007-01-01 05:00:00      227

summarised <- data %>%
  mutate(date = date(time)) %>%
  group_by(date) %>%
  summarise(total = sum(variable))

head(summarised)
#> # A tibble: 6 x 2
#>   date       total
#>   <date>     <int>
#> 1 2007-01-01  5205
#> 2 2007-01-02  3938
#> 3 2007-01-03  5865
#> 4 2007-01-04  5157
#> 5 2007-01-05  4702
#> 6 2007-01-06  4625

summarised %>%
  ggplot(aes(date, total)) +
  geom_line()
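
To label the x-axis with month names, as the question asks, one option is `scale_x_date()` with `date_breaks` and `date_labels`. This is a minimal sketch: the `daily` data frame is a made-up stand-in for the `summarised` table, and the axis titles are assumptions.

```r
library(ggplot2)

# Stand-in for the summarised table: one made-up total per day of 2007
daily <- data.frame(
  date  = seq(as.Date("2007-01-01"), as.Date("2007-12-31"), by = "day"),
  total = 3000 + 2000 * sin(seq(0, pi, length.out = 365))
)

p <- ggplot(daily, aes(date, total)) +
  geom_line() +
  # one tick per month, labelled with the abbreviated month name
  # in the current locale (e.g. Jan, Feb, ...)
  scale_x_date(date_breaks = "1 month", date_labels = "%b") +
  labs(x = "Month", y = "Daily total")

p
```

Using `"%B"` instead of `"%b"` would give full month names, at the cost of more crowded labels.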

Greg
  • Thanks a lot for this solution! I tried this in the morning but ran into an issue. After applying sum() to the data, I get the same value for every group. What could the issue be? – Jawairia Feb 12 '18 at 11:05
  • It sounds like you're probably doing the group_by() incorrectly. Make sure the date column is being created successfully by mutate. – Greg Feb 12 '18 at 11:27
  • this is my code: `library(ggplot2) library(tidyverse) library(lubridate) cmsaf_data <- read.csv("C://Users//MEJA03514//Desktop//main folder//Irradiation data//tmy_era_25.796_45.547_2005_2014.csv",skip=16, header=T) time<- strptime(cmsaf_data[,2], format = "%m/%d/%Y %H:%M") data <- data.frame(time,cmsaf_data[5]) GHI <- data[,2] summarised <- data %>% mutate(date = date(time)) %>% group_by(date) %>% summarise(total = sum(GHI)) ` – Jawairia Feb 12 '18 at 11:30
  • I have tried same code even before your answer! Was getting same issue – Jawairia Feb 12 '18 at 11:31
  • I am not using tibble. I am storing using data.frame – Jawairia Feb 12 '18 at 11:44
  • Without your original data file, it's difficult to work out what's going wrong here. – Greg Feb 12 '18 at 11:45
  • I am new to this website, not sure how to share data. Can you provide me your email? I'll send you the file now – Jawairia Feb 12 '18 at 11:46
  • My code above works fine even if you replace tibble with data.frame, so that shouldn't be the issue. – Greg Feb 12 '18 at 11:47
  • You can use dput(data) to create a text version of your data. – Greg Feb 12 '18 at 11:48
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/164961/discussion-between-gregrs-and-jawairia). – Greg Feb 12 '18 at 11:50
  • @Marc I am getting the same issue while using your code. I think the problem is due to the data type only. My data file is a TMY. A typical meteorological year (TMY) is a collation of selected weather data for a specific location, generated from a data bank much longer than a year in duration. So it is a collection of 12 months (Jan to Dec) from different years. I think the months belonging to different years is causing the problem here. Does your code have any issue if each month's data belongs to a different year? I hope this explanation helps you guide me better! – Jawairia Feb 12 '18 at 12:11
  • TMY file dates I believe should not be an issue. I tried grouping based upon months and still the code is applying sum function on the whole irradiance data (GHI) and not just on groups – Jawairia Feb 12 '18 at 13:20
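
One likely explanation for the identical totals reported in the comments above: `GHI <- data[,2]` creates a vector outside the data frame, while the irradiance column itself is not named `GHI`. Inside `summarise()`, `sum(GHI)` then falls back to that full-length vector in every group, so each group gets the grand total. The sketch below reproduces this with made-up data; the column names `time` and `ghi` are assumptions.

```r
library(dplyr)

# Two days of made-up hourly data; `ghi` is an assumed column name
data <- data.frame(
  time = as.POSIXct("2007-01-01 00:00", tz = "UTC") + 3600 * 0:47,
  ghi  = rep(c(1, 2), each = 24)
)

# A copy of the column outside the data frame, as in the commented code
GHI <- data[, 2]

# `GHI` is not a column, so sum() sees the whole vector in every group
wrong <- data %>%
  mutate(date = as.Date(time, tz = "UTC")) %>%
  group_by(date) %>%
  summarise(total = sum(GHI))

# Referring to the column by its own name sums within each day
right <- data %>%
  mutate(date = as.Date(time, tz = "UTC")) %>%
  group_by(date) %>%
  summarise(total = sum(ghi))

wrong$total  # 72 72
right$total  # 24 48
```

Renaming the irradiance column to something short before summarising, or using its actual name inside `sum()`, avoids depending on a variable in the global environment.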

In order to get a sum for every month of every year, you need to create a column that identifies a specific month of a specific year (Yearmon). Then you can group by that column and sum within each group, giving you one sum for every month of every year.

Then you just plot it and set the labels of the x-axis to your liking.

library(ggplot2)
library(dplyr)
library(zoo)
library(scales)

# Create dummy data for time column
time <- seq.POSIXt(from = as.POSIXct("2007-01-01 00:00:00"),
                   to = as.POSIXct("2017-01-01 23:00:00"),
                   by = "hour")

# Create dummy data.frame
data <- data.frame(Time = time,
                   GHI = rnorm(length(time)))

############################
# Add column Yearmon to the data.frame
# Group by Yearmon and summarise with sum
# This creates one sum per Yearmon
# ungroup is often not necessary, however
# not doing this caused problems for me in the past
# Change type of Yearmon to Date for ggplot
#


df <- mutate(data,
       Yearmon = as.yearmon(Time)) %>%
  group_by(Yearmon) %>%
  summarise(GHI_sum = sum(GHI)) %>%
  ungroup() %>%
  mutate(Yearmon = as.Date(Yearmon))


# Plot the chart with custom date labels on the x-axis
ggplot(df, aes(Yearmon, GHI_sum))+
  geom_line()+
  scale_x_date(labels = date_format("%m/%y"))

I hope this helps.

Marc Flury