I am trying to group data by date (all observations taken on the same day) and apply a function that counts the number of observations in each group.

My code for this purpose is:

library(ggplot2)
library(lubridate)
library(tidyverse)


cmsaf_data <- read.csv("tmy_era_25.796_45.547_2005_2014.csv",
             skip = 16, header = T)

data <- cmsaf_data %>%
  mutate(time = mdy_hm(Date_Time),
         date = date(time), months = month(date))


data <- subset(data,Global.horizontal.irradiance..W.m2.>0) # subsetting based upon values of GHI > 0

year(data$date) <- 2007

summarised <- data %>%
  group_by(date) %>% summarise(hours = nrow(data))

In the last line of this code, I am trying to group the data by date and count the number of observations, i.e. the number of rows, in each group. Instead of the number of rows in each group, I get the number of rows in the whole data set.

Previously, I used the same code with sum() on the grouped data and it worked perfectly. Now that I am applying nrow() to count the rows, the code does not do what I expect.

I am not sure what mistake I am making. If there is any correction that can be done or method that I can follow, please guide me to it!

Link to my data is: link

Thanks in advance!

Jawairia
  • I wonder if you want to use `n()` instead of `nrow(data)`. – jazzurro Feb 18 '18 at 12:09
  • You got my downvote because your code is not reproducible. If you can fix that, I will retract my downvote and give you an upvote. – www Feb 18 '18 at 12:09
  • @jazzurro n() for what? calculating no. of rows? – Jawairia Feb 18 '18 at 12:17
  • @www not reproducible in what sense? I have also linked my dataset with the post. – Jawairia Feb 18 '18 at 12:19
  • Not reproducible means I cannot run your code by copying and pasting it into my console. I have to remove extra rows from your CSV file without knowing whether I am doing it right. Please review this post to see how to make a reproducible example (https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – www Feb 18 '18 at 12:22
  • @Jawairia Yeah. You grouped your data by date, right? Then why do you want to count the number of rows for the entire data set? I think you want to know how many rows exist for each level of date. If that is the case, you could simply use `count(data, date)` or, in your way, `group_by(date) %>% summarise(hours = n())`. – jazzurro Feb 18 '18 at 12:25
  • @jazzurro thanks a lot! It worked. Counting the number of observations in each group is exactly what I didn't know how to do! – Jawairia Feb 18 '18 at 12:30
  • @www edited my code. please have a look! – Jawairia Feb 18 '18 at 12:33
  • @Jawairia Thanks for your update, but it is still not reproducible. How about you remove rows 1 to 16 from your CSV file, and I will retract my downvote and give you an upvote? At least then people can read in the CSV file without extra work. Please understand that not many people are willing to download a CSV file and study it, because of potential risks or simply because it is time-consuming. Please study how to use `dput` to share data in the future. – www Feb 18 '18 at 12:37
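
As a rough illustration of the `dput` suggestion in the last comment, the sketch below shows how a few rows can be shared as plain R code instead of a file. The data frame `sample_rows` and its `ghi` column are hypothetical stand-ins, not the asker's actual data.

```r
# Hypothetical stand-in for the first few rows of the real data set
sample_rows <- data.frame(
  Date_Time = c("1/1/2007 8:00", "1/1/2007 9:00", "1/2/2007 8:00"),
  ghi       = c(120, 340, 95)
)

# dput() prints R code that recreates the object; that text can be
# pasted directly into a question
dput(sample_rows)

# Round trip: capture the printed code and evaluate it to rebuild the object
rebuilt <- eval(parse(text = capture.output(dput(sample_rows))))
```

Anyone reading the post can then assign the pasted `structure(...)` call to a variable and run the rest of the code without downloading anything.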

1 Answer

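Below is a comparison of nrow(data) and n() after grouping the data frame. To count the rows in each group, use n(). nrow(data) returns the row count of the entire data frame, because inside summarise() the name data still refers to the whole, ungrouped object rather than to the current group.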

data %>%
  group_by(date) %>% summarise(hours = nrow(data))
# # A tibble: 365 x 2
#    date       hours
#    <date>     <int>
#  1 2007-01-01  4272
#  2 2007-01-02  4272
#  3 2007-01-03  4272
#  4 2007-01-04  4272
#  5 2007-01-05  4272
#  6 2007-01-06  4272
#  7 2007-01-07  4272
#  8 2007-01-08  4272
#  9 2007-01-09  4272
# 10 2007-01-10  4272
# # ... with 355 more rows

data %>%
  group_by(date) %>% summarise(hours = n())
# # A tibble: 365 x 2
#    date       hours
#    <date>     <int>
#  1 2007-01-01    10
#  2 2007-01-02    10
#  3 2007-01-03    10
#  4 2007-01-04    10
#  5 2007-01-05    10
#  6 2007-01-06    10
#  7 2007-01-07    10
#  8 2007-01-08    10
#  9 2007-01-09    10
# 10 2007-01-10    10
# # ... with 355 more rows
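
The same contrast can be reproduced without the CSV file. The following is a minimal sketch on a made-up data frame (`toy` and its `ghi` column are hypothetical, not the asker's data), and it also includes the `count()` shortcut mentioned in the comments.

```r
library(dplyr)

# Hypothetical toy data: three observations on one day, two on the next
toy <- data.frame(
  date = as.Date(c("2007-01-01", "2007-01-01", "2007-01-01",
                   "2007-01-02", "2007-01-02")),
  ghi  = c(120, 340, 210, 95, 400)
)

# n() counts the rows of the current group
per_day <- toy %>%
  group_by(date) %>%
  summarise(hours = n())

# nrow(toy) ignores the grouping: `toy` is not a column, so it is looked up
# as the full data frame and the same total is repeated for every group
whole <- toy %>%
  group_by(date) %>%
  summarise(hours = nrow(toy))

# count() is shorthand for group_by() followed by summarise(n = n());
# its result column is named n by default
shortcut <- count(toy, date)
```

Here `per_day$hours` holds one count per date, while `whole$hours` repeats the total row count of `toy` for every date.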
www