1

How can I group my database by month of the year and get the value of that grouping for each column I have ? in R

Here is a pic of my dataframe:

enter image description here

I tried to do that but it is not working:

df_publications <- df_publications %>% group_by(publication_date) 
                   %>% count() 
Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
isma
  • 17
  • 3
  • 1
    Use `substr` or `stringr::str_extract` or something similar to extract the month into its own column. If you need more help, please post some sample data as copy/pasteable R syntax, not as a picture. `dput()` can help with that, e.g., `dput(your_data[1:5, ])` for the first 5 rows. – Gregor Thomas Nov 29 '22 at 14:32
  • 1
    Then you can use your favorite [sum by group method](https://stackoverflow.com/a/72160221/903061). – Gregor Thomas Nov 29 '22 at 14:35
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please [do not post code or data in images](https://meta.stackoverflow.com/q/285551/2372064) – MrFlick Nov 29 '22 at 14:35

2 Answers2

0

Dplyr summarize across everything method:

df<-data.frame(publication_date=c("2015 Jul","2015 Jul","2015 Aug","2015 Aug"),
               Asym=c(3,5,1,2),
               Auth=c(5,7,2,3),
               Cert=c(1,2,3,4))

library(tidyverse)

df %>% 
  group_by(publication_date) %>% 
  summarize(across(everything(), sum))

#  publication_date  Asym  Auth  Cert
#1 2015 Aug             3     5     7
#2 2015 Jul             8    12     3

base::xtabs() method, requires naming all the columns:

xtabs(cbind(Auth, Asym, Cert)~., data=df)
#publication_date Auth Asym Cert
#        2015 Aug    5    3    7
#        2015 Jul   12    8    3

Alternative thanks to @akrun https://stackoverflow.com/a/74619313/10276092

xtabs(sprintf("cbind(%s)~.", toString(names(df)[-1])), data = df)
M.Viking
  • 5,067
  • 4
  • 17
  • 33
-1

In SQL you could use a subquery:

SELECT Year_, Month_, SUM(Counts)
FROM (
        SELECT YEAR(DATEADD(MM,DATEDIFF(MM,0,StartTime),0))'Year_'
              ,DATENAME(MONTH,DATEADD(MM,DATEDIFF(MM,0,StartTime),0))'Month_'
              ,TestName
              ,CASE WHEN Testname = 'POE Business Rules' THEN (count(TestName)*36) 
                    WHEN TestName = 'Submit' THEN (COUNT(TestName)*6) 
                    ELSE 0 
               END 'Counts'
        FROM VExecutionGlobalHistory
        GROUP BY YEAR(DATEADD(MM,DATEDIFF(MM,0,StartTime),0))
                ,DATENAME(MONTH,DATEADD(MM,DATEDIFF(MM,0,StartTime),0))
                ,TestName
        )sub
GROUP BY Year_, Month_
ORDER BY CAST(CAST(Year_ AS CHAR(4)) + Month_ + '01' AS DATETIME) 

Update: Added ORDER BY to sort by YEAR/MONTH oldest first.

M.Viking
  • 5,067
  • 4
  • 17
  • 33