1

I'm trying to work on a data frame that I have, but am struggling to find the latest date associated with a name and the total sum associated with a name. My frame looks something like this

a<-
Date         Name         Sum
<date>       <chr>       <dbl>

23.02.2017  Johnny        6     
24.02.2017  Jane          20        
24.02.2017  Micky Mouse   20        
27.02.2017  Jane          20        
3.03.2017   Johnny        20        
3.03.2017   Ronald        25    

I would like to get something like this

b<-
Latest Date         Name         Frequency     Total Sum
<date>              <chr>          <dbl>          <dbl>

3.03.2017           Johnny           2             26
27.02.2017          Jane             2             40
24.02.2017          Micky Mouse      1             20       
3.03.2017           Ronald           1             25   

I Started by using the table function and then using a for loop, but I'm a bit of a noob.

b <- data.frame(table(a$Name))
# after cleaning
b<-
Name         Frequency  
<chr>          <int>         

Johnny           2            
Jane             2            
Micky Mouse      1                
Ronald           1   


for (i in (a$Name)) {
  b <- a %>%
    mutate(Total Sum = sum(a$Sum[a$Name == i] %>%
    mutate(Latest Date = max(a$date[a$Name == i]))
}

This would return me a data frame that looks like this

b<-

Name         Frequency        Total Sum     Latest Date
<chr>          <int>            <dbl>         <date>

Johnny           2               40          27.02.2017
Jane             2               40          27.02.2017
Micky Mouse      1               40          27.02.2017
Ronald           1               40          27.02.2017

How can I make sure that the total sum is only that of Johnny, Jane ... and that the date is the latest date associated with that name

2 Answers2

0

This should work:

df <- read.table(text = "Date         Name         Sum
                  1  23.02.2017  Johnny        6     
                  2  24.02.2017  Jane          20        
                  3  24.02.2017  Micky_Mouse   20        
                  4  27.02.2017  Jane          20        
                  5  3.03.2017   Johnny        20        
                  6  3.03.2017   Ronald        25")  

df%>%group_by(Name)%>%
  mutate(Date_Formated = as.Date(Date, format = "%d.%m.%Y"))%>%
  summarise(totalByName = sum(Sum),firstDate = max(Date_Formated))


# A tibble: 4 x 3
         Name totalByName  firstDate
       <fctr>       <int>     <date>
1        Jane          40 2017-02-24
2      Johnny          26 2017-02-23
3 Micky_Mouse          20 2017-02-24
4      Ronald          25 2017-03-03
DataTx
  • 1,839
  • 3
  • 26
  • 49
  • OP is asking for the latest date, not the first. (Also see my comment.) – Axeman Oct 20 '17 at 12:56
  • thanks for pointing that out. I still believe your answer in the comment is incomplete as I would format the date if this was a problem I worked on. – DataTx Oct 20 '17 at 13:01
-3

You can use group_by and summarise from dplyr R Package.

b <- a %>%
    group_by(Name) %>%
    summarise(Total_Sum = sum(Sum))

c <- a %>%
    count(Name)

df <- left_join(b, c)
patL
  • 2,259
  • 1
  • 17
  • 38
  • `summarise` removes the last group, so a double `summarise` will in this case return one row... Except that it fails because the `Date` column isn't even there anymore. Also, you should have `sum(Sum)`, and you are missing a pipe. – Axeman Oct 20 '17 at 09:25
  • Yeah i tried using it and it works fine for the Sum, but like Axeman said, double summarise returns a single row if i use max(a$date), using just max(date) will give an error of object not found. – Martin Jeret Oct 20 '17 at 09:58
  • @MartinJeret I'm sorry. I edit my code. Hope it works it. – patL Oct 20 '17 at 10:02