2

I'm new and sorry if my question is badly worded.

I'm working in r and I have table called Rent that might look like this :

Rent
       ID      Invoice    Payment      Paid Date
       lucy   7/1/2018     100        9/1/2018
       lucy   7/1/2018     150        10/1/2018
       lucy   8/1/2018     100        11/1/2018

So what I want to do is that since Lucy has two payments on 7/1/2018, I want to combine them together and then sum the payment, and use the latest Paid Date.

What I have so far is that

#to create a row that has the sum of the sales prices 

    Rent[,sum_late:=sum( as.numeric(("Sales Price"))),
    by= c("Id","Invoice Date")]

#take the first of the unique IDs by the max paid date
    head (SD,1) by=c("ID", "Invoice Date", max("Paid Date") 

But when I run the first line all the sum_late column is N/A. I'm not sure what I did wrong. Ideally, I would want a table just like this.

Rent
       ID      Invoice    Payment      Paid Date
       lucy   7/1/2018     250        10/1/2018
       lucy   8/1/2018     100        11/1/2018

Sorry if this is a stupid question, I appreciate any help and feedback!! Thank you all for your time!!

Arun kumar mahesh
  • 2,289
  • 2
  • 14
  • 22
Michael
  • 59
  • 5

2 Answers2

3

We can change Paid_Date to date class, group_by ID and Invoice, sum Payment and select max Paid_Date.

library(dplyr)
Rent %>%
  mutate_at(vars(Invoice, Paid_Date), as.Date, '%d/%m/%Y') %>%
  group_by(ID, Invoice) %>%
  summarise(Payment = sum(Payment), 
            Paid_Date = max(Paid_Date))

#  ID    Invoice    Payment Paid_Date 
#  <chr> <date>       <int> <date>    
#1 lucy  2018-01-07     250 2018-01-10
#2 lucy  2018-01-08     100 2018-01-11

Or if you prefer data.table using the same logic.

library(data.table)
setDT(Rent)[, c("Invoice", "Paid_Date") := .(as.IDate(Invoice, '%d/%m/%Y'), 
                                             as.IDate(Paid_Date, '%d/%m/%Y'))]
Rent[, .(Payment = sum(Payment), Paid_Date = max(Paid_Date)), .(ID, Invoice)]

data

Rent <- structure(list(ID = c("lucy", "lucy", "lucy"), Invoice = c("7/1/2018", 
"7/1/2018", "8/1/2018"), Payment = c(100L, 150L, 100L), Paid_Date = c("9/1/2018", 
"10/1/2018", "11/1/2018")), class = "data.frame", row.names = c(NA, -3L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Hi Ronak, thank you so much! If my date is already at this 2019-10-09 format, and I actually have other columns other than these three, does that mean I only use ` library(dplyr) rent %>% mutate_at(. %>%) group_by(ID, Invoice) %>% summarise(Payment = sum(Payment), Paid_Date = max(Paid_Date))`? – Michael May 21 '20 at 06:32
  • You can add the columns you want to keep in `group_by`. – Ronak Shah May 21 '20 at 06:33
  • I didnt' run the mutate code because my date is already in the right format, but I found that even if I put it at max (paid date), the later paid date row still exists. – Michael May 21 '20 at 06:37
  • Sorry for asking simple question, I only wrote Rent%>% group_by('Id', 'Invoice ') %>% summarise(`Sales Price` = sum(`Sales Price`), 'Paid_Date' = max('Paid_Date')) – Michael May 21 '20 at 06:38
  • 1) Don't directly copy paste the code, use the column and dataframe names that you have in your data. Usually, it is better to have columns without any spaces in their name but if you have column with spaces in the name, use them with backquotes. ilke this `\`Paid Date\``. 2) What is `class(Rent$Paid_Date)` ? 3) Don't use quotes while specifying column names in `dplyr`/`data.table` functions i.e "id" "Invoice" etc.use the bare column name. – Ronak Shah May 21 '20 at 06:47
2

There are multiple ways of doing this task, I will be using for-loops for creating desired output. I echo with @Ronak Shah using dplyr method which make lesser processing time thank using for-loops

Data

Rent <- structure(list(ID = c("lucy", "lucy", "lucy"), Invoice = c("7/1/2018", 
                                                                   "7/1/2018", "8/1/2018"), Payment = c(100L, 150L, 100L), Paid_Date = c("9/1/2018", 
                                                                                                                                         "10/1/2018", "11/1/2018")), class = "data.frame", row.names = c(NA, -3L))

Converting Paid_date into date formats

Rent$Paid_Date <- as.Date(Rent$Paid_Date, "%d/%m/%Y")

For-loops

for ( i in unique (Rent$ID)){
  for (j in unique(Rent$Invoice[Rent$ID == i])){
    Rent$Payment_[Rent$ID==i & Rent$Invoice ==j ] <- sum (Rent$Payment [Rent$ID==i & Rent$Invoice ==j])
    Rent$Paid_dt[Rent$ID==i & Rent$Invoice ==j ] <- max(Rent$Paid_Date[Rent$ID==i & Rent$Invoice ==j])

  }
}

Rent$Paid_dt <- as.Date(Rent$Paid_dt ,origin = "1970-01-01") # converting into date format

Rent1 <- Rent[, unique(c("ID", "Invoice", "Payment_", "Paid_dt"))]

print (Rent1)

    ID  Invoice Payment_    Paid_dt
1 lucy 7/1/2018      250 2018-01-10
2 lucy 7/1/2018      250 2018-01-10
3 lucy 8/1/2018      100 2018-01-11
Arun kumar mahesh
  • 2,289
  • 2
  • 14
  • 22