0

I have three columns; loan_id, amount, date. I have 1,048,575 entries and I need to add together all the duplicates in loan_id column(there are different payments on the same loan_id) and in the second table the amount values should be added together matching with the loan_id.

Sample of how my data looks like this

enter image description here

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
asdf123
  • 1
  • 1
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Oct 02 '19 at 11:32

2 Answers2

0

Try

aggregate(df$amount,list(df$loan_id),sum)
user2974951
  • 9,535
  • 1
  • 17
  • 24
  • If I add those together, how can I make a separate table only with two columns, loan_id and amount? In the console, It shows me the first 500, but I need a table where the whole list is shown (75466 rows) – asdf123 Oct 02 '19 at 12:50
  • @asdf123 !!! That's only the print output. Save the results in an object and then operate on it arbitrarily. – user2974951 Oct 02 '19 at 12:53
  • yea but it says: [ reached 'max' / getOption("max.print") -- omitted 75466 rows ], it shows the first 500, but how can I see the whole list? – asdf123 Oct 02 '19 at 13:02
  • @asdf123 I don't know why you would want to see all that, but anyway use `options(max.print=1e5)` or some other number. – user2974951 Oct 02 '19 at 13:03
  • If I add those duplicates together I get the values in the console, but I need to make a separate data table with new variables – asdf123 Oct 02 '19 at 14:15
0

So you want the total amount for each loan_id irrespective of date?

One way to do aggregate functions like this in R is by using the data.table package.

library(data.table)

# assuming you start with a data.frame

mydata = data.table(mydata)

mydata[,sum(amount), by=loan_id]
Adam Waring
  • 1,158
  • 8
  • 20