
I have a data frame which contains client names and area data.

I want to calculate the total area for each client, because some clients' areas span multiple floors (for example, Client A may have 202 on Floor 1 and 248 on Floor 2).

I want to create a new column with the total area.

I know how to create the new column:

areas$new_area <- NA

And I know how to calculate the total area for each client (manually):

sum(areas[areas$client == "Client A", "areas"])

What I am having difficulty with is iterating through the data frame and automating the entire process.

I came up with a partial solution that iterates through the data frame, but it only sums the single area value in row i, so the new column ends up as a copy of the area column (which I expected, since sum() only ever receives that one value):

for(i in 1:nrow(areas)){
  areas$new_area[i] <- sum(areas$areas[i])
}
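The loop above only ever sums one value because `areas$areas[i]` selects a single element. A minimal sketch of the same loop that instead sums every row belonging to row `i`'s client, assuming the column names `Client` and `Area` from the expected-output table (the question's own code uses `client` and `areas`) and sample data copied from that table:

```r
# Sample data copied from the expected-output table (hypothetical stand-in for `areas`)
areas <- data.frame(
  Client = c("A", "A", "B", "B"),
  Floor  = c(1, 2, 1, 2),
  Area   = c(202, 248, 1000, 150),
  stringsAsFactors = FALSE  # keep Client as character on R versions before 4.0
)

areas$new_area <- NA
for (i in seq_len(nrow(areas))) {
  # Rows that belong to the same client as row i
  same_client <- areas$Client == areas$Client[i]
  # Sum all of that client's areas, not just the value in row i
  areas$new_area[i] <- sum(areas$Area[same_client])
}
print(areas)
```

This works, but it recomputes the same total once per row, which is why a grouped function is usually the cleaner choice.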

I also suspect that a function from the apply family is the right approach here, but I don't know which one to use or how to apply it (no pun intended).
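One apply-family option, sketched under the same assumptions (columns `Client` and `Area`, sample data from the table): `sapply()` computes one total per distinct client, and indexing that named vector by each row's client spreads the totals back onto the rows.

```r
# Sample data copied from the expected-output table (hypothetical stand-in for `areas`)
areas <- data.frame(
  Client = c("A", "A", "B", "B"),
  Floor  = c(1, 2, 1, 2),
  Area   = c(202, 248, 1000, 150),
  stringsAsFactors = FALSE
)

# One total per distinct client; sapply names the result by client
clients <- unique(areas$Client)
totals <- sapply(clients, function(cl) sum(areas$Area[areas$Client == cl]))

# Look up each row's client in the named vector of totals
areas$new_area <- unname(totals[areas$Client])
print(areas)
```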

How can I (a) achieve this, and (b) do it in a cleaner way?

My expected output is something like this (or some variation of it):

--------------------------------------
| Client | Floor | Area |  New Area  |
--------------------------------------
|   A    |   1   | 202  |    202     |
--------------------------------------
|   A    |   2   | 248  |    450     |
--------------------------------------
|   B    |   1   | 1000 |    1000    |
--------------------------------------
|   B    |   2   | 150  |    1150    |
--------------------------------------

I want a new column at the end with the total of all area values for each client. My example shows a cumulative total, but whether the total is cumulative doesn't matter; it is only meant as an illustration.
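Base R's `ave()` applies a function within each group and returns a vector the same length as its input, so the result can be assigned directly as a new column. A sketch using sample data copied from the table above: `FUN = cumsum` reproduces the cumulative New Area column, and `FUN = sum` gives the plain per-client total. The cumulative version assumes rows are already in floor order within each client, since it follows row order.

```r
# Sample data copied from the expected-output table (hypothetical stand-in for `areas`)
areas <- data.frame(
  Client = c("A", "A", "B", "B"),
  Floor  = c(1, 2, 1, 2),
  Area   = c(202, 248, 1000, 150),
  stringsAsFactors = FALSE
)

# Running total within each client (follows the current row order)
areas$cum_area <- ave(areas$Area, areas$Client, FUN = cumsum)
# The same client total repeated on every row of that client
areas$total_area <- ave(areas$Area, areas$Client, FUN = sum)
print(areas)
```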

Mus
  • @agenis I have updated the question. – Mus Sep 11 '17 at 11:30
  • OK, it seems to me that what you need is just a summation by group? If that's the case, maybe this answer can help: https://stackoverflow.com/q/1660124/3871924 – agenis Sep 11 '17 at 12:00

1 Answer

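# One row per client with the summed area
summedAreas <- aggregate(Area ~ Client, data = areas, FUN = sum)
# Join the totals back onto the original rows; the total column becomes Area_total
allYourData <- merge(areas, summedAreas, by = "Client", suffixes = c("", "_total"))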
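A self-contained run of the aggregate-and-merge approach, using sample data copied from the question's table (an assumption; the real `areas` may have more columns). Both data frames contain an `Area` column, so `suffixes` keeps the original name and labels the total `Area_total`; without it, `merge()` would rename them `Area.x` and `Area.y`.

```r
# Sample data copied from the expected-output table (hypothetical stand-in for `areas`)
areas <- data.frame(
  Client = c("A", "A", "B", "B"),
  Floor  = c(1, 2, 1, 2),
  Area   = c(202, 248, 1000, 150),
  stringsAsFactors = FALSE
)

# One row per client with the summed area
summedAreas <- aggregate(Area ~ Client, data = areas, FUN = sum)
# Attach each client's total to all of that client's rows
allYourData <- merge(areas, summedAreas, by = "Client", suffixes = c("", "_total"))
print(allYourData)
```

Note that `merge()` sorts the result by the key column by default (`sort = TRUE`), so the row order can differ from the original data frame.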

I prefer aggregate over tapply because it returns a data.frame that can be merged straight back onto the original data, but you could also calculate the totals with

tapply(X = areas$Area, INDEX = areas$Client, FUN = sum)
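`tapply()` returns a one-dimensional array named by client rather than a data.frame. A sketch, assuming the same sample data, of indexing that result by each row's client name to turn it into a per-row column:

```r
# Sample data copied from the expected-output table (hypothetical stand-in for `areas`)
areas <- data.frame(
  Client = c("A", "A", "B", "B"),
  Floor  = c(1, 2, 1, 2),
  Area   = c(202, 248, 1000, 150),
  stringsAsFactors = FALSE
)

# One total per client, named by client
totals <- tapply(X = areas$Area, INDEX = areas$Client, FUN = sum)
print(totals)

# Look up each row's client by name; as.vector drops the array attributes
areas$new_area <- as.vector(totals[areas$Client])
print(areas)
```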
CCD