I'm struggling to find an efficient solution to the following problem:
I have a large, already processed data frame with around 8 columns and 80,000 rows that contains several data types. I want to create a new data frame that holds the sum of one column over the rows of the large data frame that meet certain conditions.
Imagine the head of the original data frame looks like this. The column $years.raw indicates that the company measured data for x years.
> cbind(company.raw,years.raw,source,amount.inkg)
     company.raw years.raw source      amount.inkg
[1,] "C1"        "1"       "Ink"       "5"
[2,] "C1"        "1"       "Recycling" "2"
[3,] "C2"        "1"       "Coffee"    "10"
[4,] "C2"        "1"       "Combusted" "15"
[5,] "C2"        "2"       "Printer"   "14"
[6,] "C2"        "2"       "Tea"       "18"
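To make the sample reproducible, here is a minimal sketch of the same six rows as an actual data frame. The name `raw` is just a placeholder for this example. Note that cbind() on vectors of mixed type returns a character matrix (which is why every value above is quoted), so the sketch stores $years.raw and $amount.inkg as numbers instead.

```r
# Placeholder name `raw`; the values are copied from the head shown above.
raw <- data.frame(
  company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
  years.raw   = c(1, 1, 1, 1, 2, 2),
  source      = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
  amount.inkg = c(5, 2, 10, 15, 14, 18),
  stringsAsFactors = FALSE
)

# Check the column types: amount.inkg must be numeric before it can be summed.
str(raw)
```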
What I need to do now is create a new data frame that sums the values of column $amount.inkg for every company and every year, grouped by certain strings. I saved these strings in the three vectors below. Each string is a value that occurs in the column $source of the original data frame.
> vector1 <- c("Tea","Coffee")
> vector2 <- c("Ink","Printer")
> vector3 <- c("Recycling","Combusted")
The preferred data frame would then look like this:
Company Year amount.vector1 amount.vector2 amount.vector3
C1      1    0              5              2
C2      1    10             0              15
C2      2    18             14             0
The general approach for $amount.vector1 would be: for every company and every year, sum the values of column $amount.inkg over the rows where $source matches one of the elements of vector1. The same applies to $amount.vector2 and $amount.vector3, just with the elements of vector2 and vector3.
If no values are available, a 0 should be entered instead of NA. This needs to be done for the whole raw data frame, which covers around 250 companies with data for the years 1:8 (the number of years differs quite a lot between companies).
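The rule described above can be sketched in base R roughly as follows. This is only one possible way to express it, not a tested solution for the full data: it assumes the data sit in a data frame (called `raw` here, a placeholder name) with a numeric $amount.inkg column, and it uses `%in%` for the string match and aggregate() for the per-company, per-year sums. Rows that do not match a vector contribute 0, so groups without matches end up with 0 rather than NA.

```r
# Placeholder sample data, copied from the head of the original data frame.
raw <- data.frame(
  company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
  years.raw   = c(1, 1, 1, 1, 2, 2),
  source      = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
  amount.inkg = c(5, 2, 10, 15, 14, 18),
  stringsAsFactors = FALSE
)
vector1 <- c("Tea", "Coffee")
vector2 <- c("Ink", "Printer")
vector3 <- c("Recycling", "Combusted")

# Work on a copy so the original data frame stays unchanged.
d <- raw

# Keep the amount where the source belongs to the vector, otherwise use 0.
d$amount.vector1 <- ifelse(d$source %in% vector1, d$amount.inkg, 0)
d$amount.vector2 <- ifelse(d$source %in% vector2, d$amount.inkg, 0)
d$amount.vector3 <- ifelse(d$source %in% vector3, d$amount.inkg, 0)

# Sum the three helper columns for each observed company/year combination.
result <- aggregate(
  cbind(amount.vector1, amount.vector2, amount.vector3) ~ company.raw + years.raw,
  data = d,
  FUN = sum
)

# Sort by company, then year, to get one row per company per year.
result <- result[order(result$company.raw, result$years.raw), ]
rownames(result) <- NULL
print(result)
```

Because the masking with ifelse() is vectorised and aggregate() is called only once, this should scale to the full 80,000 rows without an explicit loop. Only company/year combinations that actually occur in the data appear in the result.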
Edit: The new data frame needs one row per company per year:
C1 Year 1
C1 Year 2
C1 Year 3
C2 Year 1
C2 Year 2
I tried to write a function that combines these conditions, but I failed. I'm quite new to R and don't know how to link these conditions and apply them to the whole data frame.