0

I'm very new to R (using RStudio) coming from a background in Excel using long vlookup and sumif expressions (which i'd use for this problem). However want to use R as the dataset SALES is so large. Is there a way to do the following in R?

I have a large dataframe called SALES with columns such as the following with example data:

 - **ID,  COUNTRY_CODE,  INVOICE_DATE,   NET_AMOUNT,  COUNTRY_NAME  ... etc** 
 - 1,         GB,        05/03/2017,       128,           ... 
 - 2,         GB,        05/03/2017,        58, 
 - 3,         FR,        04/03/2017,        200, 
 - 4,         US,        03/03/2017,       139,           ... 
 - 5,         GB,        03/03/2017,       152, 
 - ... 
 - 65492,     FR,        21/01/2011,        91, 
 - 65492,     GB,        20/01/2011,       136,

I want to sum the NET_AMOUNT from the SALES data frame, per month and per year, for each unique country code and recorded this in a new dataframe DF_SALES_BY_COUNTRY.

I have started by creating a new dataframe DF_SALES_BY_COUNTRY containing unique values for the COUNTRY_CODE and COUNTRY_NAME:

COUNTRY_NAME_UNIQUE <-unique(SALES$`COUNTRY_NAME`, incomparables = FALSE)
COUNTRY_CODE_UNIQUE <- unique(SALES$`COUNTRY_CODE`, incomparables = FALSE)
DF_SALES_BY_COUNTRY<- data.frame(COUNTRY_NAME_UNIQUE,COUNTRY_CODE_UNIQUE)

,then I get stuck:

  • How to sum the NET_AMOUNT for each COUNTRY_CODE per month and year (based on the INVOICE_DATE in the SALES dataframe) and adding it to a new column in DF_SALES_BY_COUNTRY

Any advice on how to, or even if ive started correctly would be great. Thanks!

Sotos
  • 51,121
  • 6
  • 32
  • 66
  • 1
    Consider using `aggregate`. Something along the lines of `aggregate(NET_AMOUNT ~ COUNTRY_CODE + MONTH + YEAR, FUN = sum, data = xy)`... – Roman Luštrik Jul 06 '17 at 13:58
  • Hi, thanks for suggesting using aggregate. When investigating how to I found this solution: DF_SALES_BY_COUNTRY$NET_AMOUNT_TO_DATE <- sum(SALES$`NET_AMOUNT`[SALES$`COUNTRY_CODE`=='FR'],na.rm = TRUE) However, now, how do I populate my column DF_SALES_BY_COUNTRY$NET_AMOUNT_TO_DATE by replacing the 'FR' country code in my solution above, with the corrisponding DF_SALES_BY_COUNTRY$COUNTRY_CODE row value, for each row. In Excel it would b the equivelent of dragging the sumif statment down for each row of cels in the DF_SALES_BY_COUNTRY sheet and selecting whole columns on the SALES sheet. – Charlie Jul 07 '17 at 07:51

0 Answers0