0
df <- structure(list(Month = c(1, 2, 3, 1, 2, 3), 
Product = c('A', 'A','A','B','B','B'),
Amount1 = c(100,NA,200,200,400,NA),
calculated1 = c(100,150,200,200,400,300),
Amount2 = c(100,NA,300,200,NA,500),
calculated2 = c(100,200,300,200,350,500)),         
.Names=c("Month","Product","Amount1","Calculated1",
"Amount2","Calculated2"),row.names = c(NA, 6L), class = "data.frame")

I need your help on the above sample problem. I have month, product, Amount1 and Amount2 columns in R data table. Not all the months have amount information for all products as shown in above dataframe. if NA found(missing value) for any product, i want to calculate average of remaining months for respective products as shown in "Calculated" columns. Any help on this is highly appreciated, i am new learner.

Thank you!

I have tried filtering based on Amount1 and then calculate average and assign values to different data frame. Later use join to bring the values for nulls. But since i have 2 different Amount columns to work on filtering on one column would not help.

  • 2
    Do you have `NA` as missing values?. Please use `dput` to show the input example so that we get the structure correctly – akrun Feb 27 '23 at 18:16
  • 2
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with desired output that can be used to test and verify possible solutions. What is the output you expect to see in this case? What is the shape of the data? – MrFlick Feb 27 '23 at 18:18
  • 1
    now it is even worst because we have no data to work with. Please share your data with `dput` – GuedesBF Feb 27 '23 at 18:20
  • pardon me, I have added a sample table. Hope that helps. – Arun kumar Feb 27 '23 at 18:29
  • 1
    Arunkumar, we can't test code on an image. Please use `dput` to provide sample data, it is unambiguous and easy for us to use. Perhaps? `quux <- structure(list(Month = c(1L, 2L, 3L, 1L, 2L, 3L), Product = c("A", "A", "A", "B", "B", "B"), Amount1 = c(100L, NA, 200L, 300L, 400L, NA), Calculated1 = c(100L, 150L, 200L, 300L, 400L, 350L), Amount2 = c(200L, 100L, NA, 300L, NA, 600L), Calculated2 = c(200L, 100L, 150L, 300L, 450L, 600L)), class = "data.frame", row.names = c(NA, -6L))` – r2evans Feb 27 '23 at 18:49
  • But that sample data _presumes_ things about the columns, namely that they are integers/numbers and not strings. That's ambiguity that providing `dput` output from _your real data_ would be informative. – r2evans Feb 27 '23 at 18:51

1 Answers1

0

I used this sample data:

df <- data.frame(Month = c(1, 2, 3, 1, 2, 3), 
                     Product = c('A', 'A','A','B','B','B'),
                     Amount1 = c(100,NA,200,200,400,NA),
                     calculated1 = c(100,150,200,200,400,300),
                     Amount2 = c(100,NA,300,200,NA,500),
                     calculated2 = c(100,200,300,200,350,500))

The processing is as follows:

library(dplyr)
df %>% 
  group_by(Month, Product) %>% # grouping using both variables
  mutate(Calc1_M_P = mean(Amount1), # temporary variables (columns) are created
         Calc2_M_P = mean(Amount2)) %>% # mean is calculated so that if there are NAs the function returns NA
  ungroup() %>% 
  group_by(Product) %>% # The data.frame is grouped again using only Product
  mutate(Calc1_P = mean(Amount1, na.rm = T), # New tmp variables are created. NAs skipped by mean()
         Calc2_P = mean(Amount2, na.rm = T)) %>% 
  mutate(Calc1 = coalesce(Calc1_M_P, Calc1_P), # final variables are constructed. If first is present the econd is skipped otherwise the second is taken
         Calc2 = coalesce(Calc2_M_P, Calc2_P),
         Calc1_P = NULL, # drop tmp variables
         Calc2_P = NULL,
         Calc1_M_P = NULL,
         Calc2_M_P = NULL
               )

The above code returns:

# A tibble: 6 × 8
# Groups:   Product [2]
  Month Product Amount1 calculated1 Amount2 calculated2 Calc1 Calc2
  <dbl> <chr>     <dbl>       <dbl>   <dbl>       <dbl> <dbl> <dbl>
1     1 A           100         100     100         100   100   100
2     2 A            NA         150      NA         200   150   200
3     3 A           200         200     300         300   200   300
4     1 B           200         200     200         200   200   200
5     2 B           400         400      NA         350   400   350
6     3 B            NA         300     500         500   300   500

You see that Calc1 == calculated1 and Calc2 == calculated2. So the result is as expected.

asd-tm
  • 3,381
  • 2
  • 24
  • 41