I am working with a very large dataset. It holds yearly data, and I am computing the monthly average of a daily value. Each row is a different indicator, and the columns are its daily values. The problem is that an invalid value is stored as 0, so it is included when the average is calculated. How can I exclude it? Is turning it into NA the only way?

I have this chunk of code:

DF <- DF %>%
  rowwise() %>%
  mutate(montly_Average = mean(c(D01, D02, D03, D04, D05,
                                 D06, D07, D08, D09, D10,
                                 D11, D12, D13, D14, D15,
                                 D16, D17, D18, D19, D20,
                                 D21, D22, D23, D24, D25,
                                 D26, D27, D28, D29, D30,
                                 D31)))

I have about 70 variables, which is why I had to list the ones I am interested in this way.

Patrick
Nerea
  • `DF %>% mutate(montly_Average = rowMeans(across(D01:D31, ~na_if(.x, 0)), na.rm = TRUE))`. This assumes there are no other variables between `D01` and `D31`, i.e. all the variables you are interested in form a contiguous sequence. If that is not the case, you could use `across(matches('D[0-2][0-9]|D3[01]'), ...)` instead. – Onyambu Jul 24 '23 at 15:01
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jul 24 '23 at 15:02

1 Answer

No, there are a couple of other ways.

For example, if your data were stored in a variable `x` like this:

x = c(2,4,5,6,7,8,0,1,2,3,0)

You could use this:

mean(x[x != 0])  # mean of the non-zero values only

This is equivalent to the following, as long as x contains no NA values to begin with:

x[x == 0] <- NA
# Followed by
mean(x, na.rm = TRUE)
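
The same idea can be applied row by row to a data frame shaped like the one in the question. The following is only a minimal sketch in base R: the toy `DF`, its `indicator` column, and the three day columns `D01` to `D03` (standing in for `D01` to `D31`) are made up for illustration, and the output column name `monthly_average` is hypothetical.

```r
# Hypothetical toy data: two indicators, three daily columns.
# A value of 0 marks an invalid reading.
DF <- data.frame(
  indicator = c("a", "b"),
  D01 = c(2, 0),
  D02 = c(0, 4),
  D03 = c(4, 6)
)

# Build the day column names ("D01", "D02", "D03"); use 1:31 for a full month.
day_cols <- sprintf("D%02d", 1:3)

# Work on a copy of the day columns so the zeros in DF itself stay untouched.
days <- DF[, day_cols]
days[days == 0] <- NA

# Row-wise mean that skips the NA (formerly 0) entries.
DF$monthly_average <- rowMeans(days, na.rm = TRUE)

print(DF)
```

Note that a row whose day values are all 0 ends up with `NaN` here, because there is nothing left to average, so you may want to handle that case separately.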
Alan Gómez