My dataset contains NDVI values and NDVI quality-descriptor values (PixelQa) for different areas on different dates. I want to erase (set to NA) the NDVI values that are associated with a bad quality descriptor (PixelQa). The number suffix of the column names links the two: PixelQa_1 belongs to NDVI_1, and so on.

Therefore, to "clean" my data I have to check each PixelQa value to decide whether its related NDVI value must change. There are three possible situations:

  1. PixelQa is NA -> then NDVI should also be NA.
  2. PixelQa is within 66±0.5 OR within 130±0.5 -> then NDVI keeps its value.
  3. PixelQa is outside both 66±0.5 and 130±0.5 -> then NDVI is set to NA (this is bad-quality data that needs to be ignored).
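
The three rules can be written as one vectorised check. The following is only a minimal sketch: `is_good_qa` and `clean_ndvi` are hypothetical helper names that do not appear elsewhere in this post, and the accepted codes (66, 130) and the tolerance (0.5) are taken from the list above.

```r
# Hypothetical helper (not from the original post): TRUE where PixelQa is
# within `tol` of one of the accepted codes; an NA PixelQa counts as not good.
is_good_qa <- function(qa, good = c(66, 130), tol = 0.5) {
  ok <- rep(FALSE, length(qa))
  for (g in good) ok <- ok | abs(qa - g) <= tol
  !is.na(ok) & ok
}

# Keep NDVI where the paired PixelQa is good, otherwise set it to NA.
clean_ndvi <- function(ndvi, qa) ifelse(is_good_qa(qa), ndvi, NA_real_)

# Rows 1 to 4 of the first NDVI/PixelQa pair in the example data
clean_ndvi(c(0.123, NA, 0.192, 0.234), c(66.30, NA, 66.00, 79.87))
```

Treating an NA PixelQa as "not good" folds situations 1 and 3 into a single branch, so only one `ifelse` is needed.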

My dataset could be:

DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)
DataNDVI_split
  X21feb1987_NDVI1 X21feb1987_PixelQa1 X18jul1987_NDVI2 X21feb1987_PixelQa2
1           0.123              66.30           0.223                66.30
2              NA                 NA              NA                   NA
3           0.192              66.00           0.230                66.00
4           0.234              79.87           0.334                79.87
5              NA                 NA              NA                   NA

And "clean" it should look like:

DataNDVI_split
  X21feb1987_NDVI1 X21feb1987_PixelQa1 X18jul1987_NDVI2 X21feb1987_PixelQa2
1           0.123              66.30           0.223                66.30
2              NA                 NA              NA                   NA
3           0.192              66.00           0.230                66.00
4              NA              79.87              NA                79.87
5              NA                 NA              NA                   NA
  • Hi Oriol Baena Crespo. Welcome to StackOverflow! Please do not post images of code or data here! Please read the info about [how to ask a good question](https://stackoverflow.com/help/how-to-ask) and how to give a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). That way you can help others to help you! – dario Mar 02 '20 at 09:57
  • Your problem is surely resolvable. Just make life a little easier for those who want to help you by providing reproducible data and/or stating your problem in clearer terms. – Chris Ruehlemann Mar 02 '20 at 10:13
  • Sorry Dario and Chris! I added it now. Thanks for the suggestion. I will keep both pictures for a while because I think the whole thing is a little bit difficult to understand! – Oriol Baena Crespo Mar 02 '20 at 10:23
  • I don't understand how what you call `Pixel Qa` and `NDVI` are distinct variables. If they are distinct, why have you assembled them in the same column? – Chris Ruehlemann Mar 02 '20 at 10:42
  • And how are the `Pixel Qa` and `NDVI` values related/matched up? Via the suffix number, so that, for example, `NDVI1` relates to `PixelQa1`? – Chris Ruehlemann Mar 02 '20 at 10:50
  • They are grouped like that because they are related to the very same date. I could split "date1" date into 2 columns (date1_NDVI and date2_NDVI). Do you think that would be more feasible? – Oriol Baena Crespo Mar 02 '20 at 10:51
  • Exactly Chris, by the suffix. – Oriol Baena Crespo Mar 02 '20 at 10:56
  • You could split up the data thus: `df_ndvi <- DataNDVI[grepl("NDVI", DataNDVI$Data), ]` and `df_pixel <- DataNDVI[!grepl("NDVI", DataNDVI$Data), ]` – Chris Ruehlemann Mar 02 '20 at 10:57
  • Yes, or simply `DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)`. How would you proceed? – Oriol Baena Crespo Mar 02 '20 at 11:01

1 Answer

Here's a tentative solution. First, I'd split up the data into two separate dataframes, thus:

df_ndvi <- DataNDVI[grepl("NDVI", DataNDVI$Data), ]
df_ndvi
   Data X21feb1987 X18jul1987
1 NDVI1      0.123      0.223
2 NDVI2         NA         NA
3 NDVI3      0.192      0.230
4 NDVI4      0.234      0.334
5 NDVI5         NA         NA

df_pixel <- DataNDVI[!grepl("NDVI", DataNDVI$Data), ]
df_pixel
       Data X21feb1987 X18jul1987
6  PixelQa1      66.30      66.00
7  PixelQa2         NA         NA
8  PixelQa3      66.00     124.23
9  PixelQa4      79.87      86.00
10 PixelQa5         NA         NA
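
The long-format `DataNDVI` object itself is not shown in this thread. To make the split reproducible, it can be rebuilt from the two printed data frames; the sketch below assumes the original had exactly these ten rows and three columns, which is inferred from the output and not confirmed by the asker.

```r
# Reconstructed from the printed df_ndvi and df_pixel; the original object
# is not shown in the thread, so its exact layout is an assumption.
DataNDVI <- data.frame(
  Data       = c(paste0("NDVI", 1:5), paste0("PixelQa", 1:5)),
  X21feb1987 = c(0.123, NA, 0.192, 0.234, NA, 66.30, NA, 66.00, 79.87, NA),
  X18jul1987 = c(0.223, NA, 0.230, 0.334, NA, 66.00, NA, 124.23, 86.00, NA),
  stringsAsFactors = FALSE
)

# Same split as in the answer: NDVI rows versus everything else (PixelQa rows)
df_ndvi  <- DataNDVI[grepl("NDVI", DataNDVI$Data), ]
df_pixel <- DataNDVI[!grepl("NDVI", DataNDVI$Data), ]
```

Row k of `df_ndvi` then lines up with row k of `df_pixel`, and the positional comparison in the loop relies on that alignment.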

There are many possible ways to perform the desired changes. One is a for loop over all the columns in df_ndvi (except the first!), with an ifelse statement that tests the matching PixelQa value and either keeps the NDVI value or replaces it with NA:

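for(i in 2:3){
  # keep NDVI only where PixelQa is within 0.5 of 66 or of 130; an NA PixelQa yields NA
  ok <- abs(df_pixel[,i] - 66) <= 0.5 | abs(df_pixel[,i] - 130) <= 0.5
  df_ndvi[,i] <- ifelse(ok, df_ndvi[,i], NA)
}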

This results in these corrections in df_ndvi:

df_ndvi
   Data X21feb1987 X18jul1987
1 NDVI1      0.123      0.223
2 NDVI2         NA         NA
3 NDVI3      0.192         NA
4 NDVI4         NA         NA
5 NDVI5         NA         NA

EDIT:

If you prefer to split up the data in this way:

DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)
DataNDVI_split
  X21feb1987_NDVI X21feb1987_PixelQa X18jul1987_NDVI X21feb1987_PixelQa.1
1           0.123              66.30           0.223                66.30
2              NA                 NA              NA                   NA
3           0.192              66.00           0.230                66.00
4           0.234              79.87           0.334                79.87
5              NA                 NA              NA                   NA

then the for loop could be adapted thus:

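for(i in c(1,3)){
  # column i holds NDVI, column i+1 its PixelQa; keep NDVI only if PixelQa is within 0.5 of 66 or of 130
  ok <- abs(DataNDVI_split[,i+1] - 66) <= 0.5 | abs(DataNDVI_split[,i+1] - 130) <= 0.5
  DataNDVI_split[,i] <- ifelse(ok, DataNDVI_split[,i], NA)
}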

The result is this:

DataNDVI_split
  X21feb1987_NDVI X21feb1987_PixelQa X18jul1987_NDVI X21feb1987_PixelQa.1
1           0.123              66.30           0.223                66.30
2              NA                 NA              NA                   NA
3           0.192              66.00           0.230                66.00
4              NA              79.87              NA                79.87
5              NA                 NA              NA                   NA
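
With many dates, hard-coding `c(1,3)` gets tedious. A possible loop-free variant is sketched below. It assumes the columns strictly alternate NDVI, PixelQa, NDVI, PixelQa, and so on (true for the example, unverified for the real dataset); `ndvi_cols`, `qa`, `keep` and `cleaned` are names introduced here for illustration only.

```r
DataNDVI_split <- data.frame(
  "21feb1987_NDVI"    = c(0.123, NA, 0.192, 0.234, NA),
  "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA),
  "18jul1987_NDVI"    = c(0.223, NA, 0.230, 0.334, NA),
  "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA),
  stringsAsFactors = FALSE
)

# Assumption: every NDVI column is immediately followed by its PixelQa column
ndvi_cols <- seq(1, ncol(DataNDVI_split), by = 2)
qa <- as.matrix(DataNDVI_split[ndvi_cols + 1])

# Logical matrix: TRUE where PixelQa is within 0.5 of 66 or of 130, NA where PixelQa is NA
keep <- abs(qa - 66) <= 0.5 | abs(qa - 130) <= 0.5

# Blank out every NDVI cell whose PixelQa is missing or outside both ranges
cleaned <- DataNDVI_split
cleaned[ndvi_cols][is.na(keep) | !keep] <- NA
cleaned
```

Writing to a copy (`cleaned`) leaves the raw values available for comparison. If the real column order is not strictly alternating, the column pairs would have to be matched by name instead.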
Chris Ruehlemann
  • I think that is a good solution! Do you think I could adapt that code to perform the very same thing you did but based on a single dataframe, for example on `DataNDVI_split`? Thank you so much – Oriol Baena Crespo Mar 02 '20 at 11:51
  • See **EDIT** in my answer! – Chris Ruehlemann Mar 02 '20 at 12:25
  • Thank you very much Chris. That has been very very helpful. I updated my post (Check **EDIT**) adding more complexity to the whole thing, in this case with my very real dataset. – Oriol Baena Crespo Mar 02 '20 at 16:15
  • I don't quite understand your EDIT: is my solution the answer to your query or is it not? – Chris Ruehlemann Mar 02 '20 at 16:52
  • Anyhow Chris, the whole question has changed so much thanks to your comments and suggestions. I consider the initial question answered. I think I am misusing the question now, treating it more as a forum for asking about my progress, which is not the scope of this webpage. I am a new user and I am only now getting how it works. Sincere apologies for that. – Oriol Baena Crespo Mar 03 '20 at 10:10
  • I've been reading the rules and good practices of the webpage and decided to rewrite the question so that it fits your answer and is available for the rest of the users. I will post other questions, if I need to, on other elements of my script. Thanks again Chris! – Oriol Baena Crespo Mar 03 '20 at 13:44