0

I have Landsat data for 31 years in 6 NetCDF files.

Each files has about 4 million rows of data.

Each file has an as yet unknown number of timesteps in each

Except for the first file that I am using to write my script, which has 60 timesteps, and will be used to process all the files.

I have never used R before this wee exercise.

My task is to create one set of land only data for a statistician to work on.

How can I loop through the timesteps calculating ndwi?

I have this code working: But it requires a lot of find/replace, copy/paste to work through the 60 timesteps and then there are 5 more files to process.

###############################################################
# Calculate ndwi for for the each timestep
###############################################################
#convert the timestep_1 for green to numeric value
green_nir_df02$t_1 <- as.numeric(as.character(green_nir_df02$t_1))
head (na.omit(green_nir_df02$t_1, 20))

#convert the timestep_1 for nir to numeric value
green_nir_df02$t_1.1 <- as.numeric(as.character(green_nir_df02$t_1.1))
head (na.omit(green_nir_df02$t_1.1, 20))

# calculate green minus nir for timestep_1
grnMinusNir_t_1 <- green_nir_df02$t_1 - green_nir_df02$t_1.1
head(grnMinusNir_t_1, 20)

# calculate green plus nir for timestep_1
grnPlusNir_t_1 <- green_nir_df02$t_1 + green_nir_df02$t_1.1
head(grnPlusNir_t_1, 20)

# calculate ndwi from greenMinusNir divided by greenPlusNir for timestep_1
ndwi_t_1 <- grnMinusNir_t_1 / grnPlusNir_t_1
head(ndwi_t_1, 20)

# write ndwi to the green_nir_df02 dataframe for timestep_1
green_nir_df02$ndwi_t_1 <- ndwi_t_1

####################################################
#convert the timestep_2 for green to numeric value
green_nir_df02$t_2 <- as.numeric(as.character(green_nir_df02$t_2))
head (na.omit(green_nir_df02$t_2, 20))

#convert the timestep_2 for nir to numeric value
green_nir_df02$t_2.1 <- as.numeric(as.character(green_nir_df02$t_2.1))
head (na.omit(green_nir_df02$t_2.1, 20))

# calculate green minus nir for timestep_2
grnMinusNir_t_2 <- green_nir_df02$t_2 - green_nir_df02$t_2.1
head(grnMinusNir_t_2, 20)

# calculate green plus nir for timestep_2
grnPlusNir_t_2 <- green_nir_df02$t_2 + green_nir_df02$t_2.1
head(grnPlusNir_t_2, 20)

# calculate ndwi from greenMinusNir divided by greenPlusNir for timestep_2
ndwi_t_2 <- grnMinusNir_t_2 / grnPlusNir_t_2
head(ndwi_t_2, 20)

# write ndwi to the green_nir_df02 dataframe for timestep_2
green_nir_df02$ndwi_t_2 <- ndwi_t_2

... etc to t_60

###############################################################
# END: Calculate ndwi for for the each timestep
###############################################################

and the next file starts at t_61 and so on

I have tried using this loop, which is not working because I can't point to the value in the column of t_1 in the dataframe rather than the value in the array of column names.

# initialize the "for" loop to calculate ndwi columns
seq <- 1:nt
i <- 0
green_t <- array(1:nt)
nir_t <- array(1:nt)
ndwi_t <- array(1:nt)
greenMinusNir <- array(1:nt)
greenPlusNir <- array(1:nt)
ndwi <- array(1:nt)
# set the value of the count
# count = nt from previous file #nt = num timesteps#THIS IS THE VALUE TO USE
count <- 0 ################################## EDIT THIS VALUE BEFORE RUNNING
# START: Loop for each timestep
# Step 1: create variable names for green, nir, ndwi
# Step 2: convert t_[i] and t_[i].1 to numeric
# Step 3: calculate green minus nir
# Step 4: calculate green plus nir
# Step 5: calculate ndwi = g-nir/g+nir
# Step 6: write ndwi to the green_nir_df02 dataframe
for(i in seq){
    # Step 1: create variable names for green, nir, ndwi
    green_t[i] <- paste("green_nir_df02$t_",count+i,sep="")
    nir_t[i] <- paste("green_nir_df02$t_",count+i,".1",sep="")
    ndwi_t[i] <- paste("green_nir_df02$ndwi_t_",count+i,sep="")
}
# initialize the "for" loop to calculate ndwi columns
seq <- 1:nt
i <- 0
for(i in seq){
    # Step 2: convert green (i.e., t_[i]) and nir (i.e., t_[i].1) to numeric
    green <- as.numeric(as.character(green_t[i]))
    ##    
    #ERROR when i=1 > green <-as.numeric(as.character("green_nir_df02$t_1"))
    ## 
    nir_t <- as.numeric(as.character(nir_t))

    # Step 3: calculate green minus nir
    greenMinusNir[i] <- green_t - nir_t

    # Step 4: calculate green plus nir
    greenPlusNir[i] <- green_t[i] + nir_t[i]

    # Step 5: calculate ndwi = g-nir/g+nir
    ndwi[i] <- greenMinusNir[i] / greenPlusNir[i]

    # Step 6: write ndwi to the green_nir_df02 dataframe ndwi timestep
    ndwi_t[i] <- ndwi[i]
    # i <- i+1
}

Sample data

            lon_lat t_1 t_2 t_60 t_1.1 t_2.1  t_60.1  ndwi_t_1   ndwi_t_2

1 -1609787.5_-2275087.5 180 216 247 80 197 192 0.3846154 0.04600484

2 -1609762.5_-2275087.5 102 252 281 80 197 227 0.1208791 0.12249443

3 -1609737.5_-2275087.5 102 216 281 80 156 227 0.1208791 0.16129032

4 -1609712.5_-2275087.5 141 216 281 80 156 227 0.2760181 0.16129032

5 -1609687.5_-2275087.5 180 181 281 80 156 227 0.3846154 0.07418398

6 -1609662.5_-2275087.5 180 216 281 80 197 227 0.3846154 0.04600484

Ilona
  • 31
  • 6
  • 1
    Welcome to Stack Overflow! Could you make your problem reproducible by sharing a sample of your data so others can help (please do not use `str()`, `head()` or screenshot)? You can use the [`reprex`](https://reprex.tidyverse.org/articles/articles/magic-reprex.html) and [`datapasta`](https://cran.r-project.org/web/packages/datapasta/vignettes/how-to-datapasta.html) packages to assist you with that. See also [Help me Help you](https://speakerdeck.com/jennybc/reprex-help-me-help-you?slide=5) & [How to make a great R reproducible example?](https://stackoverflow.com/q/5963269) – Tung Nov 19 '18 at 09:04
  • Writing the question in jargon that 0.01% of users would understand is going to reduce the responses. 'NDWI' apparently means [Normalized Difference Water Index](https://en.wikipedia.org/wiki/Normalized_difference_water_index), computed at each timestep t. So that boils down to you want to vectorize one calculation on all rows of a dataframe. Although you said the way the timesteps are represented causes an issue. Please explain more, i.e. post a snippet of (say 10-40) actual lines from your dataframe. – smci Nov 19 '18 at 09:23
  • *"How can I loop through the timesteps calculating NDWI?"* **Don't**. Vectorize them: write a custom function `calculate_ndwi(...)` then use `apply()` function to run it on all the rows and columns you need to. There should be zero for-loops in your final answer. Also, your timesteps should not be string to begin with, they should be read in directly as numeric from your netCDF file, or else converted immediately after reading. But, please show us a snippet of your input data, that's the best startpoint. – smci Nov 19 '18 at 09:31
  • Thank you. Vectors have solved my problems. – Ilona Nov 20 '18 at 03:01

0 Answers0