-1

R newbie (ish). I've written some code which uses a for() loop in R. I want to rewrite it in a vectorised form, but it's not working.

Simplified example to illustrate:

library(dplyr)

x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

## if year is blank and name is same as name from previous row
##    take year from previous row
## else
##    stick with the year you already have

# 1. Run as a loop

x$year_2 <- NA
x$year_2[1] <- x$year[1]

for (row_idx in 2:nrow(x)) {
  # scalar condition, so use && rather than the vectorised &
  if (is.na(x$year[row_idx]) && x$name[row_idx] == x$name[row_idx - 1]) {
    x$year_2[row_idx] <- x$year_2[row_idx - 1]
  } else {
    x$year_2[row_idx] <- x$year[row_idx]
  }
}

# 2. Attempt to vectorise

x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

x$year_2 <- ifelse(is.na(x$year) & x$name == lead(x$name),
                   lead(x$year_2),
                   x$year)

I think the vectorised version fails because it is circular (i.e. x$year_2 appears on both sides of the <-). Is there a way to get around this?

Thank you.

Alan

4 Answers

4

I recommend using the functions that already exist for this. R feels difficult at the start because we are trained to reinvent wheels. Don't do it.

library(tidyverse)

x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))


x %>% 
  group_by(name) %>% 
  tidyr::fill(year)
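
To match the question's layout, where the original year column is kept and the filled values go into year_2, the same idea might be sketched as below. The year_2 name comes from the question; the copy-then-fill step, the x2 name, and the final ungroup() are additions for illustration only.

```r
library(dplyr)
library(tidyr)

x <- data.frame(name = c(rep("John", 8), "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))

x2 <- x %>%
  group_by(name) %>%
  mutate(year_2 = year) %>%   # copy, so the original year column is left untouched
  fill(year_2) %>%            # fills downwards by default, within each name
  ungroup()                   # drop the grouping so later steps see ungrouped data
```

Because the fill happens inside each group, a missing year on the first row of a name stays NA instead of borrowing the previous person's value.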
Bruno
  • Oh my...that's much better! The ordering is altered - presumably I could use a %>% arrange() to fix that (doesn't necessarily affect my code logic but will make things easier to look at on screen)? – Alan Dec 20 '19 at 14:20
  • Yes, it only changes what is displayed. Still, if it looks better to you, go for it. – Bruno Dec 20 '19 at 14:36
1

If you are using dplyr/tidyverse:

library(dplyr)
library(tidyr)
x %>% 
  group_by(name) %>% 
  fill("year")

   name   year
   <fct> <dbl>
 1 John      1
 2 John      1
 3 John      2
 4 John      3
 5 John      3
 6 John      3
 7 John      4
 8 John      4
 9 Fred      1
10 Fred      1
s_baldur
0

If you know that the data frame is always ordered like this (rows grouped by name, and the first year for each name is not missing), then the following should work for you. It fills each NA with the most recent non-missing value.

library(zoo)
x <- data.frame(name = c("John", "John", "John", "John", "John", "John", "John", "John", "Fred", "Fred"),
                year = c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA))
x$year_2 <- na.locf(x$year, na.rm = FALSE)  # na.rm = FALSE keeps a leading NA instead of dropping it
x

If you don't want to load the zoo package, this works as well:

repeat_last = function(x, forward = TRUE, maxgap = Inf, na.rm = FALSE) {
  if (!forward) x = rev(x)           # reverse x twice if carrying backward
  ind = which(!is.na(x))             # get positions of nonmissing values
  if (is.na(x[1]) && !na.rm)         # if it begins with NA
    ind = c(1, ind)                  # add first pos
  rep_times = diff(                  # diffing the indices + length yields how often
    c(ind, length(x) + 1) )          # they need to be repeated
  if (maxgap < Inf) {
    exceed = rep_times - 1 > maxgap  # exceeding maxgap
    if (any(exceed)) {               # any exceed?
      ind = sort(c(ind[exceed] + 1, ind))      # add NA in gaps
      rep_times = diff(c(ind, length(x) + 1) ) # diff again
    }
  }
  x = rep(x[ind], times = rep_times) # repeat the values at these indices
  if (!forward) x = rev(x)           # second reversion
  x
}

x$year_3 <- repeat_last(x$year)
x
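
Both versions above fill straight down the column and ignore name. If a name could start with a missing year, a group-aware variant might look like the sketch below. It uses base R's ave() to apply the fill separately within each name; locf is a hypothetical helper written for this example, and the four-row data frame is made up to show the difference.

```r
# Hypothetical helper: carry the last non-missing value forward, keeping leading NAs.
locf <- function(v) {
  idx <- cumsum(!is.na(v))        # stays 0 until the first non-missing value
  c(NA, v[!is.na(v)])[idx + 1]    # slot 1 holds NA, used for any leading gap
}

x <- data.frame(name = c("John", "John", "Fred", "Fred"),
                year = c(1, NA, NA, 2))

x$year_all   <- locf(x$year)                      # ignores name
x$year_group <- ave(x$year, x$name, FUN = locf)   # restarts for each name
x
```

Here year_all gives Fred's first row John's value, while year_group leaves it as NA, which is what the rule in the question asks for. Note that ave() groups by name value rather than by consecutive runs, so this assumes each name's rows are stored together.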
ErrorJordan
0

An easy way to do this in base R, assuming the first year is not NA, is the code below

x <- within(x, year <- subset(year,!is.na(year))[cumsum(!is.na(year))])

or

x$year <- with(x, subset(year,!is.na(year))[cumsum(!is.na(year))])

such that

> x
   name year
1  John    1
2  John    1
3  John    2
4  John    3
5  John    3
6  John    3
7  John    4
8  John    4
9  Fred    1
10 Fred    1
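
To see why the indexing works, it may help to split the one-liner into steps. This is only an unpacking of the expression above on the question's year values; the intermediate names seen, vals and filled are made up for illustration.

```r
year <- c(1, NA, 2, 3, NA, NA, 4, NA, 1, NA)

seen   <- cumsum(!is.na(year))   # how many non-missing values so far: 1 1 2 3 3 3 4 4 5 5
vals   <- year[!is.na(year)]     # the non-missing values in order:    1 2 3 4 1
filled <- vals[seen]             # look up the latest one per row:     1 1 2 3 3 3 4 4 1 1
```

Each row picks the most recent non-missing value by position. If the column started with NA, the first index would be 0, which R drops when subsetting, so the result would be shorter than the input. It also does not look at name, so it relies on each name's first year being present.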
ThomasIsCoding