I have a data frame of diameter measurements from different trees (column "t"), where each tree has a different number of stems (column "s1"). In the first record, every living stem is recorded (column "flag1"), which gives the following data frame:

df1

t   s1  d1  flag1
t1  a   2   alive
t1  b   3   alive
t1  c   2   alive
t2  a   4   alive
t2  b   3   alive
t2  c   7   alive
t3  a   3   alive
t3  b   5   alive
t4  a   4   alive
t4  b   3   alive

As the trees grow, the diameter of each stem is recorded again every year, producing a new data frame (df2) with the new diameter measurements. In later years a tree may keep all its stems alive (e.g. "t3"), gain new stems (e.g. "t2"), lose stems (e.g. "t1"), or both gain and lose stems (e.g. "t4"):

df2

t   s2  d2  flag2
t1  a   3   alive
t1  b   4   alive
t1  c   NA  dead
t2  a   5   alive
t2  b   3   alive
t2  c   7   alive
t2  d   3   new
t2  e   4   new
t3  a   4   alive
t3  b   8   alive
t4  a   5   alive
t4  b   NA  dead
t4  c   3   new

I need to create a new data frame that combines the two on the shared column ("t"), keeps the remaining columns of each data frame, and fills empty cells with NA. In this case, the final data frame would look like this:

df3

t   s1  d1  flag1   s2  d2  flag2
t1  a   2   alive   a   3   alive
t1  b   3   alive   b   4   alive
t1  c   2   alive   c   NA  dead
t2  a   4   alive   a   5   alive
t2  b   3   alive   b   3   alive
t2  c   7   alive   c   7   alive
t2  NA  NA  NA      d   3   new
t2  NA  NA  NA      e   4   new
t3  a   3   alive   a   4   alive
t3  b   5   alive   b   8   alive
t4  a   4   alive   a   5   alive
t4  b   3   alive   b   NA  dead
t4  NA  NA  NA      c   3   new

I tried functions like cbind.fill (package:rowr) but I wasn't able to find a solution.

Btw, I could not reproduce the desired result with the suggestions given in the answers to http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right, so it's possibly not a duplicate? – Daniel Apr 14 '15 at 20:05

1 Answer

Here is a dplyr solution, or rather a hack.

zz1 <- "t   s1  d1  flag1
t1  a   2   alive
t1  b   3   alive
t1  c   2   alive
t2  a   4   alive
t2  b   3   alive
t2  c   7   alive
t3  a   3   alive
t3  b   5   alive
t4  a   4   alive
t4  b   3   alive"
df1 <- read.table(text = zz1, header = TRUE)
zz2 <- "t   s2  d2  flag2
t1  a   3   alive
t1  b   4   alive
t1  c   NA  dead
t2  a   5   alive
t2  b   3   alive
t2  c   7   alive
t2  d   3   new
t2  e   4   new
t3  a   4   alive
t3  b   8   alive
t4  a   5   alive
t4  b   NA  dead
t4  c   3   new"
df2 <- read.table(text = zz2, header = TRUE)

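library(dplyr)  # provides %>% alongside the dplyr verbs

# df2 without the "new" stems; drop "t" so bind_cols does not duplicate it
# (relies on df1 and df2_a listing the same stems in the same row order)
df2_a <- df2 %>% filter(flag2 != "new") %>% select(-t)
# bind columns
df3 <- bind_cols(df1, df2_a)
# append the "new" stems and sort by "t"
df3 <- bind_rows(df3, filter(df2, flag2 == "new")) %>% arrange(t)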
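
The hack above depends on both data frames listing the surviving stems in the same row order. As an alternative sketch (not part of the original answer), the same result can probably be reached with a full outer join in base R, assuming that a stem is identified by its tree id plus its stem label. The helper key column "s" and the names df1k, df2k and df3_join are made up for this illustration.

```r
# Rebuild the example data so the block runs on its own.
df1 <- data.frame(
  t     = c("t1", "t1", "t1", "t2", "t2", "t2", "t3", "t3", "t4", "t4"),
  s1    = c("a", "b", "c", "a", "b", "c", "a", "b", "a", "b"),
  d1    = c(2, 3, 2, 4, 3, 7, 3, 5, 4, 3),
  flag1 = "alive",
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  t     = c("t1", "t1", "t1", "t2", "t2", "t2", "t2", "t2",
            "t3", "t3", "t4", "t4", "t4"),
  s2    = c("a", "b", "c", "a", "b", "c", "d", "e", "a", "b", "a", "b", "c"),
  d2    = c(3, 4, NA, 5, 3, 7, 3, 4, 4, 8, 5, NA, 3),
  flag2 = c("alive", "alive", "dead", "alive", "alive", "alive", "new", "new",
            "alive", "alive", "alive", "dead", "new"),
  stringsAsFactors = FALSE
)

# Assumption: tree id + stem label identifies a stem in both surveys.
# Copy the stem label into a hypothetical key column "s" so that
# s1 and s2 both survive the merge as ordinary columns.
df1k <- transform(df1, s = s1)
df2k <- transform(df2, s = s2)

# all = TRUE makes this a full outer join: stems present in only one
# data frame are kept, with NA in the columns of the other.
df3_join <- merge(df1k, df2k, by = c("t", "s"), all = TRUE)

# Sort by tree and stem, drop the helper key, restore the column order.
df3_join <- df3_join[order(df3_join$t, df3_join$s),
                     c("t", "s1", "d1", "flag1", "s2", "d2", "flag2")]
rownames(df3_join) <- NULL
```

Because the rows are matched by key rather than by position, this version should also cope with stems that appear in a different order in the two data frames.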