I am using tidyr and lubridate to convert a wide table to a long table. The following works just fine.

> library(tidyr)
> library(lubridate)
> (df <- data.frame(hh_id = 1:2,
                   bday_01 = ymd(20150309),
                   bday_02 = ymd(19850911),
                   bday_03 = ymd(19801231)))

  hh_id    bday_01    bday_02    bday_03
1     1 2015-03-09 1985-09-11 1980-12-31
2     2 2015-03-09 1985-09-11 1980-12-31

> gather(df, person_num, bday, starts_with("bday_0"))

  hh_id  person_num        bday
1     1     bday_01  2015-03-09
2     2     bday_01  2015-03-09
3     1     bday_02  1985-09-11
4     2     bday_02  1985-09-11
5     1     bday_03  1980-12-31
6     2     bday_03  1980-12-31

However, when there are NA's in the mix, the dates are converted to strings.

> (df <- data.frame(hh_id = 1:2,
                   bday_01 = ymd(20150309),
                   bday_02 = ymd(19850911),
                   bday_03 = NA))

  hh_id    bday_01    bday_02    bday_03
1     1 2015-03-09 1985-09-11         NA
2     2 2015-03-09 1985-09-11         NA

> gather(df, person_num, bday, starts_with("bday_0"))

  hh_id person_num       bday
1     1    bday_01 1425859200
2     2    bday_01 1425859200
3     1    bday_02  495244800
4     2    bday_02  495244800
5     1    bday_03         NA
6     2    bday_03         NA
Warning message:
attributes are not identical across measure variables; they will be dropped 

Note that the same warning also appears when regular strings are mixed with NAs.

> (df <- data.frame(hh_id = 1:2,
                    bday_01 = '20150309',
                    bday_02 = '19850911',
                    bday_03 = NA))

  hh_id  bday_01  bday_02 bday_03
1     1 20150309 19850911      NA
2     2 20150309 19850911      NA

> gather(df, person_num, bday, starts_with("bday_0"))

  hh_id person_num     bday
1     1    bday_01 20150309
2     2    bday_01 20150309
3     1    bday_02 19850911
4     2    bday_02 19850911
5     1    bday_03     <NA>
6     2    bday_03     <NA>
Warning message:
attributes are not identical across measure variables; they will be dropped 

Is it possible to use tidyr with NA's while avoiding a warning and retaining formatting?

josiekre
  • can you use base reshape? it doesn't have that problem `reshape(df, idvar = 'hh_id', varying = list(2:4), v.names = 'bday', direction = 'long', timevar = 'person_num')` – rawr Mar 10 '15 at 01:22

1 Answer

The data is not being converted to strings. It is falling back to the underlying numeric representation, seconds since 1970-01-01 UTC, which is how the original date-time (POSIXct) values in df are stored:

x <- df$bday_01
x
#[1] "2015-03-09 UTC" "2015-03-09 UTC"
attributes(x) <- NULL
x
#[1] 1425859200 1425859200
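
To make the point above concrete, here is a minimal sketch showing that the bare numbers in the gathered output are the same instants with the class stripped off. It uses only base R; the `secs` and `restored` names are hypothetical, and the `origin` and `tz` values are assumptions that match the UTC values printed above.

```r
# Raw values as they appear in the gathered bday column
# (seconds since the epoch, class attributes dropped)
secs <- c(1425859200, 495244800, NA)

# Reattach the date-time class; origin and tz are assumed to match the original data
restored <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

format(restored, "%Y-%m-%d")
#[1] "2015-03-09" "1985-09-11" NA
```

This only repairs the column after the fact. The warning from gather would still be emitted.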

The warning message gives you a hint to a way around it:

attributes are not identical across measure variables; they will be dropped

So, try:

attributes(df$bday_03) <- attributes(df$bday_02)
gather(df, person_num, bday, starts_with("bday_0"))

#  hh_id person_num       bday
#1     1    bday_01 2015-03-09
#2     2    bday_01 2015-03-09
#3     1    bday_02 1985-09-11
#4     2    bday_02 1985-09-11
#5     1    bday_03       <NA>
#6     2    bday_03       <NA>
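
If several columns might be entirely NA, the same idea can be applied in a loop. The following is only a sketch under stated assumptions: it builds the example with base `as.POSIXct()` instead of `ymd()` so that it does not depend on a particular lubridate version, it assumes at least one bday column contains a real value to copy attributes from, and the names `bday_cols`, `all_na` and `template` are hypothetical.

```r
# Example data; as.POSIXct() stands in for ymd() here (an assumption of this sketch)
df <- data.frame(hh_id = 1:2,
                 bday_01 = as.POSIXct("2015-03-09", tz = "UTC"),
                 bday_02 = as.POSIXct("1985-09-11", tz = "UTC"),
                 bday_03 = NA)

# Find the measure columns and flag the ones that contain nothing but NA
bday_cols <- grep("^bday_0", names(df), value = TRUE)
all_na <- vapply(df[bday_cols], function(col) all(is.na(col)), logical(1))

# At least one column must hold real values to serve as the attribute template
stopifnot(any(!all_na))
template <- df[[bday_cols[!all_na][1]]]

# Give every all-NA column the same class and time zone attributes as the template
for (nm in bday_cols[all_na]) {
  col <- as.numeric(df[[nm]])          # logical NA -> numeric NA
  attributes(col) <- attributes(template)
  df[[nm]] <- col
}

class(df$bday_03)
#[1] "POSIXct" "POSIXt"
```

With the attributes now identical across the bday columns, the gather call shown above can be run on `df` unchanged.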
thelatemail
  • Ah okay. Thoughts on how I would systematically assign attributes to all NA's? In other words, what if bday_03 only has one NA and bday_02 has the opposite NA? – josiekre Mar 10 '15 at 02:04
  • @josiekre - the problem should only exist when you have all NA's without any valid dates in a variable. Therefore no proper Date/Time attributes are set for that variable. Having an NA interspersed between valid dates in bday_03 and/or bday_02 won't break anything I don't think. – thelatemail Mar 10 '15 at 02:11
  • It does break for some reason. That's how my nontrivial example is: interspersed NAs in a lubridate column. – josiekre Mar 10 '15 at 03:08
  • @josiekre - I can't replicate the problem, e.g.: `(df <- data.frame(hh_id = 1:2,bday_01 = ymd(20150309,NA),bday_02 = ymd(NA,19850911),bday_03 = NA))` still works with this method. – thelatemail Mar 10 '15 at 03:10
  • Odd. That `df` doesn't work for me. It drops the attributes and gives a warning. R is version 3.1.2; lubridate 1.3.3; tidyr 0.2.0 – josiekre Mar 10 '15 at 15:40
  • I didn't realize there was an empty column in your example above. I take back my comment; it does work when I assign attributes to the all NA column. – josiekre Mar 10 '15 at 18:32