I posted an answer to a question using dplyr and tidyr. Based on this comment I used Map to build the answer.

Next I tried to use base R tools only to answer the same question, but this didn't work as expected:

transform(
  df,
  Begin_New = Map(seq, Begin, End - 6000, list(by = 1000)) # or mapply(...)
)

caused an error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : Arguments imply different number of rows: 25, 33, 84, 36, 85, 165

Well, okay. That doesn't seem to work, but why does this one work?

df2 <- data.frame(id = 1:4, nested = c("a, b, f", "c, d", "e", "e, f"))
transform(df2, nested = strsplit(nested, ", "))

In my understanding Map(seq, Begin, End - 6000, list(by = 1000)) and strsplit(nested, ", ") both return a list() containing vectors. What am I missing?

I read the question "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : Arguments imply different number of rows: 1, 4, 5, 2", but I still don't know why these two examples behave differently.

Data

df <- structure(list(ID = c("A01", "A01", "A01", "A01", "A01", "A01"
), Period = c("Baseline", "Run", "Recovery", "Baseline", "Run", 
"Recovery"), Begin = c(0, 30500, 68500, 2000, 45000, 135000), 
    End = c(30500, 68500, 158000, 43000, 135000, 305000)), row.names = c(NA, 
-6L), class = "data.frame")
Martin Gal

2 Answers


I think this is related to "Create a data.frame where a column is a list". You can wrap the list in I(), which inhibits interpretation/conversion of objects, so that data.frame() keeps it as a single list column.

transform(
  df,
  Begin_New = I(Map(seq, Begin, End - 6000, list(by = 1000)))
)
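
A quick way to check the result, as a minimal sketch that assumes the df from the question (rebuilt here so the snippet runs on its own):

```r
# Same data as in the question, written more compactly
df <- data.frame(
  ID = "A01",
  Period = rep(c("Baseline", "Run", "Recovery"), 2),
  Begin = c(0, 30500, 68500, 2000, 45000, 135000),
  End = c(30500, 68500, 158000, 43000, 135000, 305000)
)

res <- transform(
  df,
  Begin_New = I(Map(seq, Begin, End - 6000, list(by = 1000)))
)

dim(res)                # same number of rows, one extra column
class(res$Begin_New)    # the list column keeps its "AsIs" class
lengths(res$Begin_New)  # one sequence per row, each of a different length
```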

Another way would be to use list2DF:

transform(
  df,
  unusedName = list2DF(list(Begin_New = Map(seq, Begin, End - 6000,
                 list(by = 1000))))
)
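
The outer name is called unusedName for a reason. As far as I can tell, when data.frame() receives a one-column data frame it keeps the inner column name and drops the argument name. A small sketch to check this, assuming the df from the question and R >= 4.0.0 (where list2DF is available):

```r
# Same data as in the question, written more compactly
df <- data.frame(
  ID = "A01",
  Period = rep(c("Baseline", "Run", "Recovery"), 2),
  Begin = c(0, 30500, 68500, 2000, 45000, 135000),
  End = c(30500, 68500, 158000, 43000, 135000, 305000)
)

res <- transform(
  df,
  unusedName = list2DF(list(Begin_New = Map(seq, Begin, End - 6000,
                                            list(by = 1000))))
)

names(res)                # inspect which name the new column received
is.list(res$Begin_New)    # the new column is a list column
```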

As already pointed out by @r2evans, in the first case you create a new column, while in the second you overwrite an existing one.

GKi
  • Very interesting that this works! The *only* difference (inside `transform.data.frame`) between with and without `I(.)` is the `class = "AsIs"` on the `Begin_New` `list`. – r2evans Sep 01 '21 at 12:42
  • But it makes sense, compare `data.frame(a=1:3, b = list(1, 2, 3))` to `data.frame(a=1:3, b = I(list(1 ,2, 3)))`. – thothal Sep 01 '21 at 12:48
  • `?data.frame`: If a list or data frame or matrix is passed to ‘data.frame’ it is as if each component or column had been passed as a separate argument (except for matrices protected by ‘I’). – thothal Sep 01 '21 at 12:48
  • Easy fix for an issue I still don't fully understand. Thank you. – Martin Gal Sep 01 '21 at 16:44

The error appears to be in transform.data.frame and how it is (re)assigning the column.

transform.data.frame
# function (`_data`, ...) 
# {
#     e <- eval(substitute(list(...)), `_data`, parent.frame())
#     tags <- names(e)
#     inx <- match(tags, names(`_data`))
#     matched <- !is.na(inx)
#     if (any(matched)) {
#         `_data`[inx[matched]] <- e[matched]
#         `_data` <- data.frame(`_data`)
#     }
#     if (!all(matched)) 
#         do.call("data.frame", c(list(`_data`), e[!matched]))
#     else `_data`
# }
# <bytecode: 0x000000000a34e4b0>
# <environment: namespace:base>

Specifically, if any(matched) then it uses

`_data`[inx[matched]] <- e[matched]

which works. This is the case in your df2 example, because you reassign over an existing variable, nested. If you chose to assign to a non-existent variable, however, it also fails:

transform(df2, nested2 = strsplit(nested, ", "))
# Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  : 
#   arguments imply differing number of rows: 3, 2, 1

If the column does not exist (as is the case in the original df), then

do.call("data.frame", c(list(`_data`), e[!matched]))

fails.
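
The two code paths can be reproduced outside of transform. This is only a sketch of what I believe happens, using the df2 from the question (stringsAsFactors = FALSE is added so it also runs on R < 4.0.0; pieces, overwritten and new_col are names made up for this example):

```r
df2 <- data.frame(
  id = 1:4,
  nested = c("a, b, f", "c, d", "e", "e, f"),
  stringsAsFactors = FALSE
)
pieces <- strsplit(df2$nested, ", ")

# Existing column: `[<-` stores the whole list as a single list column.
overwritten <- df2
overwritten["nested"] <- list(pieces)
str(overwritten)

# New column: data.frame() treats each list element as a separate
# argument, so the differing lengths trigger the error.
new_col <- tryCatch(
  data.frame(df2, nested2 = pieces),
  error = function(e) conditionMessage(e)
)
new_col
```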

If we pre-assign df$Begin_New, it works.

df$Begin_New <- NA
str(transform(
  df,
  Begin_New = Map(seq, Begin, End - 6000, by = 1000) # or mapply(...)
))
# 'data.frame': 6 obs. of  5 variables:
#  $ ID       : chr  "A01" "A01" "A01" "A01" ...
#  $ Period   : chr  "Baseline" "Run" "Recovery" "Baseline" ...
#  $ Begin    : num  0 30500 68500 2000 45000 135000
#  $ End      : num  30500 68500 158000 43000 135000 305000
#  $ Begin_New:List of 6
#   ..$ : num  0 1000 2000 3000 4000 5000 6000 7000 8000 9000 ...
#   ..$ : num  30500 31500 32500 33500 34500 35500 36500 37500 38500 39500 ...
#   ..$ : num  68500 69500 70500 71500 72500 73500 74500 75500 76500 77500 ...
#   ..$ : num  2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 ...
#   ..$ : num  45000 46000 47000 48000 49000 50000 51000 52000 53000 54000 ...
#   ..$ : num  135000 136000 137000 138000 139000 140000 141000 142000 143000 144000 ...

Perhaps this is a bug in transform.data.frame. It does seem odd that the behavior is inconsistent due solely to the preexistence of the column, whose old value is discarded anyway. Suppose we change the new-variable assignment to something like this:

transform2 <- function (`_data`, ...) {
    e <- eval(substitute(list(...)), `_data`, parent.frame())
    tags <- names(e)
    inx <- match(tags, names(`_data`))
    matched <- !is.na(inx)
    if (any(matched)) {
        `_data`[inx[matched]] <- e[matched]
        `_data` <- data.frame(`_data`)
    }
    if (!all(matched))  {
        `_data`[ncol(`_data`) + seq_len(sum(!matched))] <- e[!matched]
        `_data` <- data.frame(`_data`)
    }
    `_data`
}

Then it works. (I have not tested for everything else transform.data.frame is supposed to handle, but perhaps this should be a bug-report/patch-request to R-devel.)
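
For comparison, here is a sketch that avoids transform altogether. It relies on the same assignment-style mechanism as the patched branch above: plain `$<-` on a data frame accepts a list whose length equals the number of rows. It assumes the df from the question, rebuilt here so the snippet runs on its own:

```r
# Same data as in the question, written more compactly
df <- data.frame(
  ID = "A01",
  Period = rep(c("Baseline", "Run", "Recovery"), 2),
  Begin = c(0, 30500, 68500, 2000, 45000, 135000),
  End = c(30500, 68500, 158000, 43000, 135000, 305000)
)

# Direct assignment stores the list as one list column, no I() needed
df$Begin_New <- Map(seq, df$Begin, df$End - 6000, by = 1000)

str(df, list.len = 5)
```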

r2evans
  • From the documentation: "If some of the values are not vectors of the appropriate length, you deserve whatever you get!" – Roland Sep 01 '21 at 12:37
  • That's a funny note in that doc, yes ... it seems to be from a time before list-columns were acceptable. (Are they formally "acceptable" in base R? I know they generally *work*, but ...) – r2evans Sep 01 '21 at 12:38
  • Thank you, @r2evans for this enlightening answer. Learnt something new today. :-) – Martin Gal Sep 01 '21 at 17:06
  • This is how we often self-improve: learn to permute a method into something else and watch how things tumble/break. – r2evans Sep 01 '21 at 17:10