I should know this, but I don't. And that's because factors in R can be an absolute nightmare. This is a follow-up to my previous question. I'm hoping a few of you might be able to explain, in a bit more detail than the R manuals do, how to preserve the column attributes when passing a data frame to a custom function. So far, the most useful information I've dug up was from Hadley's Advanced R Programming site. But that section is quite short. Here's what I have:
Edits: I've added the source code to my GitHub (EDIT: link goes to gsub.dataframe.R now). Also, I think I may have a good way to determine whether to set stringsAsFactors = FALSE in the new data frame. Or, as a much easier alternative, I could add a stringsAsFactors argument. Is it possible to use ... for more than one set of further arguments? Like having ... be the further arguments to grep and data.frame?
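To make the ... question concrete, here is a rough sketch of one way it might be done: collect ... into a list and route each named argument by checking it against the formal argument names of grep and data.frame. The wrapper name grep_then_frame and the column name match are made up for illustration and are not part of the function on GitHub.

```r
# Hypothetical wrapper (not the function from the post): send each named
# argument in ... to grep() or data.frame() based on their formals.
grep_then_frame <- function(pattern, x, ...) {
  dots <- list(...)

  # Arguments each target accepts; drop the ones this wrapper sets itself.
  grep_names <- setdiff(names(formals(grep)), c("pattern", "x", "value"))
  df_names   <- names(formals(data.frame))

  grep_args <- dots[names(dots) %in% grep_names]
  df_args   <- dots[names(dots) %in% df_names]

  # value = TRUE returns the matching strings instead of their indices.
  hits <- do.call(grep, c(list(pattern = pattern, x = x, value = TRUE), grep_args))
  do.call(data.frame, c(list(match = hits), df_args))
}

# ignore.case goes to grep(); stringsAsFactors goes to data.frame().
out <- grep_then_frame("a", c("Apple", "banana", "cherry"),
                       ignore.case = TRUE, stringsAsFactors = FALSE)
```

Arguments in ... that match neither function are silently dropped in this sketch. A more explicit alternative would be two separate list arguments, one per target function, each passed along with do.call.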
Set up some data
set.seed(24)
num <- rep(1, 10); int <- 1:10; fac <- sample(LETTERS[1:3], 10, TRUE)
D <- data.frame(num, int, fac); D$char <- as.character(letters[1:10])
Here's a call to the custom function, and the result.
(newD <- grep.dataframe("6|(a|f)", D, sub = "XXX", ignore.case = TRUE))
# num int fac char
# 1 1 1 XXX XXX
# 2 1 2 B b
# 3 1 3 C c
# 4 1 4 XXX d
# 5 1 5 XXX e
# 6 1 XXX C XXX
# 7 1 7 XXX g
# 8 1 8 B h
# 9 1 9 B i
# 10 1 10 XXX j
I haven't done anything, but have tried everything I can think of, to preserve as much information about the columns as I can (i.e. class(x) <-, attr(x, "name") <-, attributes(x) <-, I(x), etc.). The result you see above is absolutely correct as it reads. However, the result below is troubling. I could use a little help with getting the final data structure to match the original data structure. I'm thinking a switch statement might do the trick?
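As a minimal sketch of the kind of structure-preserving behavior I'm after (the helper name gsub_keep_class is hypothetical, and this is not the code on GitHub): factors could be edited through their levels so they stay factors, and any other column could be returned untouched when nothing in it matched. It assumes that a numeric or integer column with a replaced value has to become character, since "XXX" is not a number.

```r
# Hypothetical helper: apply gsub() column by column and keep each
# column's class wherever the replacement allows it.
gsub_keep_class <- function(pattern, replacement, X, ...) {
  X[] <- lapply(X, function(col) {
    if (is.factor(col)) {
      # Edit the levels rather than the values, so the column stays a factor.
      levels(col) <- gsub(pattern, replacement, levels(col), ...)
      return(col)
    }
    old <- as.character(col)
    new <- gsub(pattern, replacement, old, ...)
    # No match in this column: hand it back unchanged, class and all.
    if (identical(old, new)) col else new
  })
  X
}

# Small illustrative data frame (not the D from above).
toy <- data.frame(num = c(1, 1), int = 5:6,
                  fac = factor(c("A", "B")), char = c("a", "b"),
                  stringsAsFactors = FALSE)
res <- gsub_keep_class("6|(a|f)", "XXX", toy, ignore.case = TRUE)
sapply(res, class)  # compare against sapply(toy, class)
```

Assigning with X[] <- keeps the data frame's own attributes (row names, column names) and does not trigger any stringsAsFactors conversion, which is why it is used here instead of rebuilding the result with data.frame().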
Note that
> args(grep.dataframe)
function (pattern, X, sub = NULL, ...)
NULL
with the sub argument calling gsub when not NULL.
As always, I appreciate the help.
Note: I took the advice of Hadley (why wouldn't you?) and split this into two functions. My answer below is a new function that only calls gsub for regular expression matching.