
I produced a large data frame (1700+ observations, 159 variables) with a function that collects information from a website. Usually the function finds numeric values for some columns, so those columns are numeric. Sometimes, however, it finds some text, and the whole column ends up as text.

I have one df whose column classes are correct, and I would like to "paste" those classes to a new, incorrect df.

Say, for example:

dfCorrect <- data.frame(x = c(1, 2, 3, 4), y = as.factor(c("a", "b", "c", "d")), z = c("bar", "foo", "dat", "dot"), stringsAsFactors = FALSE)
str(dfCorrect)
'data.frame':   4 obs. of  3 variables:
 $ x: num  1 2 3 4
 $ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
 $ z: chr  "bar" "foo" "dat" "dot"

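## now I have my "wrong" data frame:
## (stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0, where the default is FALSE)
dfWrong <- as.data.frame(sapply(dfCorrect, paste, sep = ""), stringsAsFactors = TRUE)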
str(dfWrong)
'data.frame':   4 obs. of  3 variables:
 $ x: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
 $ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
 $ z: Factor w/ 4 levels "bar","dat","dot",..: 1 4 2 3

I want to copy the class of each column of dfCorrect onto dfWrong, but haven't found how to do it properly. I've tried:

dfWrong1<-dfWrong
dfWrong1[0,]<-dfCorrect[0,]
str(dfWrong1) ## bad result
'data.frame':   4 obs. of  3 variables:
 $ x: Factor w/ 4 levels "1","2","3","4": 1 2 3 4
 $ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
 $ z: Factor w/ 4 levels "bar","dat","dot",..: 1 4 2 3

dfWrong1<-dfWrong
str(dfWrong1)<-str(dfCorrect)
'data.frame':   4 obs. of  3 variables:
 $ x: num  1 2 3 4
 $ y: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
 $ z: chr  "bar" "foo" "dat" "dot"
Error in str(dfWrong1) <- str(dfCorrect) : 
  could not find function "str<-"

With this small data frame I could fix the columns by hand, but what about larger ones? Is there a way to "copy" the classes from one df to another without having to know the individual class (and index) of each column?

Expected final result (after properly "pasting" classes):

all.equal(sapply(dfCorrect,class),sapply(dfWrong,class))
[1] TRUE
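
Before converting anything, it can help to list which columns actually differ. The following is a minimal sketch, assuming both data frames have the same columns in the same order. `class_of` is a hypothetical helper name, not part of base R, and the example data frames are rebuilt so the snippet runs on its own.

```r
dfCorrect <- data.frame(x = c(1, 2, 3, 4),
                        y = as.factor(c("a", "b", "c", "d")),
                        z = c("bar", "foo", "dat", "dot"),
                        stringsAsFactors = FALSE)
dfWrong <- as.data.frame(sapply(dfCorrect, paste, sep = ""),
                         stringsAsFactors = TRUE)

# Hypothetical helper: first class of each column, as a named character vector
class_of <- function(df) vapply(df, function(col) class(col)[1], character(1))

# Names of the columns whose class differs between the two data frames
mismatched <- names(dfWrong)[class_of(dfWrong) != class_of(dfCorrect)]
mismatched
```

Using `vapply` with `class(col)[1]` keeps the result a plain character vector even when a column carries more than one class.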
zx8754
PavoDive
  • You're solving the problem at the wrong step. Why not just automatically specify what class to convert the output to when you're reading from the URL? – Oliver Keyes Dec 08 '14 at 15:27
  • Agreed, @Oliver. I just did that for future data, but for now I have a very large df with the wrong classes. Thanks! – PavoDive Dec 09 '14 at 13:05

1 Answer


You could try this:

dfWrong[] <- mapply(FUN = as,dfWrong,sapply(dfCorrect,class),SIMPLIFY = FALSE)

...although my first instinct is to agree with Oliver that if it were me I'd try to ensure the correct class at the point you're reading the data.
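
One thing to watch with factor columns: `as.numeric()` on a factor returns the internal integer codes rather than the printed labels, and a generic coercion may behave the same way. The following is a hedged sketch of a variant that goes through `as.character()` first. `convert_to` is a hypothetical helper, not a base R function, and it only covers the classes listed in the `switch`.

```r
dfCorrect <- data.frame(x = c(1, 2, 3, 4),
                        y = as.factor(c("a", "b", "c", "d")),
                        z = c("bar", "foo", "dat", "dot"),
                        stringsAsFactors = FALSE)
dfWrong <- as.data.frame(sapply(dfCorrect, paste, sep = ""),
                         stringsAsFactors = TRUE)

# Hypothetical helper: convert one column to the class named by `cls`,
# always starting from the text labels so factor codes never leak through.
convert_to <- function(col, cls) {
  txt <- as.character(col)
  switch(cls,
         numeric   = as.numeric(txt),
         integer   = as.integer(txt),
         logical   = as.logical(txt),
         character = txt,
         factor    = factor(txt),
         stop("unsupported class: ", cls))
}

# Target class of every column, taken from the correct data frame
target <- vapply(dfCorrect, function(col) class(col)[1], character(1))

dfFixed <- dfWrong
dfFixed[] <- Map(convert_to, dfWrong, target)
str(dfFixed)
```

Assigning into `dfFixed[]` keeps the data frame structure and column names while replacing every column. Classes outside the `switch` (dates, for example) would need their own branch.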

joran
  • Thanks @Joran. As I said to @Oliver, I agree with his comment too. Two things from your answer are key for me, and I didn't know either of them. The first is the function `as`: I use the `as.**` family pretty often, but didn't even think there was a "generic" function for that. The other is passing `SIMPLIFY=F` to `mapply`: until now I wondered when one would want to *not* simplify the result. Thanks! – PavoDive Dec 09 '14 at 13:06
  • @joran, Hi, I had a similar requirement, but in my case the correct class is stored in a vector say "variable_class". I updated dfCorrect with the vector name but it converts all the variables to character. Can't we pass a vector in this case? Please suggest. – user1412 Nov 25 '16 at 10:18
  • @joran, I was hoping you would be able to provide some expert suggestions on my query above. Thank you!! – user1412 Nov 25 '16 at 17:01
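
For the case raised in the comments, where the target classes are already stored in a character vector, one possible reading is that the vector can be passed to `mapply` directly instead of wrapping it in `sapply(..., class)`, since calling `class` on a character vector just returns "character" for every element. A minimal sketch, assuming a made-up data frame `df` and that `variable_class` holds one class name per column, in column order:

```r
# Hypothetical data: every column was read in as text
df <- data.frame(a = c("1", "2"), b = c("u", "v"), stringsAsFactors = FALSE)

# One target class per column, in column order (assumed layout of the vector)
variable_class <- c("numeric", "character")

# Pass the class names directly; no sapply(..., class) around the vector
df[] <- mapply(FUN = as, df, variable_class, SIMPLIFY = FALSE)
str(df)
```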