
I'm trying to merge/bind two datasets (mydata_103 and mydata_17). They have exactly the same variable names; however, I get four warning messages like this one:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = c(1, 1, 2, 1, 1, 1, 1, 1, 5,  :
invalid factor level, NA generated

This seems to be caused by some variables having different classes. For example, I have a variable "gender" (1 = male, 2 = female). In the merged dataset I see value labels for the rows from mydata_17, but NAs for the rows from the other dataset. When I checked the classes, R reported that they differ (I don't know why, though):

> lapply(mydata_103[7], class)
$prgesl
[1] "numeric"

> lapply(mydata_17[7], class)
$prgesl
[1] "factor"

I changed the class of the column in mydata_103 to factor:

mydata_103$prgesl <- as.factor(mydata_103$prgesl)

Now I get the numeric values instead of NA, but they are still not translated to the value labels:

    prgesl
15     Man
16     Man
17   Vrouw
18       2
19       2
20       1
21       2

Does anyone know how to fix this? And is there a way to make the classes in my two datasets the same, or to check which ones differ? (I have 404 variables, so checking this by visual inspection seems inefficient and error-prone.)

Best, Hanneke

Edit: The code to merge my datasets right now is simply:

data1 <- rbind.data.frame(mydata_17, mydata_103)
Hannie
  • I'd change to numeric first in both and after the `rbind()` back to factor. – mtoto Aug 14 '17 at 13:40
  • Okay, but that leaves me with just the numeric values, and the value labels are easier to interpret. Is there a way to keep the value labels? – Hannie Aug 14 '17 at 13:44
  • Convert factor columns to character then rbind, see [here](https://stackoverflow.com/a/2853231/680068) to convert only factor columns. – zx8754 Aug 14 '17 at 13:53

2 Answers


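Following mtoto's suggestion, first save the factor's labels with levels(), convert the factor to numeric, bind, and then use the saved labels to turn the numbers back into labels. This assumes mydata_103$prgesl is still numeric (as in the original data) and that its codes follow the level order of mydata_17$prgesl.

    labels <- levels(mydata_17$prgesl)                # save the labels before converting
    mydata_17$prgesl <- as.numeric(mydata_17$prgesl)  # factor -> underlying integer codes
    mydata <- rbind(mydata_17, mydata_103)
    mydata$prgesl <- labels[mydata$prgesl]            # look up the label for each code

levels() returns the factor's labels in the order given by the underlying numbers.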
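
A minimal sketch of a variation on this idea, using toy data: `d17` and `d103` are hypothetical stand-ins for `mydata_17` and `mydata_103`, and the sketch assumes the numeric codes (1, 2) follow the level order of the factor. Here the numeric column is given the labels before binding, so `rbind()` combines two factors with identical levels and the result stays a factor:

```r
# Toy stand-ins for the two datasets (hypothetical data)
d17  <- data.frame(prgesl = factor(c("Man", "Man", "Vrouw")))
d103 <- data.frame(prgesl = c(2, 2, 1, 2))

# Reuse the labels from the factor column.
# Assumption: code 1 is the first level, code 2 the second, and so on.
labs <- levels(d17$prgesl)
d103$prgesl <- factor(d103$prgesl, levels = seq_along(labs), labels = labs)

# Both columns are now factors with the same levels, so no NAs are generated
both <- rbind(d17, d103)
both$prgesl
```

If the codes in the numeric column do not match the level order, the `levels` argument would need to list the codes in the order of the labels instead of `seq_along(labs)`.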

JMenezes

Convert factor columns to character, then rbind. Example:

# reproducible data
set.seed(1)
df1 <- data.frame(x = 1:3, y = runif(3))
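# stringsAsFactors = TRUE was the default before R 4.0; set explicitly so x is a factor
df2 <- data.frame(x = letters[2:4], y = runif(3), stringsAsFactors = TRUE)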

# below rbind will introduce NAs
rbind.data.frame(df2, df1)
#      x         y
# 1    b 0.9082078
# 2    c 0.2016819
# 3    d 0.8983897
# 4 <NA> 0.2655087
# 5 <NA> 0.3721239
# 6 <NA> 0.5728534
# Warning message:
#   In `[<-.factor`(`*tmp*`, ri, value = 1:3) :
#   invalid factor level, NA generated

# Convert factors to character
i <- sapply(df1, is.factor)
df1[i] <- lapply(df1[i], as.character)
i <- sapply(df2, is.factor)
df2[i] <- lapply(df2[i], as.character)

# now bind
res <- rbind.data.frame(df2, df1)

str(res)
# 'data.frame': 6 obs. of  2 variables:
#  $ x: chr  "b" "c" "d" "1" ...
#  $ y: num  0.908 0.202 0.898 0.266 0.372 ...

res
#   x         y
# 1 b 0.9082078
# 2 c 0.2016819
# 3 d 0.8983897
# 4 1 0.2655087
# 5 2 0.3721239
# 6 3 0.5728534
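
The question also asks how to find which of the 404 columns differ in class without inspecting them by eye. A minimal sketch, assuming both data frames have the same column names; `a` and `b` are hypothetical toy data frames standing in for `mydata_103` and `mydata_17`:

```r
# Toy data frames with identical column names (hypothetical data)
a <- data.frame(id = 1:2, prgesl = c(1, 2), score = c(0.5, 0.7))
b <- data.frame(id = 3:4, prgesl = factor(c("Man", "Vrouw")), score = c(0.1, 0.9))

# First class of every column; b is indexed by name so column order does not matter
cls_a <- vapply(a, function(col) class(col)[1], character(1))
cls_b <- vapply(b[names(a)], function(col) class(col)[1], character(1))

# Names of the columns whose classes disagree
mismatch <- names(a)[cls_a != cls_b]
mismatch

# Side-by-side view of the disagreeing columns
data.frame(column = mismatch, a = cls_a[mismatch], b = cls_b[mismatch],
           row.names = NULL)
```

Only the first element of `class()` is compared, which is enough to separate numeric from factor columns; columns with multi-element classes (such as ordered factors) might need a stricter comparison.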
zx8754