I want to do a 1:1 matching using a subset of my data, and then add the output code to my original data as a new column. Here is a working example using sample data:
mydata <- iris
dfrm <- subset(mydata, mydata$Petal.Length>4)
library(e1071)
m <- matchControls(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,
data = dfrm, caselabel = "versicolor", contlabel = "virginica")
The output has the original row numbers in it, which I want to use when appending to the original data.
m$factor
# 51 52 53 55 56 57 59 62 64 66 67 68 69 71 73 74 75 76 77
# case case case case case case case case case case case case case case case case case case case
# 78 79 84 85 86 87 88 89 91 92 95 96 97 98 100 101 102 103 104
# case case case case case case case case case case case case case case case <NA> cont <NA> cont
# 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123
# cont <NA> cont <NA> cont <NA> cont cont cont cont cont cont cont <NA> <NA> cont <NA> cont <NA>
# 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142
# cont cont <NA> cont cont cont <NA> <NA> <NA> cont cont cont <NA> cont cont cont cont cont cont
# 143 144 145 146 147 148 149 150
# cont <NA> <NA> cont cont cont cont cont
When I try to add it directly to the original data as a new column, I receive an error because the number of rows differs:
mydata$output <- m$factor
# Error in `$<-.data.frame`(`*tmp*`, output, value = c(1L, 1L, 1L, 1L, 1L, :
# replacement has 84 rows, data has 150
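The error arises because m$factor has one element per row of dfrm (84), not per row of mydata (150). A minimal sketch of one possible way around it in base R, assuming (as the printout above suggests) that names(m$factor) holds the original row names of mydata; the helper name idx is introduced here for illustration only:

```r
library(e1071)  # provides matchControls()

mydata <- iris
dfrm <- subset(mydata, Petal.Length > 4)
m <- matchControls(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = dfrm, caselabel = "versicolor", contlabel = "virginica")

# Position of each row of mydata within m$factor
# (NA for rows that were not part of dfrm)
idx <- match(rownames(mydata), names(m$factor))

# Indexing with NA returns NA, so this yields one value per row of mydata
mydata$output <- unname(m$factor[idx])
```

Rows that were excluded from the subset end up as NA in the new column, the same value that unmatched rows within the subset already have, so the two cases cannot be told apart afterwards without an extra indicator.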
My search attempts failed, perhaps because I don't know how to describe my problem in the correct terminology. I tried "merge dataframes by rows", etc., and what I got did not seem relevant. Some auto-suggested duplicates like this one are about adding aggregate results back to the original data, which is not the case here. I tried using join based on this answer, but I don't know how to define the argument by as the row number, as opposed to an actual variable.
library(dplyr)
left_join(mydata, as.data.frame(m$factor), by=NULL)
# Error: `by` required, because the data sources have no common variables
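The join fails because neither table has a column holding the row number. A sketch of how the join might be made to work, assuming the row names are first copied into an explicit key column on both sides; the column name row_id and the objects left and right are hypothetical names chosen for this illustration:

```r
library(e1071)  # matchControls()
library(dplyr)  # mutate(), left_join()

mydata <- iris
dfrm <- subset(mydata, Petal.Length > 4)
m <- matchControls(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = dfrm, caselabel = "versicolor", contlabel = "virginica")

# Copy the row names of the original data into a real column to join on
left <- mutate(mydata, row_id = rownames(mydata))

# Build a two-column lookup table from the named factor
right <- data.frame(row_id = names(m$factor),
                    output = unname(m$factor),
                    stringsAsFactors = FALSE)

# Keep every row of the original data; rows without a match get NA
joined <- left_join(left, right, by = "row_id")
```

The row_id column can be dropped afterwards with select(joined, -row_id) if it is not needed.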
I tried cbind, but it also throws an error because the number of rows differs.
cbind(mydata, m$factor)
cbind(mydata, as.data.frame(m$factor))
# Error in data.frame(..., check.names = FALSE) :
# arguments imply differing number of rows: 150, 84
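Unlike cbind, base merge can align two data frames on their row names when by = "row.names" is given. A hedged sketch of that route, assuming the names of m$factor are the row names of mydata; the objects out and merged are illustrative names only:

```r
library(e1071)  # matchControls()

mydata <- iris
dfrm <- subset(mydata, Petal.Length > 4)
m <- matchControls(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                   data = dfrm, caselabel = "versicolor", contlabel = "virginica")

# One-column data frame whose row names are the original row names
out <- data.frame(output = unname(m$factor), row.names = names(m$factor))

# all.x = TRUE keeps the rows of mydata that have no match in out
merged <- merge(mydata, out, by = "row.names", all.x = TRUE)

# merge() sorts the key as text ("1", "10", "100", ...), so restore numeric order
merged <- merged[order(as.integer(as.character(merged$Row.names))), ]
```

The result carries an extra Row.names column holding the key, which can be removed once the order has been restored.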
What am I missing? Thanks.