I need to add some missing values in a dataset based on a code list. I thought to do this by running a loop combined with a common column merge on lists.
Could be a duplicate of Merge in loop R, or a special case.
#load data
data("mtcars")
#add car names
mtcars <- cbind(cars = rownames(mtcars), mtcars)
rownames(mtcars) <- 1:nrow(mtcars)
#add dates and arrange
date <- rep(seq(as.Date("2015-01-02"), by = "month", length.out = 4),times = 8),
mtcars <- cbind(date = date, mtcars)
mtcars <- mtcars %>%
arrange(., date)
#add additional cars
add_cars <- c("renault", "dacia", "benz", "ferrari",
"AC", "Acura", "Aixam", "Alfa",
"Bertone", "Bestune", "Chevrolet",
"Chrysler", "Haima", "Haval", "Hawtai", "Hennessey")
total_cars <- as_tibble(c(unique(mtcars$cars), add_cars))
colnames(total_cars) <- "cars"
#split data on dates, list total cars
car_dates <- split(mtcars, f= mtcars$date)
total_cars <- as.list(total_cars)
#execute loop
results <- vector(mode = "integer", length = length(car_dates))
mylist <- list()
for (i in 1:length(car_dates)){
g <- nrow(car_dates[[i]])
results[i] <- g
if (results[i] < 144){
res <- list(merge(x = car_dates[[i]], y= total_cars,
by = c("cars"), all = T))
mylist <- c(mylist, res)
mydata_full <- as.data.frame(mylist)
}
}
This loop harvest is a data frame with 48 obs. of 52 variables. Which is partially what I am aiming for. I got the loop to add the missing observations to each date, but it spread the dataset. Now for each date, the initial 13 variables are repeated.
I am stuck here, I only want the initial 13 variables, not long data.
mydata_full <- as_tibble(mydata_full)
head(mydata_full)
# A tibble: 6 x 52
cars date mpg cyl disp hp drat wt qsec vs am gear carb cars.1 date.1 mpg.1 cyl.1
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <date> <dbl> <dbl>
1 AC NA NA NA NA NA NA NA NA NA NA NA NA AC NA NA NA
2 Acura NA NA NA NA NA NA NA NA NA NA NA NA Acura NA NA NA
3 Aixam NA NA NA NA NA NA NA NA NA NA NA NA Aixam NA NA NA
4 Alfa NA NA NA NA NA NA NA NA NA NA NA NA Alfa NA NA NA
5 AMC Jav~ NA NA NA NA NA NA NA NA NA NA NA NA AMC Ja~ NA NA NA
6 benz NA NA NA NA NA NA NA NA NA NA NA NA benz NA NA NA
# ... with 35 more variables: disp.1 <dbl>, hp.1 <dbl>, drat.1 <dbl>, wt.1 <dbl>, qsec.1 <dbl>, vs.1 <dbl>,
# am.1 <dbl>, gear.1 <dbl>, carb.1 <dbl>, cars.2 <chr>, date.2 <date>, mpg.2 <dbl>, cyl.2 <dbl>, disp.2 <dbl>,
# hp.2 <dbl>, drat.2 <dbl>, wt.2 <dbl>, qsec.2 <dbl>, vs.2 <dbl>, am.2 <dbl>, gear.2 <dbl>, carb.2 <dbl>,
# cars.3 <chr>, date.3 <date>, mpg.3 <dbl>, cyl.3 <dbl>, disp.3 <dbl>, hp.3 <dbl>, drat.3 <dbl>, wt.3 <dbl>,
# qsec.3 <dbl>, vs.3 <dbl>, am.3 <dbl>, gear.3 <dbl>, carb.3 <dbl>
I am sure this could be done with a more simple full_join, I tried but succeeded only to full_join separately on each date, what am I missing?
#after rearranging the classes to tibble
mtcars_short <- mtcars %>%
filter(date == "2015-02-02") %>%
full_join(total_cars, by= c("cars"))
> print(mtcars_short)
# A tibble: 48 x 13
date cars mpg cyl disp hp drat wt qsec vs am gear carb
<date> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2015-02-02 Mazda RX4 Wag 21 6 160 110 3.9 2.88 17.0 0 1 4 4
2 2015-02-02 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
3 2015-02-02 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
4 2015-02-02 Merc 450SLC 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
5 2015-02-02 Fiat 128 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
6 2015-02-02 Dodge Challenger 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
7 2015-02-02 Fiat X1-9 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
8 2015-02-02 Ferrari Dino 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6
9 NA Mazda RX4 NA NA NA NA NA NA NA NA NA NA NA
10 NA Hornet Sportabout NA NA NA NA NA NA NA NA NA NA NA
I would like to have a df of 192 obs. and 13 variables. Meaning for each unique date (4) I want all the observations (48).
# A tibble: 48 x 52
cars date mpg cyl disp hp drat wt qsec vs am gear carb
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AC 2015-01-02 21 6 160 110 3.9 2.62 16.5 0 1 4 4
2 Acura 2015-01-02 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
3 Aixam 2015-01-02 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
4 Alfa 2015-01-02 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
5 AMC Ja~ 2015-01-02 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
6 benz . . . . . . . . . . . . . .
7 Bertone . . . . . . . . . . . . . .
8 Bestune . . . . . . . . . . . . . .
9 Cadill~ . . . . . and so on . . . . .
10 Camaro~ . . . . . . . . . . . . . .
. date2
. .
. date3
. etc.
.
192
Any input would be appreciated!