Basically I have 2 tables with the same column names and want to do calculations across tables. Ideally, I would have taken data from the two tables and created a third, but I could only find a way to do that if the data tables are the same dimensions because it would be by cell position. Instead, I'd like to do it by column name after having done a join so that I know that the calculations are taking from the correct values.
I am trying to loop through column names to do calculations between various associated columns in the same data table (I have 2 lists of column names, that I am using to call columns from a table where I've joined the two tables. I've adjusted the column name list to add the "_A" and "_B" which were added during the join as the columns had the same names). I'm trying to call the column names using [[i]] (in this case I am using [[1]] to test it).
Does anyone know why I can't call the column name in the name$colname format? If I replace the variable with the name, it works, and if I take just the variable (colnameslistInf[[1]]) it shows the right column name, but once I put it together it says "Unknown or uninitialised column".
> joininfsup$colnameslistInf[[1]]
NULL
Warning message:
Unknown or uninitialised column: `colnameslistInf`.
> colnameslistInf[[1]]
[1] "newName.x"
> joininfsup$newName.x
[1] 5 5 5 5 5 5 5 5 5 5 5
[12] 5 5 5 5 5 5 5 5 5 5 5
[23] 5 5 5 5 5 5 5 5 5 5 5
[34] 5 5 5 5 5 5 5 5 5 5 5
[45] 5 5 5 5 5 5 5 5 5 5 5
I am also getting this error:
Error in `[[<-.data.frame`(`*tmp*`, col, value = integer(0)) :
replacement has 0 rows, data has 264
The code I am trying to run is here. joininfsup is the joined table, and I use mutate to create new columns with the calculations across each of the 200+ columns and its associated column.
joined_day_inf_numeric <-select_if(joined_day_inf, is.numeric) joined_day_sup_numeric<-select_if(joined_day_sup, is.numeric) joininfsup<- left_join(joined_day_inf_numeric, joined_day_sup_numeric, "JOININF", suffix = c("_A", "_B"))
#take colnames from original tables and add _A and _B as those are added during the join
colnameslistInf <- paste0(colnames(joined_day_inf_numeric), "_A")
colnameslistSup <- paste0(colnames(joined_day_sup_numeric), "_B")
for (i in 1:length(colnameslistInf)) { #245 cols, for example
name <- paste0(colnames(joined_day_inf_numeric)[[i]]) #names of new columns as loops through
joininfsup2 <-joininfsup %>%
mutate(!!name := ((joininfsup[[ colnameslistInf[[i]] ]])-joininfsup[[colnameslistSup[[i]] ]]))*joininfsup$proportion_A+joininfsup[[ colnameslistInf[[i]] ]]
write_csv(joininfsup2, paste0("test/finalcalc.csv"))
}
I think this might be the key but am having trouble applying it: Use dynamic name for new column/variable in `dplyr`
UPDATE: I replaced name
in the mutate function with !!name :=
and the code ran! But gave me the same output as the original joined table because I'm still getting the "Unknown or uninitialised column: colnameslistInf
." warning.
UPDATE2: added missing join code, needed to save variable in for loop, added [[]] acording to @Parfait 's suggestion-- but the code still does not work (does not add any new columns).
UPDATE3:
I tried @Parfait's common_columns method but got an error:
Error: Can't subset columns that don't exist. x Columns 8_, 50_, 51_, 55_, 78_, etc. don't exist.
These columns were removed at the is.numeric step so not sure why it is pulling from the original dataset. Also, using match
deletes a bunch of other columns that have characters as names