I have a datatable with missing data and two columns supposed to help to replace those missing data which looks like :
library(data.table)
Data = data.table(
"H1" = c(NaN,4,NaN),
"H2" = c(5,NaN,NaN),
"H3" = c(7,NaN,NaN),
"Group" = c(1,2,1),
"Factor" = c(2,3,4)
)
H1 H2 H3 Group Factor
1: NaN 5 7 1 2
2: 4 NaN NaN 2 3
3: NaN NaN NaN 1 4
I would like to use a second dataframe for that
Groups = data.table(
"H1" = c(1,2,3),
"H2" = c(4,5,6),
"H3" = c(7,8,9),
"Group" = c(1,2,3)
)
H1 H2 H3 Group
1: 1 4 7 1
2: 2 5 8 2
3: 3 6 9 3
The column "Group" in the dataframe Groups may or may not be useful as it's basically the row number here.
I think about something writing something close to
Data%>%
mutate_at(vars(matches("^H\\d+$")), ~ifelse(is.na(.),
Groups[Group, Hour]*Factor, .))
But obviously, "Hour" is undefined but hopefully could be written like :
as.numeric(substr(columnName, 2, nchar(columnName)))
Expected result :
H1 H2 H3 Group Factor
1: 2 5 7 1 2
2: 4 15 24 2 3
3: 4 16 28 1 4
How could I get that columnName ?
Additional issue : When I replace Hour by 2 in this command, for testing purpose. The whole column Group is considered and not only the Group value of the current line, and I don't understand why.
Any solution that would not consist into repeating mutate commands for each one of my columns, but gets the job done is much appreciated !
This Question may be related to my issue, but I fail to use that "deparse(substitute(.))" command.