1

I have a datatable with missing data and two columns supposed to help to replace those missing data which looks like :

library(data.table)
Data = data.table(
  "H1" = c(NaN,4,NaN),
  "H2" = c(5,NaN,NaN),
  "H3" = c(7,NaN,NaN),
  "Group" = c(1,2,1),
  "Factor" = c(2,3,4)
)

    H1  H2  H3 Group Factor
1: NaN   5   7     1      2
2:   4 NaN NaN     2      3
3: NaN NaN NaN     1      4

I would like to use a second dataframe for that

Groups = data.table(
  "H1" = c(1,2,3),
  "H2" = c(4,5,6),
  "H3" = c(7,8,9),
  "Group" = c(1,2,3)
)
   H1 H2 H3 Group
1:  1  4  7     1
2:  2  5  8     2
3:  3  6  9     3

The column "Group" in the dataframe Groups may or may not be useful as it's basically the row number here.

I think about something writing something close to

Data%>%
  mutate_at(vars(matches("^H\\d+$")), ~ifelse(is.na(.),
                                              Groups[Group, Hour]*Factor, .))

But obviously, "Hour" is undefined but hopefully could be written like :

as.numeric(substr(columnName, 2, nchar(columnName)))

Expected result :

   H1 H2 H3 Group Factor
1:  2  5  7     1      2
2:  4 15 24     2      3
3:  4 16 28     1      4

How could I get that columnName ?

Additional issue : When I replace Hour by 2 in this command, for testing purpose. The whole column Group is considered and not only the Group value of the current line, and I don't understand why.

Any solution that would not consist into repeating mutate commands for each one of my columns, but gets the job done is much appreciated !

This Question may be related to my issue, but I fail to use that "deparse(substitute(.))" command.

thothal
  • 16,690
  • 3
  • 36
  • 71
FBiggio
  • 35
  • 1
  • 5

1 Answers1

1

Here is a solution with mutate_at and map2.

library(purrr)
library(dplyr)

# Define columns for use later
cols_x <- paste0("H", 1:3, ".x")
cols_y <- paste0("H", 1:3, ".y")

# Multiply hours by factor
df <- left_join(Data, Groups, by = "Group") %>%
  mutate_at(cols_y, ~ . * Factor) 

# Replace values if missing
df <- as.data.frame(map2(cols_x, cols_y, ~ ifelse(is.na(df[[.x]]), df[[.y]], df[[.x]]))) %>%
  setNames(gsub(".x", "", cols_x))
Bjørn Kallerud
  • 979
  • 8
  • 23