A data.frame which consists of multiple columns of the same data type is an indication that the data might be reshaped from wide to long format. Therefore, melt()
is used to get rid of the NA
s and join to add the new column to the original data.frame:
library(data.table)
DT[melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
order(variable), .(New = last(value)), by = rn], on = .(rn)][, rn := NULL][]
Type1 Type2 Type3 Expected_Output New
1: Red Orange Pink Pink Pink
2: Green abc NA abc abc
3: Blue NA NA Blue Blue
4: white NA Green Green Green
5: NA Brown purple purple grey
6: NA black NA black purple
7: grey NA NA grey black
Data
fread()
is used to read the sample data set. The na.strings
parameter tells fread()
to convert the "na"
strings to NA
:
library(data.table)
DT <- fread(
"Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey ",
na.strings = "na")
Edit
The OP has requested that also lines which are fully NA
should appear in the output. This can be achieved by changing the order of data.table
objects in the right join. In data.table
syntax, X[Y]
is a right join which takes all rows of Y
. If all rows of X
are required, the right join Y[X]
has to be used
library(data.table)
# new data with 8th row
DT <- fread(
"Type1 Type2 Type3 Expected_Output
Red Orange Pink Pink
Green abc na abc
Blue na na Blue
white na Green Green
na Brown purple purple
na black na black
grey na na grey
na na na na",
na.strings = "na")
melt(DT[, rn := .I], id.vars = "rn", na.rm = TRUE)[
order(variable), .(New = last(value)), by = rn][DT, on = .(rn)][, rn := NULL][]
New Type1 Type2 Type3 Expected_Output
1: Pink Red Orange Pink Pink
2: abc Green abc NA abc
3: Blue Blue NA NA Blue
4: Green white NA Green Green
5: purple NA Brown purple purple
6: black NA black NA black
7: grey grey NA NA grey
8: NA NA NA NA NA