I have a dataframe and each row has up to 4 different pairs of geographic coordinates. The fields that hold them are LAT1 & LONG1, LAT2 & LONG2, LAT3 & LONG3 and LAT4 & LONG4. These vary in completeness - and often some values are NA. I want to create columns "LAT" and "LONG", and populate these columns with the first pair of coordinates in which neither is NA. The order I check will be 1, 2, 3, 4. For example, my data might look like this:
ORD LAT1 LONG1 LAT2 LONG2 LAT3 LONG3 LAT4 LONG4
1 NA NA NA NA 44.65783 -65.86633 44.68800 -65.80183
2 NA NA NA NA 44.67033 -65.81833 44.67333 -65.84767
3 NA NA 44.19133 -63.32017 44.25583 -63.27117 NA NA
4 NA NA 44.25717 NA 44.30950 -63.18617 NA NA
5 NA NA 44.30667 -63.17550 44.35483 -63.08717 NA NA
6 NA NA 44.35433 -63.07267 44.30633 -63.12117 NA NA
and I want to end up with this:
ORD LAT1 LONG1 LAT2 LONG2 LAT3 LONG3 LAT4 LONG4 LAT LONG
1 NA NA NA NA 44.65783 -65.86633 44.68800 -65.80183 44.65783 -65.86633
2 NA NA NA NA 44.67033 -65.81833 44.67333 -65.84767 44.67033 -65.81833
3 NA NA 44.19133 -63.32017 44.25583 -63.27117 NA NA 44.19133 -63.32017
4 NA NA 44.25717 NA 44.30950 -63.18617 NA NA 44.30950 -63.18617
5 NA NA 44.30667 -63.17550 44.35483 -63.08717 NA NA 44.30667 -63.17550
6 NA NA 44.35433 -63.07267 44.30633 -63.12117 NA NA 44.35433 -63.07267
I have actually already achieved this through the following 2 ifelse
blocks, and they are doing exactly what I want:
set_df$LAT <- ifelse(is.na(set_df$LAT1) | is.na(set_df$LONG1),
ifelse(is.na(set_df$LAT2) | is.na(set_df$LONG2),
ifelse(is.na(set_df$LAT3) | is.na(set_df$LONG3), set_df$LAT4,set_df$LAT3),
set_df$LAT2),
set_df$LAT1)
set_df$LONG <- ifelse(is.na(set_df$LAT1) | is.na(set_df$LONG1),
ifelse(is.na(set_df$LAT2) | is.na(set_df$LONG2),
ifelse(is.na(set_df$LAT3) | is.na(set_df$LONG3), set_df$LONG4,set_df$LONG3),
set_df$LONG2),
set_df$LONG1)
So the reason I'm here is that this strikes me as an inefficient way to get my results since I'm essentially identifying the pair of coordinates that I want twice - the first block is capturing only the LAT and the second block grabs the LONG.
Is there a way I can grab both values and pop them into the new columns at once? I would vastly prefer only base R solutions.
edit: Here it is using @user2974951's method, but with named fields and desired output:
tmp=apply(set_df,1,function(x){
if (!is.na(x["LAT1"]) & !is.na(x["LONG1"])) {return(c(x[test1="LAT1"],x["LONG1"]))}
else if (!is.na(x["LAT2"]) & !is.na(x["LONG2"])) {return(c(x["LAT2"],x["LONG2"]))}
else if (!is.na(x["LAT3"]) & !is.na(x["LONG3"])) {return(c(x["LAT3"],x["LONG3"]))}
else {return(c(x["LAT4"],x["LONG4"]))}
})
set_df = cbind(set_df,t(tmp))
colnames(set_df)[colnames(set_df)=="1"] <- "LAT"
colnames(set_df)[colnames(set_df)=="2"] <- "LONG"