0

I want to modify data frame columns to have coordinates in the same units. They are found in these units; dec_deg, deg_dec_min, or NA. Here is a reproducible example:

Long <- c("","E 9.64740","E 9°35.988'","","-16.5708666666667","109.395389",
"-16.6455666666667","W047 22.044", "-16.5437166666667")

Lat <- c("","S 2.40889","N 00°27.799","14.0011","","-0.632361",
         "13.9622333333333","S00 37.952", "14.0532")

Date <- as.Date(c('2010-11-1','2008-3-25','2007-3-14', '2010-11-1','2008-3-25','2007-3-14','2010-11-1','2008-3-25','2007-3-14'))
Site.ID <- c("MWA-S", "MWA-S","MWA-S","BAM","BAM","BAM","BAM","BAM","BAO")
No.ID <- c(34, 5,16,46,2,85,60,1,30)

DF <- data.frame(No.ID, Site.ID, Date, Lat, Long)

I referenced this question to clean up my columns by using the measurements library and removing unwanted characters. But that fails b/c the coordinates are not in the same units. I want to create a function like this one that conditionally does the conversion.

library(measurements)

coord2dec <- function(x) {
  x <- as.character(x)
  x <- do.call(rbind, strsplit(x, split='N'|'E'|'S'|'W'|'°'))#maybe where to #apply my conditions
  x <- apply(x, col, function(y) {
    y <- as.numeric(y)
    measurements::conv_unit(y$col, from = 'deg_dec_min', to = 'dec_deg')
  })
  return(x)
}

new_df <- apply(DF2, coord2dec)

The above fails b/c I may be missing other conditions & formatting. My goal is to create a function that identifies if a coordinate is deg_dec_min (dd mm.mmmm) or a dec_deg (dd.ddddd). Then it would convert W/S to -; Remove "NSEW" and whitespace and replaces degree with space. Desired output would convert the example data frame to the following.

  No.ID Site.ID       Date       Lat              Long
1    34   MWA-S 2010-11-01       NA               NA     
2     5   MWA-S 2008-03-25   -2.408890         9.647400
3    16   MWA-S 2007-03-14    0.463317         9.599800
4    46     BAM 2010-11-01   14.0011         -16.5708667
5     2     BAM 2008-03-25       NA               NA        
6    85     BAM 2007-03-14   -0.632361      109.395389
7    60     BAM 2010-11-01   13.96223333    -16.6455666666667
8     1     BAM 2008-03-25   -0.632533      -47.367400
9    30     BAO 2007-03-14   14.0532        -16.5437166666667
pbalt
  • 3
  • 2
  • Not sure I understand what your data looks like. `data.frame': 17663 obs. of 2 variable:` suggests your table currently has two columns. – Jon Spring Mar 10 '21 at 07:37
  • It would be easier to help if you create a small reproducible example along with expected output. Read about how to give a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jon Spring Mar 10 '21 at 07:41
  • @JonSpring I really appreciate your feedback, I am a beginner contributor to this channel. Will re-edit! – pbalt Mar 10 '21 at 17:37
  • I just re-edited with additional comments. I originally said before that my coordinates are found in 3 units when it is actually two as mentioned in the thread. – pbalt Mar 10 '21 at 22:13
  • 1
    Can you define explicitly what characteristics are reliable signals of the input being in deg_dec_min vs. deg_dec format? And/or perhaps show what you'd like your output to look like if all the correct transformations were performed? – Jon Spring Mar 10 '21 at 22:41
  • Thanks for pointing that out Jon, I edited my post to accommodate that. Unfortunately, my post is painstakingly long. I will revise and condense so that it is not a monstrosity to look at. Many thanks. – pbalt Mar 10 '21 at 23:42

2 Answers2

0

Your proposed solution looks like it would be well suited to a regex solution, but I thought a simpler approach might be to:

  1. turn S and W into negative numbers
  2. remove S/N/E/W
  3. replace ° with a space
  4. split at any spaces and assume everything on the right is in minutes
  5. combine sign, degree, and minute/60.

I use pivot_longer so that I can put lat and long values into one column and apply these same transformations to both at the same time, then use pivot_wider to put them back.

library(tidyverse)
DF %>%
  pivot_longer(Lat:Long) %>%
  mutate(sign = if_else(str_detect(value, "S|W"), -1, 1)) %>%
  mutate(value = value %>% 
           str_replace_all(c("S|N|W|E" = "", "°" = " ")) %>%
           str_trim()) %>%
  separate(value, c("deg", "min"), sep = " ", fill = "right") %>%
  mutate(deg2 = parse_number(deg),
         min2 = coalesce(parse_number(min)/60, 0),
         result = sign * (deg2 + min2)) %>%
  select(-c(deg:min2)) %>%
  pivot_wider(names_from = name, values_from = result)

I can't guarantee this'll work for all your data, but it looks like it does for the example data here. (Looks like there was typo swapped value / space in your DF input that varies from the suggested output in rows 4+5.)

# A tibble: 9 x 5
  No.ID Site.ID Date          Lat   Long
  <dbl> <chr>   <date>      <dbl>  <dbl>
1    34 MWA-S   2010-11-01 NA      NA   
2     5 MWA-S   2008-03-25 -2.41    9.65
3    16 MWA-S   2007-03-14  0.463   9.60
4    46 BAM     2010-11-01 14.0    NA   
5     2 BAM     2008-03-25 NA     -16.6 
6    85 BAM     2007-03-14 -0.632 109.  
7    60 BAM     2010-11-01 14.0   -16.6 
8     1 BAM     2008-03-25 -0.633 -47.4 
9    30 BAO     2007-03-14 14.1   -16.5 
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
  • 1
    After a quick review, I believe this works well thanks so much! I do not have enough of a reputation to upvote, but I accepted it as a good answer. Many thanks. Will re-edit my post. – pbalt Mar 11 '21 at 16:11
0

First of all, I want to say thank you to the stack overflow community and to Jon. After working on this, I also received a solution from a colleague, not sure if he is on here will update to give credit if needed. Jon Spring's way worked beautifully and so did this.

  1. function 1: clean up the coordinates by removing SW characters and replacing them with -; remove all other characters & whitespace; replace deg. symbol with space
  2. function 2: identify the parts of the coordinate that are either deg_dec_min or dec_deg. Account for the range of possibilities for numbers. Then convert using conv_unit function.
  3. apply the functions in a dplyr pipeline
# gets rid of characters, leaving formatted dd.ddddd or dd mm.mmm
clean_coords <- function(x) {
  v <- gsub("[Ww]|[Ss]|[Ww] |[Ss] ", "-", x)  # convert W/S to -
  v2 <- gsub("[eEwWsSnN] ", "", v)  # remove NSEW and whitespace
  v3 <- gsub("°|'", " ", v2)  # replace degree with space
  return(v3)
}
# finds elements that are in dd mm.mmmm format and converts them to dd.ddddd
ddmmm_to_dd <- function(x) {
  ind <- grep("[0-9]{1,3} [0-9]", x)
  x[ind] <- conv_unit(x[ind], from = 'deg_dec_min', to = 'dec_deg')
  return(x)
}
# apply functions in pipeline
DF2 <- DF %>% 
  mutate(across(c(Lat, Long), clean_coords)) %>% 
  mutate(across(c(Lat, Long), ddmmm_to_dd))
pbalt
  • 3
  • 2