I need to remove all characters from a value after the leading letter D and the one or two digits that follow it. I am not sure how to start.

I have a data frame with a column of type character.

  • The column is called "Eircode"

The postal codes run from D01 to D24 (these are Dublin postal codes).

The values are entered in full, for example D03P820. Everything after the district code (shown in red in the original image) needs to be removed.

I need to be able to remove the characters after the last digit.

My dataframe is called "MainSchools"

So if the "Eircode" is D03P820, I need it to be D03 after my change.

I would prefer to do this with the tidyverse, if possible.

mrwahl
  • Welcome to Stack Overflow! Can you please read and incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), especially the advice on using `dput()` for the input? – wibeasley Nov 17 '21 at 15:09

2 Answers

You may use sub here:

df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
                 stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df

  Eircode
1     D15
2 K78YD27
3     D03

The regex pattern used above matches and captures Dublin postal codes as follows:

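^           start of the string
(           start of capture group 1
D           match D
(?:
    0[1-9]  followed by 01-09
    |       OR
    1[0-9]  10-19
    |       OR
    2[0-4]  20-24
)
)           end of capture group 1
.*$         the rest of the string, which is discarded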

Then we use `\\1` (the first capture group) as the replacement in `sub`, leaving only the three-character Dublin postal code.
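
As a quick check of the range logic described above, here is a minimal sketch that runs the same pattern against a few boundary values. The test values are hypothetical, and `perl = TRUE` is an addition not present in the answer; it requests PCRE, where `(?:...)` is a non-capturing group.

```r
# Pattern copied from the answer above
pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Hypothetical test values chosen to probe the edges of the D01-D24 range
codes <- c("D01X123", "D24AB12", "D25AB12", "D00AB12", "K78YD27")

# perl = TRUE is an assumption added here, not part of the original answer
trimmed <- sub(pattern, "\\1", codes, perl = TRUE)
print(trimmed)
```

Only the first two values are shortened (to D01 and D24). D25, D00, and the non-Dublin code do not match the pattern, so `sub` returns them unchanged.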

Tim Biegeleisen
  • I tried the above code and it worked exactly as expected. Thank you. I also have values like `D6WHP92`, which do not have the D06 format. How can I tailor the above to suit this requirement as well? – mrwahl Nov 17 '21 at 15:32
  • Maybe use: `^(D(?:[0-9]|0[1-9]|1[0-9]|2[0-4])).*$` ... assuming that you _do_ want to target the value `D6WHP92` – Tim Biegeleisen Nov 17 '21 at 15:39
  • For values such as `A94FC44` or `K32VK33` (essentially anything that is not D plus a number), I know I can sub them manually, but what quick function can I use to simply delete any values that are not in my D01 / D1 format? – mrwahl Nov 17 '21 at 15:51
  • Use the above regex pattern with `grepl`. You should open a new question at this point. – Tim Biegeleisen Nov 17 '21 at 15:55
  • `MainSchools$Eircode <- sub("^(K).*$", "", MainSchools$Eircode)` I have just done this, as I plan on replacing the empty fields with something else. Thanks for the help above. – mrwahl Nov 17 '21 at 16:27
  • To clarify the format, eircodes are always a letter, two digits, a space, and then four alphanumeric characters. The first three characters are the routing key. Routing keys beginning with `D` encode the historic Dublin postal districts. There is a single exception from the pattern with the routing key `D6W`, which exists for historical purposes (it's the old Dublin 6 West postal district). The space should be omitted in stored values, but inserted for display. – TRiG May 26 '23 at 22:36
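
The comments above suggest using the pattern with `grepl` and point out the `D6W` routing key. A minimal sketch of how the two might be combined follows. The sample rows are hypothetical, the `6W` alternative is an assumption based on the comment about `D6W`, and `perl = TRUE` is added so that `(?:...)` is handled by PCRE.

```r
# Hypothetical sample values, not the asker's real data
MainSchools <- data.frame(
  Eircode = c("D03P820", "D6WHP92", "A94FC44", "K32VK33", "D24XY12"),
  stringsAsFactors = FALSE
)

# Dublin routing keys: D01-D24 plus the D6W exception (assumed from the comments)
dublin_key <- "^(D(?:6W|0[1-9]|1[0-9]|2[0-4]))"

# grepl() returns TRUE for rows whose Eircode starts with a Dublin routing key
is_dublin <- grepl(dublin_key, MainSchools$Eircode, perl = TRUE)

# Keep only the Dublin rows, then trim each value to its routing key
dublin_only <- MainSchools[is_dublin, , drop = FALSE]
dublin_only$Eircode <- sub(paste0(dublin_key, ".*$"), "\\1",
                           dublin_only$Eircode, perl = TRUE)
print(dublin_only)
```

With these sample rows, the two non-Dublin codes are dropped and the remaining values become D03, D6W, and D24. Setting the non-matching values to `NA` instead of dropping the rows would be an equally reasonable choice, depending on what the later analysis needs.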

I like to use the stringr package for such operations.

library(dplyr)
library(stringr)

# str_extract() returns a character vector; str_extract_all() would return a list column
df %>% mutate(Eircode = str_extract(Eircode, '^[A-Z][0-9]{2}'))

Output with the data from @Tim Biegeleisen's answer:

  Eircode
1     D15
2     K78
3     D03
GuedesBF
  • This is also a nice solution that works! Thank you – mrwahl Nov 17 '21 at 16:18
  • I am glad it works. Please consider upvoting whenever you gain upvoting privileges – GuedesBF Nov 17 '21 at 23:58
  • An eircode is always a letter, two digits, (space), then four alphanumeric characters *except for* those beginning `D6W` @mrwahl. You'd need to adapt this code to allow for the `D6W` special case. (These exist for historical reasons.) – TRiG Apr 21 '23 at 16:39
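
One possible way to adapt the stringr approach for the `D6W` case mentioned in the last comment is to try that routing key before the general letter-plus-two-digits pattern. This is only a sketch: the sample data and the alternation are assumptions, not code from the answer.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data that includes the D6W special case
df <- data.frame(Eircode = c("D15P820", "K78YD27", "D6WHP92"),
                 stringsAsFactors = FALSE)

# Try the D6W routing key first, then fall back to a letter plus two digits
routing_key <- "^(?:D6W|[A-Z][0-9]{2})"

out <- df %>% mutate(Eircode = str_extract(Eircode, routing_key))
print(out)
```

With this sample data the column becomes D15, K78, and D6W. Any value that matches neither alternative would come back as `NA`, which is how `str_extract` reports a failed match.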