How to remove data after certain characters

Question

I need to remove every character in a value that comes after the leading letter D and the one or two digits that follow it. I am not sure how to start.

I have a data frame with a column of type character.

The column is called "Eircode".

The postal codes go from D01 to D24 (these are Dublin postal codes).

The values are entered like so. What you see in red is what needs to be removed.

I need to remove the characters after the last digit of the district number.

My dataframe is called "MainSchools"

So if the "Eircode" is D03P820, I need to have it as D03 after my change.

I would prefer to do this with the tidyverse packages if possible.

Welcome to Stack Overflow! Can you please read and incorporate elements from [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Especially the aspects of using `dput()` for the input. — wibeasley, Nov 17 '21 at 15:09

score 2 · Accepted Answer · answered Nov 17 '21 at 15:10


You may use sub here:

df <- data.frame(Eircode=c("D15P820", "K78YD27", "D03P820"),
                 stringsAsFactors=FALSE)
df$Eircode <- sub("^(D(?:0[1-9]|1[0-9]|2[0-4])).*$", "\\1", df$Eircode)
df

  Eircode
1     D15
2 K78YD27
3     D03

The regex pattern used above matches and captures Dublin postal codes as follows:

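^           start of the string
(           start of capture group 1
D           match D
(?:
    0[1-9]  followed by 01-09
    |       OR
    1[0-9]  10-19
    |       OR
    2[0-4]  20-24
)
)           end of capture group 1
.*$         the rest of the string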

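Then, we use \1 (written "\\1" in an R string) as the replacement in sub, leaving behind only the three-character Dublin postal code. Values that do not match the pattern, such as K78YD27, are left unchanged.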
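As a quick check of the range logic, here is a minimal sketch that runs the same pattern over a few boundary values. The `codes` vector is made up for illustration and is not from the question. `perl = TRUE` is an addition here, assumed only to make the non-capturing group syntax explicit; the call above omits it.

```r
# Hypothetical test values chosen to probe the edges of the D01-D24 range
codes <- c("D01A123", "D24XY12", "D00A123", "D25A123", "K78YD27")

# Same pattern as in the answer: D followed by 01-09, 10-19 or 20-24
pattern <- "^(D(?:0[1-9]|1[0-9]|2[0-4])).*$"

# Keep only capture group 1 where the pattern matches;
# non-matching values are returned unchanged by sub()
result <- sub(pattern, "\\1", codes, perl = TRUE)
print(result)
# expected values: "D01", "D24", "D00A123", "D25A123", "K78YD27"
```

Values outside D01 to D24, such as D00 or D25, do not match and so pass through untouched, just like the K78 code.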


Tim Biegeleisen


I tried the above code and it worked exactly as expected. Thank you. I also have values like `D6WHP92` which do not have the D06 format. How can I tailor the above to suit this requirement also? – mrwahl Nov 17 '21 at 15:32
Maybe use: `^(D(?:[0-9]|0[1-9]|1[0-9]|2[0-4])).*$` ... assuming that you _do_ want to target the value `D6WHP92` – Tim Biegeleisen Nov 17 '21 at 15:39
For values such as `A94FC44` or `K32VK33`, essentially anything that is not D plus the number, I know I can sub them manually, but what quick function can I use to simply delete any values that are not in my D01 / D1 format? – mrwahl Nov 17 '21 at 15:51
Use the above regex pattern with `grepl`. You should open a new question at this point. – Tim Biegeleisen Nov 17 '21 at 15:55
`MainSchools$Eircode <- sub("^(K).*$", "", MainSchools$Eircode)` Just done this as I plan on subbing empty fields with something else. Thanks for the help above. – mrwahl Nov 17 '21 at 16:27
To clarify the format, eircodes are always a letter, two digits, a space, and then four alphanumeric characters. The first three characters are the routing key. Routing keys beginning with `D` encode the historic Dublin postal districts. There is a single exception from the pattern with the routing key `D6W`, which exists for historical purposes (it's the old Dublin 6 West postal district). The space should be omitted in stored values, but inserted for display. – TRiG May 26 '23 at 22:36
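Putting the comments above together, here is a minimal base R sketch. It assumes that the `D6W` routing key should be kept whole and that non-Dublin values should become `NA`; both are assumptions, not something the thread settles. The sample `codes` vector is made up, and `perl = TRUE` is added to make the non-capturing group syntax explicit.

```r
# Hypothetical sample values, including the D6W special case
codes <- c("D15P820", "D6WHP92", "D03P820", "A94FC44", "K32VK33")

# D followed by 6W, 01-09, 10-19 or 20-24 (6W is the assumed extra branch)
pattern <- "^(D(?:6W|0[1-9]|1[0-9]|2[0-4])).*$"

# grepl() flags which values are Dublin routing keys
is_dublin <- grepl(pattern, codes, perl = TRUE)

# Keep the routing key for Dublin values, blank out everything else
routing_key <- ifelse(is_dublin,
                      sub(pattern, "\\1", codes, perl = TRUE),
                      NA_character_)
print(routing_key)
# expected values: "D15", "D6W", "D03", NA, NA
```

To drop the non-Dublin rows from a data frame instead of blanking them, the same logical vector can be used as a row index, for example `df[is_dublin, , drop = FALSE]`.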

score 0 · Answer 2 · answered Nov 17 '21 at 15:57


I like to use the stringr package for such operations.

library(dplyr)
library(stringr)

# str_extract() returns one match per value, so Eircode stays a character column
df %>% mutate(Eircode = str_extract(Eircode, '^[A-Z][0-9]{2}'))

output with the data from @Tim Biegeleisen:

  Eircode
1     D15
2     K78
3     D03


GuedesBF


This is also a nice solution that works! Thank you – mrwahl Nov 17 '21 at 16:18
I am glad it works. Please consider upvoting whenever you gain upvoting privileges – GuedesBF Nov 17 '21 at 23:58
An eircode is always a letter, two digits, (space), then four alphanumeric characters *except for* those beginning `D6W` @mrwahl. You'd need to adapt this code to allow for the `D6W` special case. (These exist for historical reasons.) – TRiG Apr 21 '23 at 16:39
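One way to adapt the stringr approach for `D6W` is sketched below. It assumes the `D6W` key should be extracted whole and that the `dplyr` and `stringr` packages are installed. The data frame is a made-up stand-in for the asker's `MainSchools`.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the asker's MainSchools data frame
MainSchools <- data.frame(
  Eircode = c("D15P820", "K78YD27", "D6WHP92"),
  stringsAsFactors = FALSE
)

# Try the D6W special case first, then the general
# "one letter followed by two digits" routing key
MainSchools <- MainSchools %>%
  mutate(Eircode = str_extract(Eircode, "^(?:D6W|[A-Z][0-9]{2})"))

print(MainSchools$Eircode)
# expected values: "D15", "K78", "D6W"
```

Like the answer above, this keeps the routing key for every area, not just Dublin; values that match neither branch would come back as `NA`.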
