
How do I remove the middle part of a string? For example, take the string '2018_002.Feb'. Here I want to remove '002.' so that I get '2018_Feb'.

Can anyone help me? Thanks!

Vladimir Samsonov
Aneesh
  • Try `sub("\\d+\\.", "", str1)`, which returns `"2018_Feb"`, or convert to Date class: `format(as.Date(str1, "%Y_0%d.%b"), "%Y_%b")` – akrun Feb 16 '18 at 11:23
  • related: [regex](https://stackoverflow.com/questions/4736/learning-regular-expressions) – jogo Feb 16 '18 at 11:26
  • Both commands worked perfectly well. Thanks. Could you please explain to me how these commands work? – Aneesh Feb 16 '18 at 11:36

1 Answer


I like to use the stringr package rather than the base R functions for string manipulation, because I find the syntax of its functions more consistent.

library(stringr)

var <- "2018_002.Feb"

# Replace the first "_<digits>." with a plain "_"
str_replace(var, pattern = "_\\d+\\.", replacement = "_")
# [1] "2018_Feb"

str_replace() searches the string for a pattern and replaces the first match with something else. Often the replacement is just an empty string "", but in this case it is easier to start the match at the _ character, because it occurs only once in the string, and then put the _ back in the replacement. From there you want to match all the digits that follow, up to and including the period.
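To see why anchoring on the _ can matter, here is a small sketch in base R. It uses sub(), which, like str_replace(), replaces only the first match, so it runs without any extra packages. The second input, `tricky`, is a made-up string that is not from the question; it only illustrates what happens when digits followed by a period appear earlier in the string.

```r
x <- "2018_002.Feb"

# Anchored on the underscore: match "_002." and put the "_" back.
anchored <- sub("_\\d+\\.", "_", x)
# Unanchored: match "002." and replace it with nothing.
unanchored <- sub("\\d+\\.", "", x)

anchored    # "2018_Feb"
unanchored  # "2018_Feb"

# Hypothetical input with an earlier "digits + period" run.
tricky <- "1.5_002.Feb"
sub("_\\d+\\.", "_", tricky)  # "1.5_Feb"   (removes the intended part)
sub("\\d+\\.", "", tricky)    # "5_002.Feb" (removes the leading "1." instead)
```

For the string in the question both patterns give the same result, so the anchored version is mostly a safeguard for less regular input.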

I recommend learning a bit about regular expressions. The Basic Regular Expressions in R Cheat Sheet is a good resource.

The regex for this problem reads something like this:

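  • First find a _ character that is followed by a digit (\\d), and keep matching digits (+ means "one or more") until you reach a literal period (\\.). The backslashes are doubled because a backslash has to be escaped inside an R string.
  • Once this match, "_002.", is found, replace it with "_".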
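The two steps above can be checked piece by piece. The sketch below uses only base R (regexpr(), regmatches() and sub()) to show what the pattern actually matches before it is replaced. The variable names and the second element of `dates` are assumptions added for illustration.

```r
x <- "2018_002.Feb"
pattern <- "_\\d+\\."

# The regex engine sees single backslashes: _\d+\.
writeLines(pattern)

# Step 1: locate the first match and extract it.
matched <- regmatches(x, regexpr(pattern, x))
matched  # "_002."

# Step 2: replace that match with "_".
result <- sub(pattern, "_", x)
result   # "2018_Feb"

# sub() is vectorised, so the same call handles a whole character vector.
dates <- c("2018_002.Feb", "2018_003.Mar")  # second value is hypothetical
cleaned <- sub(pattern, "_", dates)
cleaned  # "2018_Feb" "2018_Mar"
```

With stringr loaded, str_extract(x, pattern) and str_replace(x, pattern, "_") should play the same roles as the regmatches() and sub() calls here.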

Hope that was comprehensible!