Questions tagged [gsubfn]

An R library enabling regex-based string replacement that allows a function as the replacement argument. The whole match and its capturing groups can be passed to that function, and the match can be further manipulated inside the function (for example, incrementing matched numbers or replacing certain patterns only inside another pattern).

gsubfn is like gsub but can take a replacement function or certain other objects instead of the replacement string. Matches and back references are input to the replacement function and replaced by the function output. gsubfn can be used to split strings based on content rather than delimiters and for quasi-Perl-style string interpolation. The package also has facilities for translating formulas to functions and allowing such formulas in function calls instead of functions. This can be used with R functions such as apply, sapply, lapply, optim, integrate, xyplot, Filter and any other function that expects another function as an input argument, or with functions like cat or SQL calls that may involve strings where substitution is desirable.
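As a rough illustration of the description above, here is a minimal sketch of a function used as the replacement argument, and of the same replacement written as a formula. It assumes the gsubfn package is installed from CRAN. The input string and the variable names (`x`, `m`, `incremented`, `incremented_formula`) are made up for this example.

```r
# Sketch only: assumes the gsubfn package is installed from CRAN.
library(gsubfn)

# Hypothetical input string, chosen for illustration.
x <- "item 9 costs 19 dollars"

# Replacement given as a function: each match is passed in as a string,
# and the function's return value is substituted for that match.
incremented <- gsubfn("[0-9]+", function(m) as.numeric(m) + 1, x)

# The same replacement written as a formula; gsubfn translates it into
# a function whose argument is the free variable `m`.
incremented_formula <- gsubfn("[0-9]+", ~ as.numeric(m) + 1, x)

print(incremented)
print(incremented_formula)
```

The pattern here has no capturing groups, so the function receives only the whole match. When a pattern does contain groups, the arguments passed to the function depend on gsubfn's `backref` argument, so it is worth checking the package documentation for that case.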

16 questions
15
votes
4 answers

Replace the spaces between multiple (3+) capital letters

I have some text where people use capitals with spaces in between to make the substring stand out. I want to replace the spaces between these substrings. The rule for the pattern is: "at least 3 consecutive capital letters with a space between…
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
8
votes
2 answers

Why isn't \\b in gsubfn in R working for me?

I have a string like this: vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose") I want to replace "in" with "%in%", "AND" with "&", "OR" with "|". I know this can be done using gsub like below: gsub("\\bin\\b","%in%",…
PKumar
  • 10,971
  • 6
  • 37
  • 52
3
votes
1 answer

Gsub to gsubfn, how does it transfer?

I am cleaning my dataset and removing all accents on letters and such. In order to do this I use gsub (see code below). It works perfectly fine but I am sure there is a more convenient way to do it. I've heard about gsubfn but I have not been able…
David Potrel
  • 111
  • 8
2
votes
1 answer

Using gsubfn to replace many instances within a string

I wrote a function that transforms a string representing a number (magrittr loaded in my system): adjust_perc_format <- function(x, n=3){ gsub(",", ".", x, perl = T) %>% as.numeric() %>% format(nsmall=n, decimal.mark = ",") } So…
Fabio Correa
  • 1,257
  • 1
  • 11
  • 17
2
votes
3 answers

why does gsubfn omit part of the match?

I analyse text strings and I try to replace all dots (.) within round brackets with commas (,). I found a regex that matches everything within the brackets: text <- "let's count (get . this . without dots) the days?" brackets =…
captcoma
  • 1,768
  • 13
  • 29
2
votes
2 answers

gsubfn on data frame

Search-and-replace an element in a data frame given a list of replacements. Code: testing123tmp <- data.frame(x=c("it's", "not", "working")) testing123tmp$x <- as.character(testing123tmp$x) tmp <- list("it's" = "hey",…
beavis11111
  • 576
  • 1
  • 7
  • 19
2
votes
1 answer

Native regex way to replace multiple leading chars with equal number spaces

I have some strings that are spaced as I want but that have leading digits that I don't want. I want to replace each of these leading digits with an equal number of spaces so as to maintain the spacing. I can do this with the gsubfn package but am…
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
1
vote
1 answer

How to get str_sub to accept output from str_locate_all when there are multiple replacements in a string and also assign replacements, vectorized

There are a lot of string replacement questions, but I could not find one that addressed this issue specifically. I have a too long and slow if else for loop to solve this problem, but according to the str_sub documentation, the matrix output of…
Pearl
  • 123
  • 6
1
vote
1 answer

How to replace string patterns with some numbers using gsubfn

I have a dataset df1. I'd like to replace each occurrence of "One + one," "Two ; one," etc. with some numbers as shown in the lookup table df2. Desired output: Any idea how to do this? This is a follow-up to my original question How to replace…
Ketty
  • 811
  • 10
  • 21
1
vote
1 answer

Which regular expression is more appropriate?

I am trying to make model output prettier with pre-defined labels for my variables. I have a vector of variable names (a), a vector of labels (b) and model terms (c). I have to match the vectors (a) and (c) and replace (a) by (b). I found this…
1
vote
3 answers

gsubfn | Replace text using variables in Substitution

I am trying to remove a block of text that wraps around what I want to keep. So I wanted to assign variables since the text can be long. This is an example of what I am trying to do. [Doesn't remove the text] Text<-'This is an example text [] test'…
Koolakuf_DR
  • 467
  • 4
  • 16
0
votes
2 answers

Remove multiple instances with a regex expression, but not the text in between instances

In long passages using bookdown, I have inserted numerous images. Having combined the passages into a single character string (in a data frame) I want to remove the markdown text associated with inserting images, but not any text in between those…
lawyeR
  • 7,488
  • 5
  • 33
  • 63
0
votes
1 answer

gsubfn() for variants of pattern string does not give expected output

I'm trying to match a partial pattern of the variable names in my data set and replace them all with another pattern using gsubfn(). I'm using R version 4.0.3 (2020-10-10). The code below shows the sample pattern of variable names in the data set…
Usha Kota
  • 43
  • 9
0
votes
1 answer

Extracting dates in R, from a string variable with different date formats exhibiting lack of general structure / difficult pattern

I have a column of roughly 1300 characters which I need to extract a single date from, if the character contains a date (i.e. if NA then no date to be taken) and if it contains multiple dates, I only need one; if it contains an interval of dates, I…
yungmist
  • 13
  • 5
0
votes
1 answer

Extracting dates following a specific word from a column of strings using dplyr

I am trying to extract the most recent date that a report was added in an R dataframe of reports. The text always looks like Date Ordered: M/DD/YYYY and may occur 0 to many times in a given report. If it's repeating, I want the most recent…
sm002
  • 101
  • 1
  • 10