I am working with a large data set in R. One column, "Offers", contains text describing the promotions that companies offer on their products, and I am trying to extract the numeric values from this text. For most cases I can do this with a combination of regex and functions from R packages, but I am unable to deal with the specific cases shown below. I would really appreciate any help with these.

  1. "Buying this ensures Savings of $50. Online Credit worth 35$ is also available. So buy soon!"

    1a. I want to get both the numeric values out but in 2 different columns. How do I go about that?

    1b. For another problem that I have to solve, I only need to take the value associated with the credit. It is always the case that for texts like above, the second numeric value in the text, if it exists, is the one associated with the credit.

  2. "Get 50% off on your 3 night stay along with 25 credits, offer available on 3 December 2016"

(How should I only take the value associated with credits?)

Note: Efficiency would be important as well because I am dealing with about 14 million rows.

I have tried looking online for a solution but have not found anything very satisfactory.

David Arenburg
  • It would be helpful if you included the expected output from each of your examples. As it stands, I'm not exactly sure what you're trying to do. – DunderChief Jun 18 '15 at 20:14
  • Care to prepare a reproducible example? – Roman Luštrik Jun 18 '15 at 20:24
  • Sorry if I was not clear. For 1(a): 50 and 35, in two different columns. For 1(b): just 50. For 2: 25. – Kartikeya Kumar Jun 18 '15 at 20:53
  • Have a look at [how to make a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so you understand what it means to provide a reproducible example. – Jota Jun 18 '15 at 21:03

1 Answer

I am not 100% sure about what you want but this may help you.

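    A <- "do 50% and whatever 23"
    # Start positions (and match lengths) of every run of digits in A
    B <- gregexpr("\\d+", A)[[1]]
    # Cut out the first and second matches; wrap in as.numeric() if numbers are needed
    firstNum <- substr(A, B[1], B[1] + attr(B, "match.length")[1] - 1)
    secondNum <- substr(A, B[2], B[2] + attr(B, "match.length")[2] - 1)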
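
The same idea can be applied to a whole column at once, which matters with millions of rows. What follows is only a sketch under some assumptions: the vector name `offers` and the result names are made up for illustration, the two sample strings are copied from the question, and the "credits" pattern assumes the number sits directly before the word "credit(s)", as in example 2. It has not been timed on 14 million rows.

```r
# Sample data: the two strings from the question plus a row without numbers.
offers <- c(
  "Buying this ensures Savings of $50. Online Credit worth 35$ is also available. So buy soon!",
  "Get 50% off on your 3 night stay along with 25 credits, offer available on 3 December 2016",
  "No numbers here"
)

# Case 1: collect every run of digits per row, then keep the first and second.
# Indexing past the end of a vector gives NA, so rows with fewer numbers get NA.
nums <- regmatches(offers, gregexpr("[0-9]+", offers))
first_num  <- as.numeric(vapply(nums, function(x) x[1], character(1)))
second_num <- as.numeric(vapply(nums, function(x) x[2], character(1)))

# Case 2: only the number immediately followed by "credit" or "credits".
# The lookahead (?= ...) needs perl = TRUE; this pattern is an assumption
# about how the text is worded and will miss "Credit worth 35$".
m <- regexpr("[0-9]+(?=\\s*credits?\\b)", offers, perl = TRUE, ignore.case = TRUE)
hit <- as.integer(m) > 0                 # regexpr() returns -1 where nothing matched
credit_num <- rep(NA_real_, length(offers))
credit_num[hit] <- as.numeric(regmatches(offers, m))

result <- data.frame(first_num, second_num, credit_num)
```

For texts like example 1, where the credit is always the second number, `second_num` is the value to use; `credit_num` only covers the "25 credits" style of wording.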

Hope this helps.

RDGuida