I am working in the R programming language and would appreciate some help with formulating regular expressions.
I accept a list of numbers from the user as a string, and I want to extract all of the numbers from that string into a numeric vector. I have asked the user to separate the numbers with commas, but I can't expect the user to respect that. So I want to extract the numbers even if they are separated by spaces, semicolons, or something unexpected.
I want to be able to extract all real numbers from the string, even if they are negative (e.g. -5), contain a decimal point (e.g. 5.5), or are in scientific notation (e.g. 5.5e-5, 5.5E-5, 5.5e+5, 5.5E+5, 5.5e5, 5.5E5).
I was reading a forum thread on a similar question and found a regex that can extract numbers from a string, but I realized that it doesn't work for negative numbers, decimals, or scientific notation. I would like to be able to handle all of these.
Using this regular expression, I am able to extract non-negative whole numbers from a string in which they are separated by spaces, commas, or even semicolons.
# Using this string works
this_string = "1, 2 3, 5, 7, 10, 11, 12; 18"
extracted_numbers = as.numeric(regmatches(this_string, gregexpr("[0-9]+", this_string))[[1]])
print(extracted_numbers)
Extracted Result: [1] 1 2 3 5 7 10 11 12 18
But the same regular expression does not work on this more complex string with negative numbers, scientific notation, and decimals.
this_string = "-1, 0, 5e-1 ; 7E-1, 2 3.0, 4, 5.33e+2"
Extracted Result: [1] 1 0 5 1 7 1 2 3 0 4 5 33 2
A correct extraction of numbers from the string should yield:
Desired Extracted Result: [1] -1.0 0.0 0.5 0.7 2.0 3.0 4.0 533.0
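To make the mismatch above easier to see, here is a small sketch that prints the raw matches as strings before they are converted with as.numeric. The variable name digit_runs is only for illustration. Since "[0-9]+" matches nothing but runs of digits, the characters "-", ".", "e", "E" and "+" end up acting as separators, which is why each number gets split into pieces and the signs are lost.

```r
this_string <- "-1, 0, 5e-1 ; 7E-1, 2 3.0, 4, 5.33e+2"

# Keep the matches as character strings to see exactly what the pattern captured
digit_runs <- regmatches(this_string, gregexpr("[0-9]+", this_string))[[1]]
print(digit_runs)
# [1] "1"  "0"  "5"  "1"  "7"  "1"  "2"  "3"  "0"  "4"  "5"  "33" "2"
```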
Thanks so much for your help.
Edit: I just found a viable solution:
this_string = "-1, 0, 5e-1 ; 7E-1, 2 3.0, 4, 5.33e+2"
extracted_numbers = as.numeric(regmatches(this_string, gregexpr("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?", this_string))[[1]])
print(extracted_numbers)
User Wojciech Sobala provided the above regular expression in an answer to this question: Extracting decimal numbers from a string
Thanks Wojciech.
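For reuse, the same pattern can be wrapped in a small function. This is only a sketch: the name extract_numbers is my own and not from the linked answer, and it assumes the input is a single string. One thing I have not tried to handle is input like "1-2", where I would expect the pattern to return 1 and -2 because the "-" is read as a sign.

```r
# Hypothetical helper; the pattern is the one quoted in the edit above.
extract_numbers <- function(x) {
  # optional sign, optional integer part, optional ".", digits, optional exponent
  number_pattern <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
  matches <- regmatches(x, gregexpr(number_pattern, x))[[1]]
  as.numeric(matches)
}

result <- extract_numbers("-1, 0, 5e-1 ; 7E-1, 2 3.0, 4, 5.33e+2")
print(result)

# A string without any numbers gives an empty numeric vector
empty_result <- extract_numbers("no numbers here")
print(length(empty_result))
```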