I am trying to build a regular expression to extract a six-digit decimal number (positive or negative) that follows a specific string, namely 'LogL='.

The text comes from the output of a particular piece of software:

   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565    
   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354    

I tried the following in R:

txt <- "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
as.numeric(unlist(strsplit(sub(".*LogL=*", "", txt), " "))[1])

This doesn't work for positive numbers, and I imagine it's a very crude way of going about it. I also tried experimenting on regex101.com.
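A small diagnostic sketch of why the attempt above returns NA for the positive line. It assumes the output has already been read into a character vector with one element per line (for example with readLines(), which is an assumption about the workflow). After sub() strips everything up to LogL=, the positive value is still preceded by a space, so splitting on single spaces yields an empty first token:

```r
# Both sample lines, one element per line (assumed input shape).
txt <- c("   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565",
         "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354")

# Original idea: drop everything up to "LogL=", then split on single spaces.
rest <- sub(".*LogL=*", "", txt)
first_token <- vapply(strsplit(rest, " "), `[`, character(1), 1)
first_token
# [1] "-3695.47" ""
# The positive value is preceded by a space, so its first token is "",
# and as.numeric("") gives NA.

# Also consuming the whitespace after "LogL=" (\\s*) removes that empty token.
rest_fixed <- sub(".*LogL=\\s*", "", txt)
logl <- as.numeric(vapply(strsplit(rest_fixed, " "), `[`, character(1), 1))
logl
# [1] -3695.47  2456.30
```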

Related Stack Overflow questions I tried: (1) (2) (3)

I am kind of lost and can't seem to understand regular expressions. I'm sure this is a piece of cake. Any help?

tstev

4 Answers

I'd use a look-behind regex:

txt <- "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565    
           9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
pattern <- "(?<=LogL\\=)\\s*\\-*[0-9.]+"
m <- gregexpr(pattern, txt, perl = TRUE)
as.numeric(unlist(regmatches(txt, m)))
#[1] -3695.47  2456.30
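A sketch that prints the raw matches before conversion, to show what the look-behind actually returns. It assumes the same two sample lines and uses `-?` (at most one minus sign) in place of `\-*` (any number of minus signs):

```r
txt <- paste(
  "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565",
  "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354",
  sep = "\n"
)

# (?<=LogL=) is a zero-width look-behind: "LogL=" must sit immediately to the
# left of the match, but it is not included in the matched text.
pattern <- "(?<=LogL=)\\s*-?[0-9.]+"
matches <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]]
matches
# [1] "-3695.47" " 2456.30"

# The positive value keeps its leading space; as.numeric() ignores
# surrounding whitespace, so the conversion still works.
as.numeric(matches)
# [1] -3695.47  2456.30
```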
Roland
Appreciate the link and the answer! I need to start learning regular expressions; they seem very useful. – tstev Jun 30 '16 at 11:14
Try

LogL=\s*(-?\d+(?:\.\d+)?)

It matches the literal text LogL, an equals sign, and any amount of whitespace. Then it captures:

  • an optional -
  • digits, at least one
  • and optionally, a . followed by at least one digit.

Check it here at regex101.
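In R, one way to apply this pattern is a sketch with sub() and a back-reference: wrapping the pattern in `.*` on both sides makes the whole line get replaced by capture group 1. It assumes one element per output line; a line without LogL= would come back unchanged and turn into NA (with a warning) in as.numeric():

```r
txt <- c("   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565",
         "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354")

# The regex from this answer, padded with ".*" so sub() replaces the whole
# line with capture group 1 (the number after "LogL=").
pattern <- ".*LogL=\\s*(-?\\d+(?:\\.\\d+)?).*"
logl <- as.numeric(sub(pattern, "\\1", txt, perl = TRUE))
logl
# [1] -3695.47  2456.30
```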

SamWhan

If you are interested in a non-regex alternative:

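library(stringr)
txt <- "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
# take the text after the first "=", trim the leading space (present only for positive values), then keep the first word
word(txt, 2, sep = "=") %>% trimws() %>% word(1) %>% as.numeric()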

It works with positive and negative numbers.
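For comparison, a base-R sketch of the same split-on-`=` idea, in case the stringr dependency is unwanted (an assumption, not part of this answer). It also shows why trimming matters: splitting on single spaces treats every space as a separator, so a leading space produces an empty first token:

```r
txt <- c("   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565",
         "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354")

# Piece between the first and second "=" (fixed = TRUE: plain string split).
after_eq <- vapply(strsplit(txt, "=", fixed = TRUE), `[`, character(1), 2)

# Trim first so the positive value's leading space does not become an empty
# token, then keep the first space-separated token.
logl <- as.numeric(
  vapply(strsplit(trimws(after_eq), " ", fixed = TRUE), `[`, character(1), 1)
)
logl
# [1] -3695.47  2456.30
```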

thepule

We can use str_extract_all from stringr:

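 library(stringr)
 # txt holds both sample lines, separated by a newline
 txt <- "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565
    9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354   "
 as.numeric(str_extract_all(txt, "(?<=LogL=\\s{0,1})[-0-9.]+")[[1]])
 #[1] -3695.47  2456.30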

Or we can use a combination of strsplit and gsub

as.numeric(gsub(".*LogL=\\s*|\\s+.*", "", trimws(strsplit(txt, "\n")[[1]])))
#[1] -3695.47  2456.30
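The `{0,1}` bound keeps the look-behind above to a limited length, which ICU (the regex engine behind stringr) requires; an unbounded `\s*` inside the look-behind would be rejected. An alternative sketch in base R uses PCRE's `\K`, which drops everything matched before it from the reported match, so no look-behind is needed. It assumes the same two sample lines:

```r
txt <- paste(
  "   7 LogL=-3695.47     S2=  9.0808       1891 df    2.263     0.2565",
  "   9 LogL= 2456.30     S2=  1.2789       1785 df    1.244     0.1354",
  sep = "\n"
)

# "LogL=" and any whitespace are matched, then \K resets the start of the
# reported match, so only the (optionally negative) number is returned.
pattern <- "LogL=\\s*\\K-?[0-9.]+"
logl <- as.numeric(regmatches(txt, gregexpr(pattern, txt, perl = TRUE))[[1]])
logl
# [1] -3695.47  2456.30
```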
akrun