2

How can I extract just the number from each string in the following data frame?

last_run<-c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
            'Last run 22 days ago','1st up after 177 days','1st up after 364 days')%>%
  as.data.frame()

The desired output is:

[image: desired output]

My attempt is:

new_df<-sapply(str_split(last_run$last_run," run"|"after"),'[',2)%>%
  as.data.frame()
Uwe Keim
GDog
  • Does [this previous post](https://stackoverflow.com/questions/41116310/return-number-from-string) answer your question? – Jan Apr 25 '21 at 09:03
  • similar to Ronak's answer, look ahead can be applied `as.numeric(str_extract(last_run$., '\\d+(?= days)'))` – AnilGoyal Apr 25 '21 at 09:12

4 Answers

3
sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x)))

strsplit

It splits each sentence in last_run into words and returns a list in which each element is a character vector of those words.

> strsplit(last_run, " ")
[[1]]
[1] "Last" "run"  "15"   "days" "ago" 

[[2]]
[1] "1st"   "up"    "after" "126"   "days" 

[[3]]
[1] "Last" "run"  "21"   "days" "ago" 

[[4]]
[1] "Last" "run"  "22"   "days" "ago" 

[[5]]
[1] "1st"   "up"    "after" "177"   "days" 

[[6]]
[1] "1st"   "up"    "after" "364"   "days" 

as.numeric

It tries to convert each word to a number and returns NA (with a warning) where that is not possible.

> as.numeric(strsplit(last_run, " ")[[1]])
[1] NA NA 15 NA NA

na.omit

It removes the NA values from a vector.

> na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
[1] 15

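na.omit returns the vector without its NA values, plus an na.action attribute that records the positions that were dropped. Indexing with [[1]] extracts the first remaining value as a plain number, without that attribute.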
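
As a quick check of what na.omit hands back for a numeric vector, the following sketch inspects the result. The variable names (words, nums, cleaned, value) are illustrative and not part of the answer; in a standard R session the result should be an atomic vector with an na.action attribute rather than a list.

```r
# Illustrative input: the words of the first sentence.
words <- c("Last", "run", "15", "days", "ago")

# Non-numeric words become NA; the coercion warning is silenced here.
nums <- suppressWarnings(as.numeric(words))

cleaned <- na.omit(nums)
is.list(cleaned)            # check whether the result is a list
names(attributes(cleaned))  # attributes attached by na.omit

# [[1]] picks the first remaining value and drops the attributes.
value <- cleaned[[1]]
value
```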


sapply

sapply applies a function to each element of a list and, when every result has length one, simplifies the results to a vector.
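
A possible variant of the same idea, sketched under the assumption that every string contains at least one purely numeric word: vapply checks that each result is exactly one number, and suppressWarnings hides the coercion warnings. The name days is illustrative.

```r
last_run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
              'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

days <- vapply(strsplit(last_run, " "), function(x) {
  nums <- suppressWarnings(as.numeric(x))  # words that are not numbers become NA
  nums[!is.na(nums)][[1]]                  # first numeric word; errors if there is none
}, numeric(1))

days
```

Unlike sapply, vapply stops with an error instead of silently returning a list if some string yields more or fewer than one value.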

pietrodito
1

A regex can help here. Extract the number that comes after the word 'run' or 'after'. Using base R sub:

as.numeric(sub('.*(run|after)\\s(\\d+).*', '\\2', last_run))
#[1]  15 126  21  22 177 364

Using stringr::str_extract:

as.numeric(stringr::str_extract(last_run, '(?<=(run|after)\\s)\\d+'))

data

last_run<-c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
            'Last run 22 days ago','1st up after 177 days','1st up after 364 days')
Ronak Shah
1

You can extract the values with a regex and then add them to a data.frame:

run = c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
  'Last run 22 days ago','1st up after 177 days','1st up after 364 days')

as.numeric(sub("(.* )([[:digit:]]+)( .*)", '\\2', run))
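
One way the extracted values might be placed in a data.frame is sketched below. The column names last_run and days and the object name new_df are assumptions chosen for illustration, since the desired output image is not available.

```r
run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
         'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Column names are illustrative, not taken from the question.
new_df <- data.frame(
  last_run = run,
  days     = as.numeric(sub("(.* )([[:digit:]]+)( .*)", "\\2", run))
)

new_df
```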
Gowachin
1

In base R or with stringr::str_extract, put the pattern \\d+ between word-boundary markers \\b so that it does not match the digit in strings like "1st".

1. base R

gsub(".*(\\b\\d+\\b).*", "\\1", last_run)
#[1] "15"  "126" "21"  "22"  "177" "364"

as.integer(gsub(".*(\\b\\d+\\b).*", "\\1", last_run))
#[1]  15 126  21  22 177 364

2. package stringr

stringr::str_extract(last_run, "\\b\\d+\\b")
#[1] "15"  "126" "21"  "22"  "177" "364"

as.integer(stringr::str_extract(last_run, "\\b\\d+\\b"))
#[1]  15 126  21  22 177 364
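
To see why the boundaries matter, this small base R sketch compares the first match of each pattern on two of the strings. The helper first_match is hypothetical, written only for this comparison; it uses regmatches with regexpr and perl = TRUE.

```r
last_run <- c('Last run 15 days ago', '1st up after 126 days')

# Hypothetical helper: first match of `pattern` in each string.
first_match <- function(pattern, x) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

without_b <- first_match("\\d+", last_run)        # can stop at the 1 in "1st"
with_b    <- first_match("\\b\\d+\\b", last_run)  # only standalone numbers

without_b
with_b
```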
Rui Barradas