
This must have been answered somewhere, but I can't find it. I'm trying to match a string of digits, specifically one that contains no symbols (e.g. _ , . #). How do I write an expression that matches strings of 10 or more digits but excludes anything that contains a symbol?

"49154" "Reader  #1" 0.585069444444445 28 "LA" "982" "000088261962" "01/29/10" "14:02:30"   1
"49159" "Reader  #1" 0.585081018518519 28 "LA" "982" "000088261962" "01/29/10" "14:02:31"   1
"49160" "Reader  #2" 0.585127314814815 28 "LA" "982" "000088261962" "01/29/10" "14:02:35"   1
"49163" "Reader  #2" 0.585138888888889 28 "LA" "982" "000088261962" "01/29/10" "14:02:36"   1

I tried something like grep("[0-9]{10,20}"), but I'd like it to match column #8 while excluding column #4.

sc73
  • Maybe "^[0-9]{10,}$" – HubertL Aug 26 '16 at 22:27
  • `dput` the data you are working with – rawr Aug 26 '16 at 22:29
  • This site solves all my regex problems http://regexr.com/ . By the way, I have a nice non-regex answer for this in R but your title and tags say regex so I don't dare post it as an answer : / – Hack-R Aug 26 '16 at 22:48
  • I don't know what _r_ regex is capable of but here are a few things: If it's surrounded by quotes `"[0-9]{10,20}"`, if it's the eighth column (separated by spaces) `^(?:[^ ][ ]+){7}[ ]+[^0-9 ]*[0-9]{10,20}[^0-9]`, or no symbol as delimiters `[^0-9,._][0-9]{10,20}[^0-9,._]` –  Aug 27 '16 at 01:38

1 Answer


Personally, for your specific case (a run of 10 or more digits), I'd go with something like this:

\d{10,}

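R's regex engines do support \d, but the backslash has to be doubled inside an R string literal ("\\d{10,}"). A bracket expression avoids the escaping: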

[0-9]{10,}
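One caveat worth checking in R: on its own, this pattern matches any value that merely contains a run of 10 or more digits, which includes the decimal column. A minimal sketch, assuming the fields have already been split into a character vector (the `fields` vector below is illustrative, built from the first sample row), compares the bare pattern with the anchored form suggested in the comments:

```r
# Illustrative values taken from the first sample row in the question.
fields <- c("49154", "Reader  #1", "0.585069444444445", "28", "000088261962")

# Unanchored: TRUE for any value that contains a run of 10+ digits,
# so the digits after the decimal point match as well.
unanchored <- grepl("[0-9]{10,}", fields)

# Anchored: TRUE only when the whole value consists of 10+ digits.
anchored <- grepl("^[0-9]{10,}$", fields)

fields[unanchored]  # includes the decimal value
fields[anchored]    # only the all-digit value
```

If the goal is to keep the long ID and drop the decimal, the anchored form is probably the one you want.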

If you also want to match an optional integer part and decimal point in front of the digits (as in 0.585069444444445), you can use this:

([0-9]+\.)?[0-9]{10,}
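As a rough illustration of what that pattern extracts in R, here is a small sketch using `regexpr()` and `regmatches()`. The vector `x` is made up from values in the question, and note the doubled backslash needed inside the R string:

```r
# Illustrative values from the question's sample data.
x <- c("0.585069444444445", "000088261962", "01/29/10")

# Optional "digits + dot" prefix, then at least 10 digits.
pattern <- "([0-9]+\\.)?[0-9]{10,}"

# regmatches() drops elements with no match, so the date disappears.
hits <- regmatches(x, regexpr(pattern, x))
hits
```

The decimal value is returned whole, integer part included, while the plain digit string is returned unchanged.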

Remember, always use the most specific pattern possible for the strings you want to match. The more generic the pattern, the more headaches you'll have down the line trying to filter out strings you don't want.
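To make that concrete for the raw lines in the question: if you are matching against whole lines rather than split fields, one more specific option (borrowing the quoted-field idea from the comments) is to require the surrounding double quotes. This is only a sketch, and it assumes the ID column is always quoted and the decimal column never is, as in the sample:

```r
# First sample line from the question, as a single-quoted R string.
line <- '"49154" "Reader  #1" 0.585069444444445 28 "LA" "982" "000088261962" "01/29/10" "14:02:30"   1'

# Generic pattern: also picks up the digits after the decimal point.
generic <- regmatches(line, gregexpr("[0-9]{10,}", line))[[1]]

# More specific pattern: a quoted field made up only of 10+ digits.
quoted <- regmatches(line, regexpr('"[0-9]{10,}"', line))

# Strip the surrounding quotes to get the bare ID.
id <- gsub('"', "", quoted, fixed = TRUE)
id
```

Here `generic` holds two matches, while `id` holds only the quoted 12-digit field.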

Sebastian Lenartowicz