0

I want to check a column or value for any punctuation except periods (.). I've looked at a number of similar questions, but can't seem to get it right.

Desired output:

"1.0" FALSE
"-1.0" TRUE
"-1" TRUE
"1+" TRUE

Attempts:

> grepl("([.])[[:punct:]]", "1.0")
[1] FALSE
> grepl("([.])[[:punct:]]", "-1.0")
[1] FALSE
> grepl("(.)[[:punct:]]", "-1.0")
[1] TRUE
> grepl("(.)[[:punct:]]", "1.0")
[1] TRUE

Base R is preferred but not required.

gabagool

2 Answers

4

You can exclude . from [:punct:] with (?![.])[[:punct:]] or (?!\\.)[[:punct:]] using a negative lookahead,

x <- c("1.0", "-1.0", "-1", "1+")
grepl("(?![.])[[:punct:]]", x, perl=TRUE)
#[1] FALSE  TRUE  TRUE  TRUE
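
To see what the lookahead does: (?![.]) is a zero-width assertion that fails wherever the next character is a period, so [[:punct:]] can only match punctuation that is not a period. A minimal sketch that lists the matched characters; the vector y is made up for illustration and is not from the question.

```r
# Hypothetical input mixing periods with other punctuation
y <- c("1.0", "-1.0", "a.b,c", "1+")

# Extract every character the pattern matches, one list element per string;
# periods are skipped by the lookahead, so they never show up here
hits <- regmatches(y, gregexpr("(?![.])[[:punct:]]", y, perl = TRUE))
hits
```

An empty element in hits means the string contains no punctuation other than periods.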

or use double negation, as given in the comments by @A5C1D2H2I1M1N2O1R2T1.

grepl("[^[:^punct:].]", x, perl=TRUE)
#[1] FALSE  TRUE  TRUE  TRUE
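
Here [:^punct:] is the negated form of the POSIX class (a Perl-style extension, hence perl=TRUE), so the outer [^...] excludes everything that is either non-punctuation or a period. As a rough check, the sketch below compares both patterns on the printable ASCII characters only; the names ascii, a and b are made up for illustration.

```r
# One element per printable ASCII character (codes 32 to 126)
ascii <- strsplit(intToUtf8(32:126), "")[[1]]

a <- grepl("(?![.])[[:punct:]]", ascii, perl = TRUE)  # negative lookahead
b <- grepl("[^[:^punct:].]", ascii, perl = TRUE)      # double negation

identical(a, b)  # do the two patterns agree on this range?
ascii[a]         # the characters both patterns flag
```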

But being explicit might be better. For your given example, that means [-+], [^[:digit:].] or [^0-9.],

grepl("[-+]", x)
#[1] FALSE  TRUE  TRUE  TRUE

grepl("[^[:digit:].]", x)
#[1] FALSE  TRUE  TRUE  TRUE

grepl("[^0-9.]", x)
#[1] FALSE  TRUE  TRUE  TRUE

since an active locale can alter the behaviour of [[:punct:]], and switching between perl=FALSE and perl=TRUE alters it too.

gsub("[^[:punct:]]", "", intToUtf8(c(32:126, 160:255)), perl=FALSE)
#[1] "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ ¡¢£¤¥¦§¨©«¬­®¯°±²³´¶·¸¹»¼½¾¿×÷"

gsub("[^[:punct:]]", "", intToUtf8(c(32:126, 160:255)), perl=TRUE)
#[1] "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
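
If that engine and locale dependence is a concern, one option is to skip the character class and test against a fixed set of characters instead. The sketch below assumes "punctuation" means the 32 ASCII punctuation characters from the perl=TRUE output above; punct_no_period and has_other_punct are made-up names for illustration.

```r
# ASCII punctuation with the period removed (assumed definition of "punctuation")
punct_no_period <- setdiff(
  strsplit("!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~", "")[[1]],
  "."
)

# TRUE if a string contains at least one character from the fixed set
has_other_punct <- function(x) {
  chars <- strsplit(x, "")  # split each string into single characters
  vapply(chars, function(ch) any(ch %in% punct_no_period), logical(1))
}

x <- c("1.0", "-1.0", "-1", "1+")
has_other_punct(x)
```

This trades the regex for a plain set lookup, so the result does not change with perl=TRUE/FALSE, but it will not flag non-ASCII punctuation such as ¡ or ».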

See also: in R, use gsub to remove all punctuation except period, R regex remove all punctuation except apostrophe or Remove all punctuation except apostrophes in R.

GKi
3

Make it a two-step process. First remove periods, second detect (remaining) punctuation:

grepl("[[:punct:]]", gsub("\\.", "", x))

## use fixed = TRUE for a bit more speed in the gsub
grepl("[[:punct:]]", gsub(".", "", x, fixed = TRUE))
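
Since the question mentions checking a column, the same two steps can be applied to a data frame column. A minimal sketch; the data frame df and its column names value and has_punct are assumptions for illustration.

```r
# Hypothetical data frame holding the example values as character strings
df <- data.frame(value = c("1.0", "-1.0", "-1", "1+"), stringsAsFactors = FALSE)

# Step 1: drop literal periods; step 2: flag any remaining punctuation
df$has_punct <- grepl("[[:punct:]]", gsub(".", "", df$value, fixed = TRUE))
df
```
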
Gregor Thomas