
I have a matrix of four- and six-digit numbers, each made up of two or three pairs of digits that describe overlapping shapes. For example:

data1<-cbind(474440,470000,440000,400000,404400,474000)

Each cell of the matrix contains a 47, a 44, a 40, or some combination of these, and the rest of the number is zeros. I have another data set which is similar, but each number has only two pairs of digits, not three. For example:

data2<-cbind(5253,5200,5300,5000,5053)

Again, each value contains 52, 53, 50, or some combination of them. I would like to get a logical matrix for each of the two-digit numbers, so selecting 40 in data1 would yield (TRUE, FALSE, FALSE, TRUE, TRUE, TRUE), and selecting 50 in data2 would yield (FALSE, FALSE, FALSE, TRUE, TRUE). I have tried creating a list of the unique two-digit numbers I'm looking for and using grepl to select the cells that match each pattern. However, because zeros fill the empty slots, grepl also matches a 40 that straddles two neighbouring pairs, so it selects too many cells; for example, looking for 40 in data1 yields (TRUE, FALSE, TRUE, TRUE, TRUE, TRUE).
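The over-matching described above can be reproduced with a short sketch. It assumes the fourth value of data1 is meant to be 400000 and converts the numbers to text with `format(..., scientific = FALSE)`; the names `txt` and `naive` are only illustrative.

```r
# Assumed data: six-digit values, with 400000 as the fourth entry
data1 <- cbind(474440, 470000, 440000, 400000, 404400, 474000)

# Convert to text without scientific notation
txt <- format(data1, scientific = FALSE)

# Unanchored search: "40" is also found straddling two pairs, e.g. 44|00|00
naive <- grepl("40", txt)
print(naive)
# [1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE
```

The third value, 440000, is the false positive: its pairs are 44, 00 and 00, but the characters at positions 2 and 3 happen to spell 40.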

Sven Hohenstein
Alexandra
  • I think you are making this too difficult on yourself. How about putting delimiters between the pairs, such as "-", so that you don't get a false match? Or replacing the 00 values with xx? – Eccountable Jan 16 '14 at 06:56
  • This question about [splitting a string into substrings](http://stackoverflow.com/questions/11619616/how-to-split-a-string-into-substrings-of-a-given-length) may be of use to you. – thelatemail Jan 16 '14 at 07:11
  • Warning: remember that the `grep, gsub` family of functions coerce numbers to characters. For example, `data<-474.4400e7; grepl('444',data)` will return `TRUE` (unless you've done something bad with `format`, as Sven hinted at). – Carl Witthoft Jan 16 '14 at 14:25
  • Thanks, that's very helpful. I did originally use options (scipen) but format is obviously much better! – Alexandra Jan 16 '14 at 20:48
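The substring idea raised in the comments can be sketched as follows. This is only an illustration, not code from the thread: `split_pairs` is a hypothetical helper, and it assumes every number has an even count of digits.

```r
# Hypothetical helper: cut a string into consecutive two-character pieces
split_pairs <- function(s) {
  starts <- seq(1, nchar(s), by = 2)
  substring(s, starts, starts + 1)
}

# Values from data2, converted to text without scientific notation or padding
txt <- format(c(5253, 5200, 5300, 5000, 5053), scientific = FALSE, trim = TRUE)

# TRUE where "50" is one of the whole pairs
has50 <- vapply(txt, function(s) "50" %in% split_pairs(s), logical(1),
                USE.NAMES = FALSE)
print(has50)
# [1] FALSE FALSE FALSE  TRUE  TRUE
```

Because each number is compared pair by pair, a 50 formed by the end of one pair and the start of the next is never counted.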

2 Answers

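# Split each number into two-digit pairs, then check whether "40" is one of them.
# format(x, scientific = FALSE) keeps a value such as 400000 from turning into "4e+05".
apply(data1, 2, function(x) "40" %in% strsplit(gsub("([[:alnum:]]{2})", "\\1 ", format(x, scientific = FALSE)), " ")[[1]])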

[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE    
Troy
  • I figured there was a way to do this with some sort of string split function, but I couldn't quite get my head around it. Thanks for the suggestion. – Alexandra Jan 16 '14 at 23:23

You can use `grepl` with the correct regular expression. The function `format` is necessary to avoid scientific notation when the numbers are converted to strings.

data1<-cbind(474440,470000,440000,400000,404400,474000)

grepl("^(..)*40", format(data1))
# [1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE


data2<-cbind(5253,5200,5300,5000,5053)

grepl("^(..)*50", format(data2))
# [1] FALSE FALSE FALSE  TRUE  TRUE

How does it work?

In the regular expression `^(..)*40`, `^` matches the beginning of the string, `(..)` is a group of any two characters, and the quantifier `*` means zero or more times. The `40` is a literal 40. Hence the pattern matches a 40 preceded by exactly zero, two, four, etc. characters, that is, a 40 that fills a whole pair rather than straddling two.
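As a sketch of how the same pattern could be applied to every two-digit code at once: `has_pair` and `codes` are hypothetical names introduced here, and the approach assumes every number has an even count of digits.

```r
# Hypothetical helper: TRUE where `code` occupies a whole two-digit slot
has_pair <- function(x, code) {
  txt <- format(x, scientific = FALSE, trim = TRUE)
  grepl(paste0("^(..)*", code), txt)
}

data1 <- cbind(474440, 470000, 440000, 400000, 404400, 474000)
codes <- c("47", "44", "40")

# One logical vector per code, collected as the columns of a matrix
result <- sapply(codes, function(code) has_pair(data1, code))
print(result)
```

Each column of `result` is named after its code, so `result[, "40"]` gives the logical vector for 40.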

Sven Hohenstein