I have a column in a data frame that looks something like this:

[1] [0.50 .. 0.52] [0.52 .. 0.54] [0.54 .. 0.56] [0.56 .. 0.58]
[5] [0.58 .. 0.60] [0.60 .. 0.62] [0.62 .. 0.64] [0.64 .. 0.66]
[9] [0.66 .. 0.68] [0.68 .. 0.70] [0.70 .. 0.72] [0.72 .. 0.74]
[13] [0.74 .. 0.76] [0.76 .. 0.78] [0.78 .. 0.80] [0.80 .. 0.82]

I would like to take the average of the two numbers in this column. However, I don't know how to go about this. I tried using gsub() to replace the " .. " but I cannot remove the brackets, and I cannot seem to find a way to just extract the numbers. What would be the best way to just get the average of these numbers?

  • Could you give the result of the function `dput` on your data for us to work on it? Thanks – Pop Jul 19 '12 at 10:56

3 Answers


You can use the base regex functions in R (gsub, regexpr, ...) or the stringr package (str_extract) to do that.

require(stringr)

string <- c("[0.50 .. 0.52]", "[0.52 .. 0.54]", "[0.54 .. 0.56]", "[0.56 .. 0.58]")

number <- as.numeric(str_extract(string, "\\d\\.\\d+"))
number
[1] 0.50 0.52 0.54 0.56

You can then compute the mean of elements 1 and 2, 3 and 4, and so on, using the rollmean function in zoo:

require(zoo)
average <- rollmean(number, 2)
average[as.logical(seq_along(average) %%2 )]
[1] 0.51 0.55
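
Note that str_extract() returns only the first match in each string, so the code above works with the lower bound of each interval. If you want both numbers of every interval, one possible variation is to collect all matches per string. The following is a minimal base-R sketch under that assumption (the name `interval_mean` is only illustrative, and the pattern is widened to `\\d+\\.\\d+` so that more than one leading digit is allowed):

```r
# Same sample strings as in the answer above
string <- c("[0.50 .. 0.52]", "[0.52 .. 0.54]", "[0.54 .. 0.56]", "[0.56 .. 0.58]")

# gregexpr() locates every match in each string;
# regmatches() extracts them as a list with one character vector per string
matches <- regmatches(string, gregexpr("\\d+\\.\\d+", string))

# Convert each pair to numeric and average it
interval_mean <- sapply(matches, function(m) mean(as.numeric(m)))
interval_mean
```

This gives one average per input string, so no rolling mean or subsetting step is needed afterwards.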
dickoa
  • Thanks, that works perfectly! Just curious though, what does "\\d\\.\\d+" do? – user1537589 Jul 19 '12 at 11:11
  • "\\d\\.\\d+" is a regular expression for decimal number ( 8.6767 or 7.5 works but not 70.45). Example here : http://stackoverflow.com/questions/308122/simple-regular-expression-for-a-decimal-with-a-precision-of-2 – dickoa Jul 19 '12 at 11:19
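
To see what the pattern from the comments matches, here is a small sketch using base R's regexpr() and regmatches(). The test strings are the ones mentioned in the comment; the variable names are made up for illustration:

```r
x <- c("8.6767", "7.5", "70.45")

# "\\d\\.\\d+" = exactly one digit, a literal dot, then one or more digits
one_digit <- regmatches(x, regexpr("\\d\\.\\d+", x))
one_digit
# [1] "8.6767" "7.5"    "0.45"

# Allowing one or more digits before the dot captures "70.45" in full
many_digits <- regmatches(x, regexpr("\\d+\\.\\d+", x))
many_digits
# [1] "8.6767" "7.5"    "70.45"
```

For "70.45" the first pattern still finds a match, but only the "0.45" part, which is why it is not suitable for numbers with more than one digit before the decimal point.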

Use gsub to take out the brackets (remembering to double escape for them), then use strsplit to separate the numbers, and sapply to work on the resulting list with mean and as.numeric:

x <- c("[0.52 .. 0.54]", "[0.54 .. 0.56]")

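# fixed = TRUE treats " .. " as literal text instead of a regex (where . matches any character)
sapply(strsplit(gsub("[\\[\\]]", "", x, perl = TRUE), " .. ", fixed = TRUE), function(s) mean(as.numeric(s)))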
[1] 0.53 0.55
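
Since the question is about a data frame column, the same steps could be wrapped in a small helper and applied to that column. This is only a sketch: the data frame `df`, the column names `interval` and `midpoint`, and the function name `interval_midpoint` are all hypothetical.

```r
# Hypothetical data frame; the column name `interval` is an assumption
df <- data.frame(interval = c("[0.52 .. 0.54]", "[0.54 .. 0.56]"),
                 stringsAsFactors = FALSE)

# Strip the brackets, split on the literal " .. " separator, average each pair
interval_midpoint <- function(x) {
  # as.character() also covers the case where the column is a factor
  cleaned <- gsub("[\\[\\]]", "", as.character(x), perl = TRUE)
  parts <- strsplit(cleaned, " .. ", fixed = TRUE)
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

df$midpoint <- interval_midpoint(df$interval)
df
```

vapply() is used instead of sapply() here so that the result is guaranteed to be a numeric vector with one value per row.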
James

Use gsub to remove the special characters, which leaves the two numbers separated by a double space. Then strsplit on that double space and take the mean:

First, replicate the data:

x <- scan(what="character", quote='"', sep=" ", text='"[0.50 .. 0.52]" "[0.52 .. 0.54]" "[0.54 .. 0.56]" "[0.56 .. 0.58]" "[0.58 .. 0.60]" "[0.60 .. 0.62]" "[0.62 .. 0.64]" "[0.64 .. 0.66]" "[0.66 .. 0.68]" "[0.68 .. 0.70]" "[0.70 .. 0.72]" "[0.72 .. 0.74]" "[0.74 .. 0.76]" "[0.76 .. 0.78]" "[0.78 .. 0.80]" "[0.80 .. 0.82]"')

Then use gsub with sapply and mean:

xx <- gsub("\\[|\\.\\.|\\]", "", x)
sapply(strsplit(xx, "  "), function(x) mean(as.numeric(x)))

The results:

 [1] 0.51 0.53 0.55 0.57 0.59 0.61 0.63 0.65 0.67 0.69 0.71 0.73 0.75 0.77 ...

The regular expression works like this:

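  • Square brackets [ ] normally have a special meaning in regex (they define a set of characters to match), so to match a literal [ or ] you need to escape them, i.e. \\[ and \\]
  • The period also has a special meaning (it matches any character), so the double period is written as \\.\\.
  • Finally, the | means the same as logical OR, i.e. find an opening bracket OR a double period OR a closing bracket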
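
The effect of this pattern can be checked on a single string. A minimal sketch (the variable names `one`, `cleaned` and `parts` are only illustrative):

```r
one <- "[0.50 .. 0.52]"

# Remove "[", ".." and "]"; the spaces on either side of ".." remain
cleaned <- gsub("\\[|\\.\\.|\\]", "", one)
cleaned
# [1] "0.50  0.52"

# Split on the remaining double space and average the two numbers
parts <- strsplit(cleaned, "  ", fixed = TRUE)[[1]]
mean(as.numeric(parts))
```

Because the replacement is an empty string, the two spaces that surrounded the ".." are what strsplit() uses as the separator.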

You can read more about regular expressions in R at ?regexp or ?gsub.

Andrie