Average numbers in character string in R

Question

I have a column in a data frame that looks something like this:

 [1] [0.50 .. 0.52] [0.52 .. 0.54] [0.54 .. 0.56] [0.56 .. 0.58]
 [5] [0.58 .. 0.60] [0.60 .. 0.62] [0.62 .. 0.64] [0.64 .. 0.66]
 [9] [0.66 .. 0.68] [0.68 .. 0.70] [0.70 .. 0.72] [0.72 .. 0.74]
[13] [0.74 .. 0.76] [0.76 .. 0.78] [0.78 .. 0.80] [0.80 .. 0.82]

I would like to take the average of the two numbers in this column. However, I don't know how to go about this. I tried using gsub() to replace the " .. " but I cannot remove the brackets, and I cannot seem to find a way to just extract the numbers. What would be the best way to just get the average of these numbers?

Could you post the output of `dput` on your data so that we can work with it? Thanks – Pop, Jul 19 '12 at 10:56

score 4 · Answer 1 · answered Jul 19 '12 at 11:02

You can use the base R regex functions (gsub, regexpr, ...) or the stringr package (str_extract, str_extract_all) to do that.

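library(stringr)

string <- c("[0.50 .. 0.52]", "[0.52 .. 0.54]", "[0.54 .. 0.56]", "[0.56 .. 0.58]")

# str_extract_all returns every match per string; str_extract would keep only the first
number <- as.numeric(unlist(str_extract_all(string, "\\d\\.\\d+")))
number
[1] 0.50 0.52 0.52 0.54 0.54 0.56 0.56 0.58

You can then compute the mean of elements 1 and 2, 3 and 4, and so on, using the rollmean function in zoo:

library(zoo)
average <- rollmean(number, 2)
# keep every other rolling mean (positions 1, 3, 5, ...), i.e. one per original string
average[as.logical(seq_along(average) %% 2)]
[1] 0.51 0.53 0.55 0.57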
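
The pairwise averaging described above can also be sketched without zoo or stringr. This is only an illustration in base R, and it assumes every string contains exactly two numbers; the names `matches` and `pairs` are made up for the example.

```r
string <- c("[0.50 .. 0.52]", "[0.52 .. 0.54]", "[0.54 .. 0.56]", "[0.56 .. 0.58]")

# gregexpr finds every match in each string; regmatches returns them as a list
matches <- regmatches(string, gregexpr("\\d\\.\\d+", string))
number <- as.numeric(unlist(matches))

# Two rows: lower bounds in row 1, upper bounds in row 2, one column per string
pairs <- matrix(number, nrow = 2)
average <- colMeans(pairs)
average
```

Filling the matrix column by column keeps each pair of bounds together, so no index filtering is needed afterwards.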

answered Jul 19 '12 at 11:02

dickoa


Thanks, that works perfectly! Just curious though, what does "\\d\\.\\d+" do? – user1537589 Jul 19 '12 at 11:11
"\\d\\.\\d+" is a regular expression for a decimal number with a single digit before the point: one digit, a literal dot, then one or more digits (8.6767 or 7.5 match in full, but for 70.45 only 0.45 is matched). Example here: http://stackoverflow.com/questions/308122/simple-regular-expression-for-a-decimal-with-a-precision-of-2 – dickoa Jul 19 '12 at 11:19
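
As a small illustration of the pattern discussed in these comments, the sketch below tries it on a few values using base R (`regmatches` with `regexpr`) rather than stringr. The vector `examples` and the wider pattern `\\d+\\.\\d+` are additions for the demonstration, not part of the original answer.

```r
# Hypothetical test inputs, not taken from the question
examples <- c("8.6767", "7.5", "70.45")

# \\d = one digit, \\. = a literal dot, \\d+ = one or more digits
first_match <- regmatches(examples, regexpr("\\d\\.\\d+", examples))
first_match
# for "70.45" only "0.45" is picked up, since a single digit is allowed before the dot

# Allowing several leading digits (\\d+) captures the whole number
full_match <- regmatches(examples, regexpr("\\d+\\.\\d+", examples))
full_match
```

For the values in the question, which all have one digit before the point, both patterns should behave the same.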

score 4 · Accepted Answer · answered Jul 19 '12 at 11:09

Use gsub to take out the brackets (remembering to double-escape them), then use strsplit to separate the numbers, and sapply to apply as.numeric and mean to each element of the resulting list:

x <- c("[0.52 .. 0.54]", "[0.54 .. 0.56]")

# fixed = TRUE makes strsplit treat " .. " literally rather than as a regex
sapply(strsplit(gsub("[\\[\\]]", "", x, perl = TRUE), " .. ", fixed = TRUE), function(p) mean(as.numeric(p)))
[1] 0.53 0.55
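
Since the question is about a data frame column, here is a minimal sketch of applying the same idea to a column and storing the result next to it. The data frame `df` and the column names `interval` and `midpoint` are hypothetical, and `vapply` is used in place of `sapply` only so that the return type is checked.

```r
# Hypothetical data frame standing in for the asker's data
df <- data.frame(
  interval = c("[0.50 .. 0.52]", "[0.52 .. 0.54]", "[0.54 .. 0.56]"),
  stringsAsFactors = FALSE
)

# as.character() guards against the column being a factor
cleaned <- gsub("\\[|\\]", "", as.character(df$interval))

# Split on the literal " .. " and average each pair of numbers
parts <- strsplit(cleaned, " .. ", fixed = TRUE)
df$midpoint <- vapply(parts, function(p) mean(as.numeric(p)), numeric(1))

df
```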

answered Jul 19 '12 at 11:09

James


(+1) Exactly the approach I would have taken. – A5C1D2H2I1M1N2O1R2T1 Jul 19 '12 at 11:12
Actually, not exactly--I had done `[^[:digit:]. ]` for my `gsub` instead. – A5C1D2H2I1M1N2O1R2T1 Jul 19 '12 at 11:19
Nice approach...short, compact and good use of the list structure – dickoa Jul 19 '12 at 11:20

Andrie · Answer 3 · 2012-07-19T11:27:33.160

Use gsub to remove the special characters, then strsplit on the double space that is left behind and take the mean:

First, replicate the data:

x <- scan(what="character", quote='"', sep=" ", text='"[0.50 .. 0.52]" "[0.52 .. 0.54]" "[0.54 .. 0.56]" "[0.56 .. 0.58]" "[0.58 .. 0.60]" "[0.60 .. 0.62]" "[0.62 .. 0.64]" "[0.64 .. 0.66]" "[0.66 .. 0.68]" "[0.68 .. 0.70]" "[0.70 .. 0.72]" "[0.72 .. 0.74]" "[0.74 .. 0.76]" "[0.76 .. 0.78]" "[0.78 .. 0.80]" "[0.80 .. 0.82]"')

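Then use gsub, followed by strsplit with sapply and mean:

# remove "[", ".." and "]"; each string becomes e.g. "0.50  0.52" (two spaces)
xx <- gsub("\\[|\\.\\.|\\]", "", x)
sapply(strsplit(xx, "  "), function(p) mean(as.numeric(p)))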

The results:

 [1] 0.51 0.53 0.55 0.57 0.59 0.61 0.63 0.65 0.67 0.69 0.71 0.73 0.75 0.77 ...

The regular expression works like this:

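Square brackets [ ] normally delimit a character class in a regex, i.e. they match any one of the characters listed inside them.
You want to remove the literal brackets [ and ], so you need to escape them, i.e. \\[ and \\]. The periods are escaped for the same reason (\\.\\.), because an unescaped . matches any character.
Finally, | means the same as logical OR, i.e. find me brackets OR double periods.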

You can read more about regular expressions in R at ?regexp or ?gsub.
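
To make the explanation above concrete, this sketch applies the three alternatives of the pattern one at a time to a single value and compares the result with the combined pattern. The variable names (`s`, `step1`, and so on) and the more tolerant split pattern " +" are illustrative additions.

```r
s <- "[0.50 .. 0.52]"

step1 <- gsub("\\[", "", s)         # drop the opening bracket
step2 <- gsub("\\]", "", step1)     # drop the closing bracket
step3 <- gsub("\\.\\.", "", step2)  # drop the double period, leaving two spaces

# The single pattern with | does all three in one pass
all_at_once <- gsub("\\[|\\.\\.|\\]", "", s)
identical(step3, all_at_once)

# Splitting on one or more spaces is a slightly more tolerant alternative to "  "
midpoint <- mean(as.numeric(strsplit(all_at_once, " +")[[1]]))
midpoint
```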

@user1537589 I have edited the answer. Does it now make sense? — Andrie, Jul 19 '12 at 11:16
