Using stringr in R to split numbers

Question

I have the following data frame,

>df
         X1          X2       X3    
 A   76 ± 2      76 ± 2   76 ± 2
 B   78 ± 2      76 ± 2   76 ± 2
 C   10 ± 2      76 ± 2   76 ± 2

I'm trying to convert it to:

>df
     X1.mn  X1.sd    X2.mn   X2.sd    X3.mn   X3.sd 
 A   76     2        76      2        76      2
 B   78     2        76      2        76      2
 C   10     2        76      2        76      2

I tried using the stringr library referring to the posts here

df <- str_split_fixed(before$colnames(df)[1], intToUtf8(177), 2)

I get the following error though,

Error in stri_split_regex(string, pattern, n = n, simplify = simplify,  : 
  object 'before' not found

Any suggestions?

At least change the dataframe name. In that post it is called as `before`, in your example it is `df` — Ronak Shah, Aug 24 '18 at 05:58
Please use the output of `dput(df)` to share a reproducible example of your data.frame. — Roland, Aug 24 '18 at 06:00
And `before$colnames(df)` is not valid R code. Nor is `df$colnames(df)`. A valid form could be `df[colnames(df)]` but this is equal to, well, `df`. — Rui Barradas, Aug 24 '18 at 06:01
@RuiBarradas It could be valid if `before` was a list with an element `colnames` containing a function. — Roland, Aug 24 '18 at 06:13
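For reference, a minimal sketch of what the attempted call might look like once the column is addressed correctly. It assumes the data frame is called `df` and holds a character column `X1`, as in the question; `parts` and `X1_split` are hypothetical names used only for illustration. `"\u00b1"` is the same character as `intToUtf8(177)`.

```r
library(stringr)

# Small stand-in for the first column of the question's data frame
df <- data.frame(X1 = c("76 \u00b1 2", "78 \u00b1 2", "10 \u00b1 2"),
                 row.names = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# df$X1 (or df[[colnames(df)[1]]]) extracts the first column as a vector;
# str_split_fixed() returns a character matrix with one column per piece
parts <- str_split_fixed(df$X1, " \u00b1 ", 2)

# Convert the pieces to numbers and restore the original row names
X1_split <- data.frame(X1.mn = as.numeric(parts[, 1]),
                       X1.sd = as.numeric(parts[, 2]),
                       row.names = rownames(df))
print(X1_split)
```

The same call would have to be repeated (or wrapped in `lapply`) for the remaining columns.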

Maurits Evers · Accepted Answer · 2018-08-24T09:02:55.860

Here is an option using separate and purrr::map_dfc

library(tidyverse)
map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err")))
## A tibble: 3 x 6
#  val   err   val1  err1  val2  err2
#  <chr> <chr> <chr> <chr> <chr> <chr>
#1 76    2     76    2     76    2
#2 78    2     76    2     76    2
#3 10    2     76    2     76    2

I leave renaming the columns up to you.
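One possible way to do the renaming, sketched in base R. It assumes the result has exactly two columns per original column, in the original column order; `res`, `orig_names` and `new_names` are hypothetical names, and `res` is a hand-built stand-in for the separated result.

```r
# Column names of the original data frame (from the question)
orig_names <- c("X1", "X2", "X3")

# Stand-in for the separated result, with values copied from the output above
res <- data.frame(val  = c("76", "78", "10"), err  = c("2", "2", "2"),
                  val1 = c("76", "76", "76"), err1 = c("2", "2", "2"),
                  val2 = c("76", "76", "76"), err2 = c("2", "2", "2"),
                  stringsAsFactors = FALSE)

# Repeat each original name twice and attach the suffixes "mn" and "sd" in turn
new_names <- paste(rep(orig_names, each = 2), c("mn", "sd"), sep = ".")
names(res) <- new_names
print(names(res))
```

Building the names from the original column names avoids relying on whatever suffixes the binding step happens to generate.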

Update

In response to your comment, you can use sep inside separate to specify the character(s) by which to separate columns.

df <- read.table(text =
    "X1          X2       X3
 A   '76.23 ± 2.23'      '76 ± 2'   '76 ± 2'
 B   '78.34 ± 2.23'      '76 ± 2'   '76 ± 2'
 C   '10.64 ± 2.23'      '76 ± 2'   '76 ± 2'", header = T)

library(tidyverse)
map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err"), sep = " ± "))
## A tibble: 3 x 6
#  val   err   val1  err1  val2  err2
#  <chr> <chr> <chr> <chr> <chr> <chr>
#1 76.23 2.23  76    2     76    2
#2 78.34 2.23  76    2     76    2
#3 10.64 2.23  76    2     76    2
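The reason an explicit `sep` is needed: tidyr documents the default `sep` of `separate()` as the regular expression `[^[:alnum:]]+`, which treats any run of non-alphanumeric characters, including a decimal point, as a separator. The following base R sketch only illustrates that pattern with `strsplit`; it is not how `separate()` is implemented, and `default_split` and `explicit_split` are hypothetical names.

```r
x <- "76.23 \u00b1 2.23"   # "\u00b1" is the plus-minus character

# Default-style pattern: every run of non-alphanumeric characters separates,
# so the decimal points split the numbers as well
default_split <- strsplit(x, "[^[:alnum:]]+")[[1]]

# Explicit separator: only the " ± " between the two numbers splits the string
explicit_split <- strsplit(x, " \u00b1 ")[[1]]

print(default_split)
print(explicit_split)
```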

Update 2

To include rownames as a separate column

map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err"), sep = " ± ")) %>%
    mutate(row = rownames(df))
## A tibble: 3 x 7
#  val   err   val1  err1  val2  err2  row
#  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 76.23 2.23  76    2     76    2     A
#2 78.34 2.23  76    2     76    2     B
#3 10.64 2.23  76    2     76    2     C

To include rownames as rownames

map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err"), sep = " ± ")) %>%
    data.frame(row.names = rownames(df))
#    val  err val1 err1 val2 err2
#A 76.23 2.23   76    2   76    2
#B 78.34 2.23   76    2   76    2
#C 10.64 2.23   76    2   76    2

Sample data

df <- read.table(text =
    "X1          X2       X3
 A   '76 ± 2'      '76 ± 2'   '76 ± 2'
 B   '78 ± 2'      '76 ± 2'   '76 ± 2'
 C   '10 ± 2'      '76 ± 2'   '76 ± 2'", header = T)

Thank you. Could you please suggest how `map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err")))` can be modified when there are decimal values, e.g. `76.099889 + 2.098765`? Right now the number is split into `76` and `099889`. — Natasha, Aug 24 '18 at 06:36
@Natasha You can use an explicit `sep` inside `separate`; I've updated my post, please take a look. — Maurits Evers, Aug 24 '18 at 06:39
The row index is lost: A B C appears as 1 2 3. I tried adding `row.names = TRUE` to the end of the `map_dfc` command, but didn't succeed in obtaining the row names in the output. Could you please let me know if there are other alternatives? — Natasha, Aug 24 '18 at 07:23
@Natasha `tibble`s generally don't like rownames very much. You can however simply re-add rownames after separating & combining, see my second update. — Maurits Evers, Aug 24 '18 at 09:05

RSK · Answer 2 · 2018-08-24T06:51:06.507

score 2

Try

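split_f <- function(x)
{
    # Split each "mean ± sd" string on the ± sign;
    # as.character() guards against factor columns
    a <- strsplit(as.character(x), "±")
    # Flatten the pieces and trim the spaces left around them
    b <- trimws(unlist(a))
    # Odd elements are the means, even elements the standard deviations
    data.frame(x = b[seq(1, length(b), by = 2)], y = b[seq(2, length(b), by = 2)])
}
# Apply to every column of the question's data frame, then bind side by side
df1 <- lapply(df, split_f)
out <- do.call("cbind", df1)
names(out) <- sort(apply(expand.grid(names(df), c("mn", "sd")), 1, paste, collapse = "."))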

edited Aug 24 '18 at 06:51

answered Aug 24 '18 at 06:43


score 1 · Answer 3 · answered Aug 24 '18 at 14:12

Here is a base R option with read.csv and lapply

data.frame(lapply(df, function(x) read.csv(text=paste(gsub("\\s*±\\s*", ",", x), 
            collapse="\n"), header=FALSE)))
#   X1.V1 X1.V2 X2.V1 X2.V2 X3.V1 X3.V2
#1    76     2    76     2    76     2
#2    78     2    76     2    76     2
#3    10     2    76     2    76     2
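The one-liner above can be unpacked into its individual steps for a single column. This is only an illustrative sketch; `x`, `csv_lines`, `csv_text` and `parsed` are hypothetical names, and `"\u00b1"` stands for the ± character.

```r
# One column of the data, written out by hand
x <- c("76 \u00b1 2", "78 \u00b1 2", "10 \u00b1 2")

# Step 1: replace the ± sign and any surrounding whitespace with a comma
csv_lines <- gsub("\\s*\u00b1\\s*", ",", x)

# Step 2: join the rows into a single newline-separated block of CSV text
csv_text <- paste(csv_lines, collapse = "\n")

# Step 3: let read.csv parse the text; it also converts the pieces to numbers
parsed <- read.csv(text = csv_text, header = FALSE)
print(parsed)
```

Because `read.csv` does the type conversion, the resulting columns are numeric rather than character.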

data

df <- structure(list(X1 = c("76 ± 2", "78 ± 2", "10 ± 2"), X2 = c("76 ± 2", 
 "76 ± 2", "76 ± 2"), X3 = c("76 ± 2", "76 ± 2", "76 ± 2"
 )), class = "data.frame", row.names = c("A", "B", "C"))
