-4

I have the following data frame,

>df
         X1          X2       X3    
 A   76 ± 2      76 ± 2   76 ± 2
 B   78 ± 2      76 ± 2   76 ± 2
 C   10 ± 2      76 ± 2   76 ± 2

I'm trying to convert it to:

>df
     X1.mn  X1.sd    X2.mn   X2.sd    X3.mn   X3.sd 
 A   76     2        76      2        76      2
 B   78     2        76      2        76      2
 C   10     2        76      2        76      2

I tried using the stringr library, following the posts here:

df <- str_split_fixed(before$colnames(df)[1], intToUtf8(177), 2)

I get the following error though,

Error in stri_split_regex(string, pattern, n = n, simplify = simplify,  : 
  object 'before' not found

Any suggestions?

Natasha

3 Answers

2

Here is an option using separate and purrr::map_dfc

library(tidyverse)
map_dfc(df, ~as_tibble(.x) %>% separate(value, c("val", "err")))
## A tibble: 3 x 6
#  val   err   val1  err1  val2  err2
#  <chr> <chr> <chr> <chr> <chr> <chr>
#1 76    2     76    2     76    2
#2 78    2     76    2     76    2
#3 10    2     76    2     76    2

I leave renaming the columns up to you.
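
For the renaming, here is a minimal base R sketch. It assumes the result has the column order printed above (val, err, val1, err1, ...), so that every original column contributed two adjacent columns. `res` and `orig_names` are illustrative names, and `res` is a hand-built stand-in for the `map_dfc()` output so that the snippet runs on its own.

```r
# Stand-in for the map_dfc() result above (hypothetical, built by hand)
res <- data.frame(
    val  = c("76", "78", "10"), err  = c("2", "2", "2"),
    val1 = c("76", "76", "76"), err1 = c("2", "2", "2"),
    val2 = c("76", "76", "76"), err2 = c("2", "2", "2"),
    stringsAsFactors = FALSE
)
orig_names <- c("X1", "X2", "X3")   # names(df) in the question

# Repeat each original name twice and alternate the .mn / .sd suffixes
names(res) <- paste0(rep(orig_names, each = 2), c(".mn", ".sd"))

# The split values are still character; convert every column to numeric
res[] <- lapply(res, as.numeric)
str(res)
```

With the real result you would use `names(df)` in place of `orig_names`.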


Update

In response to your comment, you can use sep inside separate to specify the character(s) by which to separate columns.

df <- read.table(text =
    "X1          X2       X3
 A   '76.23 ± 2.23'      '76 ± 2'   '76 ± 2'
 B   '78.34 ± 2.23'      '76 ± 2'   '76 ± 2'
 C   '10.64 ± 2.23'      '76 ± 2'   '76 ± 2'", header = TRUE)

library(tidyverse)
map_dfc(df, ~as_tibble(.x) %>% separate(value, c("val", "err"), sep = " ± "))
## A tibble: 3 x 6
#  val   err   val1  err1  val2  err2
#  <chr> <chr> <chr> <chr> <chr> <chr>
#1 76.23 2.23  76    2     76    2
#2 78.34 2.23  76    2     76    2
#3 10.64 2.23  76    2     76    2    

Update 2

To include rownames as a separate column

map_dfc(df, ~as_tibble(.x) %>% separate(value, c("val", "err"), sep = " ± ")) %>%
    mutate(row = rownames(df))
## A tibble: 3 x 7
#  val   err   val1  err1  val2  err2  row
#  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 76.23 2.23  76    2     76    2     A
#2 78.34 2.23  76    2     76    2     B
#3 10.64 2.23  76    2     76    2     C

To include rownames as rownames

map_dfc(df, ~as_tibble(.x) %>% separate(value, c("val", "err"), sep = " ± ")) %>%
    data.frame(row.names = rownames(df))
#    val  err val1 err1 val2 err2
#A 76.23 2.23   76    2   76    2
#B 78.34 2.23   76    2   76    2
#C 10.64 2.23   76    2   76    2 

Sample data

df <- read.table(text =
    "X1          X2       X3
 A   '76 ± 2'      '76 ± 2'   '76 ± 2'
 B   '78 ± 2'      '76 ± 2'   '76 ± 2'
 C   '10 ± 2'      '76 ± 2'   '76 ± 2'", header = TRUE)
Maurits Evers
  • Thank you. Could you please suggest how `map_dfc(df, ~as.tibble(.x) %>% separate(value, c("val", "err")))` can be modified when there are decimal values, e.g. `76.099889 + 2.098765`? Right now the number is split into `76` and `099889`. – Natasha Aug 24 '18 at 06:36
  • @Natasha You can use an explicit `sep` inside `separate`; I've updated my post, please take a look. – Maurits Evers Aug 24 '18 at 06:39
  • The row index is lost: A B C becomes 1 2 3. I tried adding `row.names = TRUE` to the end of the `map_dfc` command, but didn't succeed in getting the row names in the output. Could you please let me know if there are other alternatives? – Natasha Aug 24 '18 at 07:23
  • @Natasha `tibble`s generally don't like rownames very much. You can however simply re-add rownames after separating & combining, see my second update. – Maurits Evers Aug 24 '18 at 09:05
2

Try

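# Split a column of "mean ± sd" strings into two columns
split_f <- function(x)
{
    a <- strsplit(as.character(x), "\\s*±\\s*")   # also drops the spaces around ±
    b <- unlist(a)
    # odd positions hold the means, even positions the standard deviations
    data.frame(x = b[seq(1, length(b), by = 2)], y = b[seq(2, length(b), by = 2)])
}
df1 <- lapply(df, split_f)   # df is the data frame from the question
out <- do.call("cbind", df1)
names(out) <- sort(apply(expand.grid(names(df), c("mn", "sd")), 1, paste, collapse = "."))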
RSK
1

Here is a base R option with read.csv and lapply

data.frame(lapply(df, function(x) read.csv(text=paste(gsub("\\s*±\\s*", ",", x), 
            collapse="\n"), header=FALSE)))
#   X1.V1 X1.V2 X2.V1 X2.V2 X3.V1 X3.V2
#1    76     2    76     2    76     2
#2    78     2    76     2    76     2
#3    10     2    76     2    76     2

data

df <- structure(list(X1 = c("76 ± 2", "78 ± 2", "10 ± 2"), X2 = c("76 ± 2", 
 "76 ± 2", "76 ± 2"), X3 = c("76 ± 2", "76 ± 2", "76 ± 2"
 )), class = "data.frame", row.names = c("A", "B", "C"))
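
To get the `X1.mn` / `X1.sd` style names from the question and keep the A/B/C row names, the result can be post-processed. The following is only a sketch on top of the code above: `out` is an illustrative name, and it assumes `read.csv` names the two parsed columns `V1` and `V2`, as in the printed output.

```r
df <- structure(list(X1 = c("76 ± 2", "78 ± 2", "10 ± 2"),
                     X2 = c("76 ± 2", "76 ± 2", "76 ± 2"),
                     X3 = c("76 ± 2", "76 ± 2", "76 ± 2")),
                class = "data.frame", row.names = c("A", "B", "C"))

# Same split as above: turn "76 ± 2" into "76,2" and parse it as CSV text
out <- data.frame(lapply(df, function(x)
    read.csv(text = paste(gsub("\\s*±\\s*", ",", x), collapse = "\n"),
             header = FALSE)))

# Replace the V1 / V2 suffixes with mn / sd
names(out) <- sub("V2$", "sd", sub("V1$", "mn", names(out)))

# Restore the original row names, which the split does not carry over
rownames(out) <- rownames(df)
out
```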
akrun