
I'm having some trouble with the following data frame:

    Treat.Name   HWAH
    P_Control_1  2918.000
    P_Control_2  2818.536
    P_Control_3  2619.036
    P_EMFL10_1   2740.786
    P_EMFL10_2   2616.893
    P_EMFL10_3   2395.964

I'm trying to split the strings in Treat.Name at each "_" and create two new columns called "Cult" and "Num.", as in the example below:

    Cult   Treat.Name  Num.    HWAH
    P      Control     1       2918.000
    P      Control     2       2818.536
    P      Control     3       2619.036
    P      EMFL10      1       2740.786
    P      EMFL10      2       2616.893
    P      EMFL10      3       2395.964

I searched for examples of how to do this, but I couldn't find anything close to what I'm looking for.

josliber
Luis Antolin

2 Answers


Try

    library(splitstackshape)
    cSplit(df1, 'Treat.Name', '_')

Or

    library(tidyr)
    separate(df1, Treat.Name, into=c('Cult', 'Treat.Name', 'Num.'))

Or using base R

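    # as.character() guards against Treat.Name being a factor, which read.table(text=) rejects
    cbind(read.table(text=as.character(df1$Treat.Name), sep="_", col.names=c('Cult', 'Treat.Name', 'Num.')), df1['HWAH'])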
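A self-contained sketch of the base R route, assuming (as the asker reports in the comments) that Treat.Name arrived as a factor. The name `df1` and the explicit `factor()` call are assumptions made only to reproduce that situation; `read.table()` runs its usual type conversion, so the third piece comes back as an integer column.

```r
# Rebuild the question's data. Treat.Name is deliberately a factor here,
# an assumption mirroring the asker's data.frame.
df1 <- data.frame(
  Treat.Name = factor(c("P_Control_1", "P_Control_2", "P_Control_3",
                        "P_EMFL10_1", "P_EMFL10_2", "P_EMFL10_3")),
  HWAH = c(2918.000, 2818.536, 2619.036, 2740.786, 2616.893, 2395.964)
)

# read.table(text = ...) expects character input, so coerce the factor first.
# Each string is read as one line of "_"-separated fields.
parts <- read.table(text = as.character(df1$Treat.Name), sep = "_",
                    col.names = c("Cult", "Treat.Name", "Num."),
                    stringsAsFactors = FALSE)

# Bind the split columns back to the numeric HWAH column.
res <- cbind(parts, df1["HWAH"])
print(res)
```

Calling `str(res)` should show `Num.` as an integer column and `Cult`/`Treat.Name` as character columns.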
akrun
  • Those packages did exactly what I needed. Thanks for the help, it was really useful! The cbind approach didn't work very well; I think it's because the column Treat.Name is stored as a factor in my data.frame, so it is not read as text. – Luis Antolin Jun 02 '15 at 07:58
  • @LuisAntolin You could use `stringsAsFactors=FALSE` in the `read.table/read.csv` or in `data.frame` so that the character columns will not convert to 'factor' class. – akrun Jun 02 '15 at 08:23
  • After coercing the column to a character string, the cbind approach works well. Thanks again! – Luis Antolin Jun 02 '15 at 08:25
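A minimal sketch of the two fixes discussed in these comments: avoid the factor when the data frame is created (`stringsAsFactors = FALSE`), or coerce an existing factor column with `as.character()`. Since R 4.0.0, `stringsAsFactors` defaults to `FALSE`, so the problem mostly shows up on older R versions or with columns that were explicitly made factors. The two-element vector `x` is an illustrative subset of the question's data.

```r
x <- c("P_Control_1", "P_EMFL10_1")

# Option 1: keep strings as character when the data frame is created.
d1 <- data.frame(Treat.Name = x, stringsAsFactors = FALSE)

# Option 2: the column is already a factor; coerce it back to character.
# as.character() returns the level labels (as.numeric() would return the codes).
d2 <- data.frame(Treat.Name = factor(x))
d2$Treat.Name <- as.character(d2$Treat.Name)

# Both columns are now plain character vectors that strsplit()/read.table() accept.
print(class(d1$Treat.Name))
print(class(d2$Treat.Name))
print(strsplit(d2$Treat.Name, "_"))
```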

Another option, in base R:

    # strsplit() needs character input, so coerce Treat.Name in case it is a factor
    newdf <- data.frame(do.call("rbind", strsplit(as.character(df$Treat.Name), "_")), df$HWAH, stringsAsFactors=FALSE)
    colnames(newdf) <- c("Cult", "Treat.Name", "Num.", "HWAH")

    newdf
    #             Cult Treat.Name Num.     HWAH
    # P_Control_1    P    Control    1 2918.000
    # P_Control_2    P    Control    2 2818.536
    # P_Control_3    P    Control    3 2619.036
    # P_EMFL10_1     P     EMFL10    1 2740.786
    # P_EMFL10_2     P     EMFL10    2 2616.893
    # P_EMFL10_3     P     EMFL10    3 2395.964
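A variant of the `strsplit()` approach, sketched under the assumption that every Treat.Name has exactly three "_"-separated parts (otherwise `rbind()` recycles shorter rows). It coerces the factor first, uses `fixed = TRUE` so "_" is matched literally rather than as a regular expression, names the columns directly, and converts `Num.` to integer; that integer conversion is an added choice, not part of the answer above.

```r
df <- data.frame(
  Treat.Name = factor(c("P_Control_1", "P_Control_2", "P_Control_3",
                        "P_EMFL10_1", "P_EMFL10_2", "P_EMFL10_3")),
  HWAH = c(2918.000, 2818.536, 2619.036, 2740.786, 2616.893, 2395.964)
)

# strsplit() rejects factors, so coerce first; fixed = TRUE splits on a literal "_".
pieces <- do.call(rbind, strsplit(as.character(df$Treat.Name), "_", fixed = TRUE))

# pieces is a 6 x 3 character matrix: one row per name, one column per part.
newdf <- data.frame(
  Cult       = pieces[, 1],
  Treat.Name = pieces[, 2],
  Num.       = as.integer(pieces[, 3]),
  HWAH       = df$HWAH,
  stringsAsFactors = FALSE
)
print(newdf)
```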

Or (as per @akrun's comment) using the devel version of data.table you could simply do

    ## library(devtools)
    ## install_github("Rdatatable/data.table", build_vignettes = FALSE)

    library(data.table) ## v >= 1.9.5
    setDT(df)[, c('Cult', 'Treat.Name1', 'Num.') := tstrsplit(Treat.Name, '_')]
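For readers without data.table, the "split, then transpose" idea behind `tstrsplit()` can be sketched in base R. `tstrsplit_base()` is a hypothetical helper written only for this illustration, not part of any package, and it is not claimed to match data.table's behaviour in every case; here it pads missing pieces with `NA`. Like the line above, it stores the middle part in a new `Treat.Name1` column, and the assumed `df` mirrors the question's data.

```r
# Hypothetical helper (not from any package): split each string on a literal
# separator, then transpose so element i holds the i-th piece of every string.
tstrsplit_base <- function(x, split) {
  parts <- strsplit(as.character(x), split, fixed = TRUE)
  n <- max(lengths(parts))
  lapply(seq_len(n), function(i) {
    # Strings with fewer than i pieces get NA in position i.
    vapply(parts, function(p) if (i <= length(p)) p[i] else NA_character_,
           character(1))
  })
}

df <- data.frame(
  Treat.Name = factor(c("P_Control_1", "P_Control_2", "P_Control_3",
                        "P_EMFL10_1", "P_EMFL10_2", "P_EMFL10_3")),
  HWAH = c(2918.000, 2818.536, 2619.036, 2740.786, 2616.893, 2395.964)
)

# Assign the three transposed pieces to three new columns at once.
df[c("Cult", "Treat.Name1", "Num.")] <- tstrsplit_base(df$Treat.Name, "_")
print(df)
```

As in the data.table version, all three new columns are character; `as.integer(df$Num.)` would convert the last one if needed.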
David Arenburg
Cath
  • The column Treat.Name is a factor in my data frame, but if I coerce it into a character string it works nicely! Thank you very much, @CathG! =) – Luis Antolin Jun 02 '15 at 08:17