adding variable column with factor levels to a dataframe

Question

I have an experiment with two nested factors, for example gender (1, 2) and condition (1, 2), like:

    factor A factor B
    male     cond.1
    male     cond.2
    female   cond.1
    female   cond.2

Unfortunately, the program I use to export the dependent variable values combines the factor levels in the header, for example:

    male_cond.1, male_cond.2, female_cond.1, female_cond.2
    456        , 5654       , 566          , 456
       ...           ...          ...            ...

This is inconvenient because when I melt the data frame into the ANOVA-appropriate long format, I can no longer separate the data according to the different levels of each factor. It looks like this:

    1st column,    2nd column (DV)
    male_cond.1,   456
    male_cond.2,   5654
    female_cond.1, 566
    female_cond.2, 456

So how can I insert two new columns, each as long as the data frame, that repeat the levels of my factors? The two columns should look like:

    1st column (gender), 2nd column (condition)
    male,                cond.1
    male,                cond.2
    female,              cond.1
    female,              cond.2
      ...                 ...

My own data frame has four factors: electrode(63) x soa(2) x stimulustype(3) x itemtype(2). This is what my original data frame looks like:

    File Fp1.PD_ShortSOA_FAM Fp1.PD_LongSOA_FAM Fp1.PD_ShortSOA_SEMplus_REAL Fp1.PD_ShortSOA_SEMplus_FICT
    sub0001            0,446222          2,524,804            0,272959                    1,281,349
    sub0002           1,032,688          2,671,048           1,033,278                    1,217,817

And then this is what the transpose looks like:

    row.names                            V1         V2
    File                            sub0001    sub0002
    Fp1.PD_ShortSOA_FAM            0,446222  1,032,688
    Fp1.PD_LongSOA_FAM            2,524,804  2,671,048
    Fp1.PD_ShortSOA_SEMplus_REAL   0,272959  1,033,278
    Fp1.PD_ShortSOA_SEMplus_FICT  1,281,349  1,217,817
    Fp1.PD_ShortSOA_SEMminus_REAL  0,142739  1,405,100
    Fp1.PD_ShortSOA_SEMminus_FICT 1,515,577 -1,990,458

I would like my factor columns to appear as:

    electrode, SOA, stimulustype, itemtype
    Fp1.    ShortSOA  FAM            
    Fp1.    LongSOA   FAM           
    Fp1.    ShortSOA  SEMplus       REAL   
     ...       ...     ...           ...

I tried using "strsplit", as suggested in this post, but that didn't work.

Alex A. · Accepted Answer · 2015-06-18T16:46:01.047


Melting gets you almost there; you just need to parse the variable column into separate columns.

    library(reshape2)

    # Melt every column (no id variables), then derive the two factor
    # columns from the combined header stored in `variable`.
    d <- transform(melt(yourdf, id.vars = NULL),
                   gender = gsub("_.*$", "", variable),
                   condition = gsub("^[^_]*_", "", variable))

    d
    #        variable value   gender condition
    # 1   male_cond.1   456     male    cond.1
    # 2   male_cond.2  5654     male    cond.2
    # 3 female_cond.1   566   female    cond.1
    # 4 female_cond.2   456   female    cond.2

This uses regular expression replacement to get factor A (gender) by removing everything after the _ and factor B (condition) by removing everything before the _.
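To see what the two patterns do on their own, here is a small base R sketch that needs no packages. The vector name `headers` is just for illustration. Note that both patterns key on the first underscore only, so a name with several underscores (like the `Fp1.PD_...` headers in the question) is split into just two pieces:

```r
# Header names from the question, as a plain character vector.
headers <- c("male_cond.1", "male_cond.2", "female_cond.1", "female_cond.2")

# Remove the first underscore and everything after it -> factor A.
gender <- gsub("_.*$", "", headers)

# Remove everything up to and including the first underscore -> factor B.
condition <- gsub("^[^_]*_", "", headers)

# With more than one underscore, only the first one acts as the split point.
long_name <- "Fp1.PD_ShortSOA_SEMplus_REAL"
first_part <- gsub("_.*$", "", long_name)      # "Fp1.PD"
rest_part  <- gsub("^[^_]*_", "", long_name)   # "ShortSOA_SEMplus_REAL"
```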

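If you want only the factor columns followed by the dependent variable, in that order, just do:

    # Copy `value` into a new last column DV, then drop the original
    # `variable` and `value` columns (positions 1 and 2).
    d <- transform(melt(yourdf, id.vars = NULL),
                   gender = gsub("_.*$", "", variable),
                   condition = gsub("^[^_]*_", "", variable),
                   DV = value)

    d <- d[, -c(1, 2)]

    d
    #   gender condition   DV
    # 1   male    cond.1  456
    # 2   male    cond.2 5654
    # 3 female    cond.1  566
    # 4 female    cond.2  456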
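For the four-factor header names shown in the question, the same idea could probably be extended along the following lines. This is only a sketch under assumptions: it supposes every name follows the pattern `<electrode>.PD_<soa>_<stimulustype>[_<itemtype>]`, that `.PD` is a constant tag that can be dropped, and that a missing item type (as in the `FAM` columns) should become `NA`. The helper `piece()` and the object names are made up for illustration.

```r
# A few of the header names from the question.
variable <- c("Fp1.PD_ShortSOA_FAM", "Fp1.PD_LongSOA_FAM",
              "Fp1.PD_ShortSOA_SEMplus_REAL", "Fp1.PD_ShortSOA_SEMplus_FICT")

# Split each name on "_" (fixed = TRUE treats "_" as a literal character).
parts <- strsplit(variable, "_", fixed = TRUE)

# Hypothetical helper: take the i-th piece of every name.
# Names with fewer than i pieces yield NA.
piece <- function(i) vapply(parts, function(p) p[i], character(1))

factors <- data.frame(
  electrode    = sub("\\.PD$", "", piece(1)),  # strip the assumed ".PD" tag
  soa          = piece(2),
  stimulustype = piece(3),
  itemtype     = piece(4),                     # NA where no item type exists
  stringsAsFactors = TRUE
)
```

If `variable` is taken from the melted data (for example `as.character(d$variable)`), the resulting `factors` data frame has one row per melted row and could be attached with `cbind(d, factors)`.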

edited Jun 18 '15 at 16:46

answered Jun 18 '15 at 16:34


Thanks for the answer, though unfortunately it doesn't work when I adapt it to my own data and variables. – pd441 Jun 18 '15 at 20:43
@stevezissou: Then can you provide example data in the post that's more representative of what you actually have? – Alex A. Jun 18 '15 at 21:07
@stevezissou: Great, thanks. Are the variables character or numeric? (You can check with `str(yourdata)`.) – Alex A. Jun 19 '15 at 14:24
The first variable column (which in the edit is called "row.names") doesn't appear to be accessible by any function. For example, p1t[1,1] returns "sub0001" and not "File"; likewise, str begins with the column V1 and not row.names. Do you know why this is? – pd441 Jun 19 '15 at 14:43
You can see exactly what I have here: http://stackoverflow.com/questions/30940492/melt-function-converts-numeric-to-character-and-it-wont-convert-back-again. For now I've been designing my factor columns in Excel. – pd441 Jun 19 '15 at 16:11
