1

I'm trying to tidy my data in my R Script so that I can run some statistical analyses on the tidied data set.

One of the columns lists pairs (6 of these), which correspond to three separate "blocks" of output values. The minimal reproducible dataset is below.

dput(head(data, 6)) 
structure(list(pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"), block1vals = c(1, 3, 5, 7, 9, 10), block2vals = c(4, 66, 34, 66, 21, 21), block3vals = c(53, 22, 12, 65, 21, 22)), .Names = c("pairs", "block1vals", "block2vals", "block3vals"), row.names = c(NA, 6L), class = "data.frame")

I got my code to take the pairs and label each participant's A/B/C value for a given block, a column for each block; this works:

Block 1:

data$block1types <- sapply(data$pairs, function(x){
  if(x == "ABC") { return("Type A")}
  if(x == "ACB") { return("Type A")}
  if(x == "BAC") { return("Type B")}
  if(x == "BCA") { return("Type B")}
  if(x == "CBA") { return("Type C")}
  if(x == "CAB") { return("Type C")}
})

Block 2:

data$block2types <- sapply(data$pairs, function(x){
  if(x == "ABC") { return("Type B")}
  if(x == "ACB") { return("Type C")}
  if(x == "BAC") { return("Type A")}
  if(x == "BCA") { return("Type C")}
  if(x == "CBA") { return("Type B")}
  if(x == "CAB") { return("Type A")}
})

Block 3:

data$block3types <- sapply(data$pairs, function(x){
 if(x == "ABC") { return("Type C")}
if(x == "ACB") { return("Type B")}
if(x == "BAC") { return("Type C")}
if(x == "BCA") { return("Type A")}
if(x == "CBA") { return("Type A")}
if(x == "CAB") { return("Type B")}
})

What I am trying to do is to now reorganize the data so that there is a column with all "Type A" participant values (doesn't matter which block A was in) as well as one for "Type B" and one for "Type C."

So the ideal output is:

data$TypeA <- c(1, 3, 34, 65, 21, 21)
data$TypeB <- c(4, 22, 5, 7, 21, 22)
data$TypeC <- c(53, 66, 12, 66, 9, 10)

I cannot figure out how to do this without making problems. My attempt to do so was this, creating two columns outside the data set, which I hoped I could then spread:

BlockTypes<- combine(data$block1types, data$block2types, data$block3types, .id = NULL)     
BlockTotals<- combine(data$block1vals, data$block2vals, data$block3vals, .id = NULL) 

I then tried to do this:

spread(data, key= BlockTypes, value=BlockTotals, fill = 0)

This failed: var must evaluate to a single number or a column name, not a character vector. I do think, though, that the bigger problem was putting the columns outside the data set. I couldn't use the spread function with them, since they were outside of the data set. So I am a bit stuck on how to do this, if the combine function cannot be used with a tibble.

andler22
  • 13
  • 4
  • 1
    Instead of all those `if-statements` you can extract the first (or second or third) letter and paste it after *type* to generate your desired columns. That would be way more efficient. – M-- Jan 07 '19 at 22:08
  • 3
    But about your question, we need to see a minimal reproducible dataset and an example desired output on that dataset. Please provide a [reproducible example in r](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The link I provided, will tell you how. Moreover, please take the [tour](https://stackoverflow.com/tour) and visit [how to ask](https://stackoverflow.com/help/how-to-ask). Cheers. – M-- Jan 07 '19 at 22:09

2 Answers2

1

I'm sure there is a nicer way to do this if I put my mind to it but here's something working.

First we used the substr function to pull out the first, second, and third characters for your types. I used the paste function to include the "Type of" part with the extraction we performed. This is much nicer than doing each combination like you have.

Next we went through the data 3 times (once for every type). Each time we went through the data, we used the block types to see if we should pull the block value.

library(tidyverse)
data <- tibble(
pairs = c("ABC", "ACB", "BAC", "BCA", "CBA", "CAB"),
block1vals = c(1, 3, 5, 7, 9, 10),
block2vals = c(4, 66, 34, 66, 21, 21),
block3vals = c(53, 22, 12, 65, 21, 22)
)

data %>%
  mutate(
    block1types = paste0("Type ",substr(pairs, 1, 1)),
    block2types = paste0("Type ",substr(pairs, 2, 2)),
    block3types = paste0("Type ",substr(pairs, 3, 3))) %>%
  mutate(
    TypeAValues = case_when(
    block1types == "Type A" ~ block1vals,
    block2types == "Type A" ~ block2vals,
    block3types == "Type A" ~ block3vals)) %>%
  mutate(
    TypeBValues = case_when(
    block1types == "Type B" ~ block1vals,
    block2types == "Type B" ~ block2vals,
    block3types == "Type B" ~ block3vals)) %>%
  mutate(
    TypeCValues = case_when(
    block1types == "Type C" ~ block1vals,
    block2types == "Type C" ~ block2vals,
    block3types == "Type C" ~ block3vals))
Sahir Moosvi
  • 549
  • 2
  • 21
0

Here's an approach that makes use of the dplyr and stringr packages.

library(dplyr)
library(stringr)

data %>%
  # For each letter, determine the position of that letter in the entry in the 'pairs' column
  mutate(a = str_locate(pairs, 'A')[,'start'],
         b = str_locate(pairs, 'B')[,'start'],
         c = str_locate(pairs, 'C')[,'start']) %>% 
  # Based on the letter's position, pull the value from the appropriate column
  mutate_at(.vars = vars(a, b, c),
            .funs = funs(case_when(. == 1 ~ block1vals,
                                   . == 2 ~ block2vals,
                                   . == 3 ~ block3vals)))

The reason for the odd-looking call to str_locate() is that the output of calling str_locate() is a matrix.

Here's what the output of the function looks like:

pairs <- c('ABCDE')
str_locate(pairs, 'BC')

     start end
[1,]     2   3

To return just the position of the letter 'B', you would want to extract the column titled start from the matrix.

You can combine the call to str_locate() with the column extraction by writing the following:

str_locate(pairs, 'BC')['start']

bschneidr
  • 6,014
  • 1
  • 37
  • 52