
I'm currently working in a psychology lab and beginning data analysis on response time data from a task.

The task itself runs over multiple trials, which makes the data disorganized to look at, especially now that my initial job was to merge all the data into a single data frame. In the vertical orientation of the data, we can see the participant ID and the response time. Great, those are important bits of information... however, instead of seeing trial numbers and such, we just see the data represented like this:

Participant 1, 23
Participant 1, 22
Participant 1, 25
Participant 2, 36

It goes on like that, repeating participant IDs (our sample size goes well into the thousands, so our data frame is very long). We can't pick out the important info or see which trial is which. So, we want a horizontal representation.

Now, I am using R for data analysis... but I am a bit new to R and this is my first project with it. While I have done online R courses, you really learn it best when working with actual data.

In efforts to fix my data I have been looking into the packages reshape and tidyr: reshape has melt and cast which could help me and tidyr has pivot_wider which I think could help me more than melt and cast.

I have been playing around with both using a smaller data frame from my actual data as a means of testing out code.

pivot_wider

I used pivot_wider at first:

newdf1_test %>%
        pivot_wider(names_from = name, values_from = V4)

I got a tibble, but it only had one of the participants' IDs and one response time value.

I also got a warning message stating that values in V4 are not uniquely identified, and I was given suggestions on how to bypass the warning. All of the suggestions just returned an error saying that, in a data frame, the replacement has 1 row and the data has 0. What does this mean exactly?

melt and cast

I'm just not sure how this works yet. When I melt the data frame I'm not sure what to do afterward because all I see is still a long data frame as opposed to wide.

melt_testdf <- melt(newdf1_test, name = c("SID"), V4 = c("response_time"))

I was under the impression that this would add two new variables, SID and response_time, which would help me make two different data tables and then transpose them in order to make the merged data frame horizontal. But no: the new data frame returned to me showed name (which has the participant's ID), variable (containing just the value V4, which was the original name of the response time column), and value (which is where the response times ended up).

I know I am supposed to cast in order to reshape the data further, but seeing as I'm already confused, I don't want to proceed.

What am I to do? I'm so confused by this right now and no matter how much I read I am not getting anywhere with this.

Community
  • Please show sample data consistent with attempted code and desired result. How do you determine *trial*? BTW - long data is the preferred format of most data science operations (aggregation, modeling, plotting, etc.). In fact, it is considered [tidy data](https://r4ds.had.co.nz/tidy-data.html)! – Parfait Dec 02 '19 at 20:21
  • Between you and me, this is the format that you actually want your data in when doing exploratory analysis, because it is considered tidy data. Especially if you are going to be pushing into R to do analysis and run tests such as regression, ANOVA, etc. – Hansel Palencia Dec 02 '19 at 20:21
  • You should be able to do a `spread()` as long as you have unique participant id's for each participant and the trial id, i.e. trial1, trial2, something to that effect. – Hansel Palencia Dec 02 '19 at 20:23
  • Please add a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to your question. – IceCreamToucan Dec 02 '19 at 20:27
  • @HanselPalencia that's the thing about being a volunteer RA fresh out of university: the senior researchers already know what they want done and don't really have time to tell you. I'm not entirely sure why they want horizontal data, then again I never knew that vertical was considered Tidy. I've tried spread but I get the unique combination of keys error. I don't have the trial information, but I do have the participant ID which helps. I'm just not sure how to make it work. –  Dec 02 '19 at 20:38

1 Answer


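The problem you're seeing arises because, with no other column left to tell rows apart, pivot_wider tries to put everything into a single row. Several V4 values then compete for the same cell, so it needs a way to aggregate the V4 results.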

If you want to have multiple rows you would need to supply an argument or data that will let the new wide table have a meaningful way to designate new rows.
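To make the problem concrete, here is a minimal sketch of what happens without such an identifier. The names `no_id` and `collapsed` are made up for illustration, the data is just the four rows from the question, and the exact warning text may differ between tidyr versions.

```r
library(tibble)
library(tidyr)

# Toy data shaped like the question: only an ID and a response time,
# with no column that says which trial a row belongs to.
no_id <- tribble(
  ~name,           ~V4,
  "Participant 1", 23,
  "Participant 1", 22,
  "Participant 1", 25,
  "Participant 2", 36
)

# With no id column left over, all values for a participant land in the
# same cell. pivot_wider() warns about non-unique values and falls back to
# list-columns. suppressWarnings() is only here to keep the output quiet.
collapsed <- suppressWarnings(
  pivot_wider(no_id, names_from = name, values_from = V4)
)

nrow(collapsed)                        # one row for the whole table
lengths(collapsed[["Participant 1"]])  # that single cell holds several values
```

So the values are not lost, they are just packed into one cell per participant, which is rarely what you want.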

Here's an example where I've supplied an id for the new table:


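library(tidyverse)  # provides tribble(), %>% and pivot_wider()

# 'test' is an identifier added for this example; it plays the role of a trial number
newdf1_test <- tribble(
        ~test, ~name, ~V4,
        '001', 'Participant 1', 23,
        '002', 'Participant 1', 22,
        '003', 'Participant 1', 25,
        '001', 'Participant 2', 36)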


newdf1_test %>%
        pivot_wider(
                names_from = name, 
                values_from = V4)

# A tibble: 3 x 3
  test  `Participant 1` `Participant 2`
  <chr>           <dbl>           <dbl>
1 001                23              36
2 002                22              NA
3 003                25              NA

Essentially, in this version the id_cols argument of pivot_wider is implicit: it defaults to every column not used in names_from or values_from, which here is just the test variable. You can also see that the new data table makes sense in a way that it wouldn't without the test variable.
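If your real data has no trial column, one option is to build one from the row order and then pivot so that each participant becomes a row and each trial a column. This is only a sketch: it assumes the rows are already in trial order within each participant (which I can't confirm from the question), and the names `long`, `wide`, `trial` and the `trial_` prefix are made up for illustration.

```r
library(dplyr)
library(tibble)
library(tidyr)

long <- tribble(
  ~name,           ~V4,
  "Participant 1", 23,
  "Participant 1", 22,
  "Participant 1", 25,
  "Participant 2", 36
)

wide <- long %>%
  group_by(name) %>%
  # Assumption: within each participant, row order is trial order.
  mutate(trial = row_number()) %>%
  ungroup() %>%
  # 'name' is the only column left over, so it becomes the row identifier.
  pivot_wider(
    names_from   = trial,
    values_from  = V4,
    names_prefix = "trial_"
  )

wide
```

Participants with fewer trials than the others get NA in the columns they are missing. If the row order does not reflect trial order, the generated trial numbers will be meaningless, so it is worth checking that against the original files first.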

I hope that makes sense!

JFlynn
  • It does make sense! I just realized that the data I am using to test this code is actually for just one participant. So there are 330 trials, and this helped a lot with building a new data frame which shows everything I need. I see where I ran into trouble: the lack of the test column. Now, if I were to use this method, would I have to go through all 330 trials and rewrite them in the code? Also, would this code still work the same if I were working with one participant, like I am now with the test data frame? Because this solution helps me in the long run with the complete data. –  Dec 02 '19 at 21:21
  • Well, if the 'test' identifier is arbitrary, i.e. if you don't really care about the value of some variable called test, then you don't need to write that info out manually. You can add the line `mutate(test = row_number())`, or even put in a random value generator ... And then yes, this would still work with one participant. Though at that point, you could just rename V4 as 'Participant 1': same results, possible efficiency gain. – JFlynn Dec 02 '19 at 21:41
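
A rough sketch of the suggestion in the last comment, for the single-participant case only. The three rows stand in for the 330 trials, and the names `one_participant`, `via_pivot` and `via_rename` are made up for illustration.

```r
library(dplyr)
library(tibble)
library(tidyr)

one_participant <- tribble(
  ~name,           ~V4,
  "Participant 1", 23,
  "Participant 1", 22,
  "Participant 1", 25
)

# Generate the identifier instead of typing it out for every trial.
via_pivot <- one_participant %>%
  mutate(test = row_number()) %>%
  pivot_wider(names_from = name, values_from = V4)

# With a single participant, renaming the column gives the same table.
via_rename <- one_participant %>%
  mutate(test = row_number()) %>%
  select(test, `Participant 1` = V4)
```

With more than one participant, an ungrouped `row_number()` gives every row its own id, so the numbering would need to be done per participant (for example after `group_by(name)`).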