I have a messy, confusing long-format data set. I started using R only recently and have not mastered it yet, so I need some guidance.
My participants went through different phases in an experiment. In phase a, they rated images. In phase b, they saw some images paired with different affects. In phase c, they rated the images they had seen in phase b. The responses, the affect information, and the images that participants rated are stored in separate columns. My aim is to analyze responses by image affect (no-affect, positive, negative), and I also need to know which image each response belongs to.
The problem is that once a phase is over, the last value recorded in its columns is copied onto all following rows (these copies should be omitted). Columns that have not received a value yet contain NAs, because there is no value above for the program to copy.
A simplified version of this dataset looks like this:
> df
   id phase phase.a.response phase.c.response phase.a.pic
1   1     a                1               NA       x.jpg
2   1     a                2               NA       y.jpg
3   1     a                3               NA       z.jpg
4   1     a               10               NA       d.jpg
5   1     b               10               NA       d.jpg
6   1     b               10               NA       d.jpg
7   1     b               10               NA       d.jpg
8   1     b               10               NA       d.jpg
9   1     c               10                5       d.jpg
10  1     c               10                4       d.jpg
11  1     c               10                2       d.jpg
12  1     c               10                1       d.jpg
   phase.b.pic pic.affect phase.c.pic
1         <NA>       <NA>        <NA>
2         <NA>       <NA>        <NA>
3         <NA>       <NA>        <NA>
4         <NA>       <NA>        <NA>
5        m.jpg   positive        <NA>
6        n.jpg   negative        <NA>
7        p.jpg   positive        <NA>
8        r.jpg   negative        <NA>
9        r.jpg   negative       n.jpg
10       r.jpg   negative       p.jpg
11       r.jpg   negative       r.jpg
12       r.jpg   negative       m.jpg
data$response[data$phase=="a"]<-data$phase.a.response
data$response[data$phase=="b"]<-data$phase.b.response
I tried to create a new variable with the two lines above, but it did not work, either because of the NAs or because my code does not make sense.
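One likely reason the attempt fails is that the left-hand side selects only the rows of one phase while the right-hand side supplies the whole column, so the two sides have different lengths. The second line also refers to `phase.b.response`, which does not appear in the sample (phase b has no ratings there). A minimal sketch of the same idea, assuming the data frame is the `df` printed above and that the ratings live in `phase.a.response` and `phase.c.response`; the `response` column and the `is_a`/`is_c` helpers are names chosen for illustration:

```r
# Minimal stand-in for the sample data (only the columns needed here).
df <- data.frame(
  phase            = rep(c("a", "b", "c"), each = 4),
  phase.a.response = c(1, 2, 3, rep(10, 9)),
  phase.c.response = c(rep(NA, 8), 5, 4, 2, 1)
)

# Start from NA so rows without a rating (phase b) stay missing.
df$response <- NA_real_

# Use the same logical index on both sides so the lengths match.
is_a <- df$phase == "a"
is_c <- df$phase == "c"
df$response[is_a] <- df$phase.a.response[is_a]
df$response[is_c] <- df$phase.c.response[is_c]
```

Because only the rows belonging to each phase are read, the stale values copied down from an earlier phase are never used.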
Ideally, I want to be able to subset my data by phase, with the responses in one column, the phase in one column (which I already have), the corresponding images in one column, and the corresponding image affects in another column (phase a should have no affect).
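The layout described above could be sketched in base R roughly as follows. This is only an illustration built on the sample: the column names `response`, `pic`, and `affect`, and the objects `lookup` and `tidy`, are made up for the example. It also assumes that the affect of an image rated in phase c is the affect it was shown with in phase b, and that each image has a single affect per participant.

```r
# Rebuild the sample data shown in the question.
df <- data.frame(
  id               = 1,
  phase            = rep(c("a", "b", "c"), each = 4),
  phase.a.response = c(1, 2, 3, rep(10, 9)),
  phase.c.response = c(rep(NA, 8), 5, 4, 2, 1),
  phase.a.pic      = c("x.jpg", "y.jpg", "z.jpg", rep("d.jpg", 9)),
  phase.b.pic      = c(rep(NA, 4), "m.jpg", "n.jpg", "p.jpg", rep("r.jpg", 5)),
  pic.affect       = c(rep(NA, 4), "positive", "negative", "positive",
                       rep("negative", 5)),
  phase.c.pic      = c(rep(NA, 8), "n.jpg", "p.jpg", "r.jpg", "m.jpg"),
  stringsAsFactors = FALSE
)

is_a <- df$phase == "a"
is_b <- df$phase == "b"
is_c <- df$phase == "c"

# Lookup table: which affect each picture had in phase b, per participant.
# Only phase-b rows are used, so values copied down into phase c are ignored.
lookup <- unique(df[is_b, c("id", "phase.b.pic", "pic.affect")])

# New columns start as NA and are filled phase by phase.
df$response <- NA_real_
df$pic      <- NA_character_
df$affect   <- NA_character_

# Phase a: rating and picture from the phase-a columns, no affect.
df$response[is_a] <- df$phase.a.response[is_a]
df$pic[is_a]      <- df$phase.a.pic[is_a]
df$affect[is_a]   <- "no-affect"

# Phase b: picture and affect only; there is no rating in this phase.
df$pic[is_b]    <- df$phase.b.pic[is_b]
df$affect[is_b] <- df$pic.affect[is_b]

# Phase c: rating and picture from the phase-c columns; the affect is
# looked up by participant id and picture name (assumption, see above).
df$response[is_c] <- df$phase.c.response[is_c]
df$pic[is_c]      <- df$phase.c.pic[is_c]
key_c <- paste(df$id[is_c], df$phase.c.pic[is_c])
key_b <- paste(lookup$id, lookup$phase.b.pic)
df$affect[is_c]   <- lookup$pic.affect[match(key_c, key_b)]

# Keep only the tidy columns; subsetting by phase is then straightforward.
tidy <- df[, c("id", "phase", "response", "pic", "affect")]
subset(tidy, phase == "c")
```

Matching on `id` plus picture name is meant to keep one participant's images from picking up another participant's affects once the real data contain more than one `id`.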