I'm trying to automate the following function for all columns of two data frames.
I have the data.frames position_sc and position_hit. Both have 18 columns; each column represents a subject ("PD.."). The data come from a sentence comprehension test with 200 sentences in total and four conditions (sc, sf, lc, lf). In position_sc, the values in the rows are the numbers of the sentences presented under condition sc (e.g. 3, 7, 19 --> these sentences were presented under condition sc). In position_hit, the values are the numbers of the sentences, out of the 200, for which the subject had a hit (e.g. 3, 22, 78 --> the subject gave the correct answer for these sentences). Now I want to get the sentences the subjects answered correctly (hit) in condition sc. These are the sentence numbers that appear in both data frames for the same subject.
I want the number of equal values in position_hit$x and position_sc$x. I was able to do that for each column (= subject) separately, but I can't figure out how to automate it over all columns (compare position_sc$PD02 with position_hit$PD02, position_sc$PD03 with position_hit$PD03, position_sc$PD04 with position_hit$PD04, position_sc$PD05 with position_hit$PD05, and so on).
That works well:
test_single = length(intersect(position_sc$PD03, position_hit$PD03))
OUTPUT
[1] 40
--> (40 numbers are equal --> 40 sentences in condition sc were a hit)
I tried something like this:
#Create a list containing both data.frames.
list_position_sc_hit = list(hit = position_hit, sc = position_sc)
#Try to automate intersect() first:
test_all = sapply(list_position_sc_hit, function(x) intersect(position_sc$x, position_hit$x))
But OUTPUT:
$hit
NULL
$sc
NULL
Here is an example of what my script looks like: REPRODUCIBLE EXAMPLE
#CREATE 2 DATAFRAMES
#Dataframe 1
df.sc = data.frame (PD02 = c(1,3,7,9,12),
PD03 = c(8,9,10,11,13),
PD04 = c(1,2,3,24,36),
PD05 = c(1,9,20,23,26))
df.sc
#Dataframe 2
#Create list of 4
list.hit = list(PD02 = c(1,3,7,8,11,13,14,15,16,17),
PD03 = c(2,4,6,10,11,13,18,19,21),
PD04 = c(1,2,3,4,6),
PD05 = c(18,19,20,23,25,26))
#Convert list with different number of rows into data.frame
#stri_list2matrix() comes from the stringi package
library(stringi)
df.hit = as.data.frame(t(stri_list2matrix(list.hit)))
#Rows and columns are switched
df.hit = data.frame(t(df.hit))
#Name columns like in df.sc
names(df.hit)[1:4] = names(df.sc)[1:4]
#Convert all values in data.frame from chr to int
df.hit = sapply(df.hit, as.integer)
# After converting from chr to int the data.frame became an array. In order to reshape it I had to switch row and column twice (not sure why)
df.hit = data.frame(t(df.hit))
df.hit = data.frame(t(df.hit))
df.hit
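On the "not sure why" above: sapply() simplifies its result to a matrix when every column yields a vector of the same length, which is most likely why the data.frame structure was lost. A minimal sketch of one alternative, assuming base R only (no stringi): pad each vector with NA up to the longest length and build the data.frame directly. The helper name pad_to and the variable n_max are hypothetical and not part of my original script. Note that the columns stay numeric here instead of being converted to integer.

```r
# Same list as in the reproducible example
list.hit <- list(PD02 = c(1, 3, 7, 8, 11, 13, 14, 15, 16, 17),
                 PD03 = c(2, 4, 6, 10, 11, 13, 18, 19, 21),
                 PD04 = c(1, 2, 3, 4, 6),
                 PD05 = c(18, 19, 20, 23, 25, 26))

# Hypothetical helper: extend a vector to length n, filling with NA
pad_to <- function(v, n) {
  length(v) <- n
  v
}

# Longest vector in the list decides the number of rows
n_max <- max(lengths(list.hit))

# lapply() keeps the list names, so the columns are already PD02..PD05
df.hit <- as.data.frame(lapply(list.hit, pad_to, n = n_max))
str(df.hit)
```

This avoids the chr conversion and the repeated t() calls, because the values never pass through a character matrix.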
#Do intersection and length counting for one subject (column)
matches.hit_sc_PD02 = length(intersect(df.sc$PD02, df.hit$PD02))
matches.hit_sc_PD02
OUTPUT: [1] 3 --> three integers are the same in df.sc$PD02 and df.hit$PD02
#Combine both dataframes into list and rename dataframes
df.hit_sc = list(df.hit, df.sc)
names(df.hit_sc) = c("hit", "sc")
df.hit_sc
#Just try intersect() function for all variables (columns) with sapply(). I left length() out just to minimize complexity.
matches.hit_sc_all = sapply(df.hit_sc, function(x) intersect(df.sc$x, df.hit$x))
matches.hit_sc_all
OUTPUT: $hit NULL
$sc NULL
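The NULL results most likely come from how $ works: df.sc$x looks for a column literally named "x" and does not substitute the value of x. In addition, iterating over the list of both data.frames passes whole data.frames to the function, not column names. A sketch of one way around both, assuming the two data.frames share their column names: iterate over the names and index with [[ ]]. The variable subjects is a hypothetical name, and df.hit is written out with NA padding here only to keep the sketch self-contained.

```r
df.sc <- data.frame(PD02 = c(1, 3, 7, 9, 12),
                    PD03 = c(8, 9, 10, 11, 13),
                    PD04 = c(1, 2, 3, 24, 36),
                    PD05 = c(1, 9, 20, 23, 26))

# Shorter columns are padded with NA to 10 rows
df.hit <- data.frame(PD02 = c(1, 3, 7, 8, 11, 13, 14, 15, 16, 17),
                     PD03 = c(2, 4, 6, 10, 11, 13, 18, 19, 21, NA),
                     PD04 = c(1, 2, 3, 4, 6, NA, NA, NA, NA, NA),
                     PD05 = c(18, 19, 20, 23, 25, 26, NA, NA, NA, NA))

# Column names present in both data.frames (hypothetical variable name)
subjects <- intersect(names(df.sc), names(df.hit))

# [[x]] uses the value of x as the column name; $x would not
matches.hit_sc_all <- sapply(subjects, function(x) {
  length(intersect(df.sc[[x]], df.hit[[x]]))
})
matches.hit_sc_all
# PD02 PD03 PD04 PD05
#    3    3    3    3
```

One caveat: intersect() treats NA as an ordinary value, so if both data.frames contained NA padding in the same column, NA would be counted as one extra match. In this example df.sc has no NA, so the counts are not affected.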
I would be happy if anyone could help me.