I would like to expand on the question "Find the index of the column in data frame that contains the string as value".

I have the following data:

data<-data.frame(expert=c("class.1","class.4","class.2"),
  choice1=c("class.3","class.8","class.10"),
  score1=c(0.92,0.91,0.30),
  choice2=c("class.1","class.7","class.9"),
  score2=c(0.70,0.78,0.30),
  choice3=c("class.6","class.1","class.2"),
  score3=c(0.01,0.58,0.30),
  stringsAsFactors=FALSE
)

I would like to get the score associated with the expert's choice. The goal is to find out whether choice1 is correct, but I also need to check whether the class chosen by the expert is tied on score with choice1.

So in the example data, using dplyr:

data %>% mutate(Right=expert==choice1)

gets part of the answer, but doesn't handle ties. The answer in "Find the index of the column in data frame that contains the string as value" uses grepl, which I don't think can handle a vector of regex patterns.

I've tried max, max.col, and which, alone and in combination with rowwise(), but I just can't seem to get the right answer. I've also made the data "tidy" using reshape (thanks UCLA IDRE http://stats.idre.ucla.edu/r/faq/how-can-i-reshape-my-data-in-r/), but I was unable to filter the data appropriately.

tidydata <- reshape(data,varying =list(paste0("choice",1:3),
  paste0("score",1:3)),direction="long",v.names=c("choice","score"))  %>%
  arrange(id) %>% filter(expert==choice)

This tells me which column holds the expert's choice, but I lose the connection to choice1.
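One way to keep that connection in the long format might be to merge each row's first choice and score back in before filtering. The following is only a sketch in base R; the names `long`, `top`, `top_choice`, `top_score`, and `matched` are made up for illustration, and it relies on `reshape` adding its default `id` and `time` columns.

```r
data <- data.frame(expert = c("class.1", "class.4", "class.2"),
  choice1 = c("class.3", "class.8", "class.10"), score1 = c(0.92, 0.91, 0.30),
  choice2 = c("class.1", "class.7", "class.9"),  score2 = c(0.70, 0.78, 0.30),
  choice3 = c("class.6", "class.1", "class.2"),  score3 = c(0.01, 0.58, 0.30),
  stringsAsFactors = FALSE)

# Long format: one row per (id, time), where time is the choice rank 1..3
long <- reshape(data,
  varying = list(paste0("choice", 1:3), paste0("score", 1:3)),
  direction = "long", v.names = c("choice", "score"))

# Pull out the first choice and its score for every id (hypothetical names)
top <- long[long$time == 1, c("id", "choice", "score")]
names(top) <- c("id", "top_choice", "top_score")

# Attach them to every row, then keep only rows where the expert's class appears
long <- merge(long, top, by = "id")
matched <- long[long$expert == long$choice, ]

# Each surviving row now carries both the matched score and the choice1 score
matched$same_score_as_top <- matched$score == matched$top_score
```

Rows whose expert class never appears among the choices drop out of `matched`, so they would still need to be handled separately.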

The best solution would have a function that returns a factor (right, tie, wrong), where row 3 would return tie.

Edit: This data compares the results of a classifier to a human annotator. The classifier can sometimes yield tied results (the scores for two or more classes are the same). I am trying to identify when the classifier is correct (choice1==expert) and not tied (I call this RIGHT); when it is tied (the classes selected by the expert and the classifier have the same score but are not the same code; I call this TIE); otherwise the classifier is WRONG. Thank you.
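The three outcomes described here could be computed without naming each column, for example by treating the choices and scores as matrices. This is a rough sketch rather than a definitive answer. It assumes that a row where choice1 equals expert counts as right even if lower-ranked choices share its score, and that scores are compared with exact equality. The names `choices`, `scores`, `hit`, `right`, and `tie` are made up for illustration.

```r
data <- data.frame(expert = c("class.1", "class.4", "class.2"),
  choice1 = c("class.3", "class.8", "class.10"), score1 = c(0.92, 0.91, 0.30),
  choice2 = c("class.1", "class.7", "class.9"),  score2 = c(0.70, 0.78, 0.30),
  choice3 = c("class.6", "class.1", "class.2"),  score3 = c(0.01, 0.58, 0.30),
  stringsAsFactors = FALSE)

k <- 3  # number of choice/score pairs
choices <- as.matrix(data[paste0("choice", 1:k)])
scores  <- as.matrix(data[paste0("score", 1:k)])

# hit[i, j] is TRUE when choice j in row i equals that row's expert class
# (the expert vector is recycled down each column, so rows line up)
hit <- choices == data$expert

# Right: the first choice is the expert's class (assumption: ties ignored here)
right <- hit[, 1]

# Tie: the expert's class is a lower-ranked choice whose score equals score1
tie <- !right & (rowSums(hit & (scores == scores[, 1])) > 0)

data$Decision <- factor(
  ifelse(right, "right", ifelse(tie, "tie", "wrong")),
  levels = c("right", "tie", "wrong"))
```

Because the comparison is done on whole matrices, the number of distinct classes does not matter; only `k`, the number of choice columns, needs to change.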


1 Answer


Well, I can only think of using nested ifelse statements:

data %>% mutate(TF1 = choice1 == expert,
            TF2 = choice2 == expert,
            TF3 = choice3 == expert,
            TFs1 = score2 == score1,
            TFs2 = score3 == score1,
            Decision = ifelse(TF1==TRUE, "Right",
                           ifelse(TF2 == TRUE & TFs1 == TRUE | TF3==TRUE & TFs2 == TRUE, "Tie", "Wrong")))

This might not be what you are looking for, but it works as you described.

Mons2us
  • Sorry if I wasn't clear. For the first row, the expert chose class.1. My first choice (choice1) was class.3, so I was wrong. In the second row, the expert chose class.4, my first choice was class.8, so I was wrong. In the third row, the expert chose class.2, and choice1, choice2, and choice3 are tied, so the answer isn't wrong, it's tied. The correct answer should be (false, false, tied) – user7845541 Apr 10 '17 at 15:57
  • What about the first row? You have class.1 as your second choice, so why is it 'wrong'? Shouldn't that also be a tie? – Mons2us Apr 10 '17 at 16:19
  • The score for the second choice is not equal to the score for the first choice. – user7845541 Apr 10 '17 at 16:20
  • I simplified the example. I have about 1000 classes. I couldn't do it this way. – user7845541 Apr 10 '17 at 17:59