Shorter method to replace entries in R

Question

I have started learning R recently. Here's the source file I am working with (https://github.com/cosname/art-r-translation/blob/master/data/Grades.txt). Is there any way I can change the letter grades, say A to 4.0, A- to 3.7, etc., without using a loop?

I am asking because with 1M entries, a "for" loop might not be the most efficient way to modify the data. I would appreciate any help.

Since one of the posters asked me to post my code, I tried writing the for loop to see whether I could do it. Here's my code:

mygrades<-read.table("grades.txt",header = TRUE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  #for now, see whether As get replaced with 4.0.
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

However, the output is a little weird. For some "A"s I get NA, and others are left as they are. Can someone please help me with this code?

@alistaire, I did try Hadley's lookup table, and it works. I also looked at the dplyr code, and it works well. However, for the sake of my understanding, I'm still trying to use for loops. Please note that it has been only about two days since I first opened an R book. Here's the modified code.

#there was one mistake in my code: I didn't use stringsAsFactors=FALSE.
#now, this code doesn't work for all "A"s. It spits out 4.0 for some As, and
#doesn't do so for others. Why would that be?

mygrades<-read.table("grades.txt",header = TRUE,stringsAsFactors=FALSE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

The output is:

"final_exam" "quiz_avg" "homework_avg"
"1" "C" "4" "A"
"2" "C-" "B-" "4"
"3" "D+" "B+" "4"
"4" "B+" "B+" "4"
"5" "F" "B+" "4"
"6" "B" "A-" "4"
"7" "D+" "B+" "A-"
"8" "D" "A-" "4"
"9" "F" "B+" "4"
"10" "4" "C-" "B+"
"11" "A+" "4" "A"
"12" "A-" "4" "A"
"13" "B" "4" "A"
"14" "D-" "A-" "4"
"15" "A+" "4" "A"
"16" "B" "A-" "4"
"17" "F" "D" "A-"
"18" "B" "4" "A"
"19" "B" "B+" "4"
"20" "A+" "A-" "4"
"21" "4" "A" "A"
"22" "B" "B+" "4"
"23" "D" "B+" "4"
"24" "A-" "A-" "4"
"25" "F" "4" "A"
"26" "B+" "B+" "4"
"27" "A-" "B+" "4"
"28" "A+" "4" "A"
"29" "4" "A-" "A"
"30" "A+" "A-" "4"
"31" "4" "B+" "A-"
"32" "B+" "B+" "4"
"33" "C" "4" "A"

As you can see in the first row, the first A got recoded as 4, but the second A didn't get recoded. Any idea why this is happening?

Thanks in advance.

@rawr Where were you going with that? It's just a data frame with the key, not any actual substitutions, right? — Hack-R, Jul 16 '16 at 00:02
@rawr The code you posted. It doesn't do anything. I was trying to help you explain what it's supposed to be for? — Hack-R, Jul 16 '16 at 00:13
it's a general idea that can be used in many other ways. here is one: `grades <- as.matrix(read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/Grades.txt', header = TRUE)); un <- unique(c(grades)); key <- setNames(c(1:100, seq(un)), c(1:100, sort(un))); data.frame(matrix(key[grades], nrow(grades)))` — rawr, Jul 16 '16 at 00:14
@rawr OK, thanks. I haven't tested it but I assume that works and that's what I was trying to get you to explain. I don't think OP could've used it otherwise because if they are new enough to have this question it would be a big leap for them to have come up with that latter code on their own. Cheers. — Hack-R, Jul 16 '16 at 00:16

alistaire · Accepted Answer · 2016-07-16T00:47:54.767

A typical way in base R would be to make a named vector as a lookup table, e.g.

# data with fewer levels for simplicity
df <- data.frame(x = rep(1:3, 2), y = rep(1:2, 3))

lookup <- c(`1` = "A", `2` = "B", `3` = "C")

and subset it with each column:

data.frame(lapply(df, function(x){lookup[x]}))
##   x y
## 1 A A
## 2 B B
## 3 C A
## 4 A B
## 5 B A
## 6 C B
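
Applied to the letter grades in the question, the same idea might look like the sketch below. The `grades` data frame is a made-up sample standing in for Grades.txt, and the scale in `points` is an assumption (the question only gives A = 4.0 and A- = 3.7). Indexing with `as.character()` makes the lookup go by name even if the columns were read in as factors.

```r
# Made-up sample in the same shape as Grades.txt (not the real data)
grades <- data.frame(
  final_exam   = c("C",  "A-", "B+"),
  quiz_avg     = c("A",  "B",  "A"),
  homework_avg = c("A",  "A",  "A-"),
  stringsAsFactors = FALSE
)

# Assumed grade-point scale; only A and A- come from the question
points <- c("A" = 4.0, "A-" = 3.7, "B+" = 3.3, "B" = 3.0, "C" = 2.0)

# Look up every column by name; assigning into recoded[] keeps the
# data frame structure and column names
recoded <- grades
recoded[] <- lapply(grades, function(col) unname(points[as.character(col)]))
recoded
```

Any grade that is missing from `points` comes back as `NA`, which makes gaps in the lookup table easy to spot.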

Alternately, dplyr recently added a recode function that's useful for such a job:

library(dplyr)

df <- read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/Grades.txt', header = TRUE)

df %>% mutate_all(funs(recode(., A = '4.0', 
                              `A-` = '3.7'))) %>%    # etc.
    as_data_frame()    # for prettier printing

## # A tibble: 33 x 3
##    final_exam quiz_avg homework_avg
##        <fctr>   <fctr>       <fctr>
## 1           C      4.0          4.0
## 2          C-       B-          4.0
## 3          D+       B+          4.0
## 4          B+       B+          4.0
## 5           F       B+          4.0
## 6           B      3.7          4.0
## 7          D+       B+          3.7
## 8           D      3.7          4.0
## 9           F       B+          4.0
## 10         39       C-           B+
## # ... with 23 more rows
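
Note that `funs()` has since been deprecated and `mutate_all()` superseded in dplyr, and `as_data_frame()` has given way to `as_tibble()`. A rough equivalent for dplyr 1.0 or later, shown on a small made-up `grades` data frame with character columns rather than the real file, might be:

```r
library(dplyr)

# Made-up sample (not the real Grades.txt data)
grades <- data.frame(
  final_exam = c("C", "A-"),
  quiz_avg   = c("A", "B+"),
  stringsAsFactors = FALSE
)

# Apply the same recode() call to every column via across()
recoded <- grades %>%
  mutate(across(everything(), ~ recode(.x, A = "4.0", `A-` = "3.7")))
recoded
```

Because the replacements are character strings like the input, values without a match are left unchanged. Once every grade has a replacement, the columns can be converted with `as.numeric()`.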

edited Jul 16 '16 at 00:47

answered Jul 16 '16 at 00:18

alistaire

Thanks Alistaire and other posters. I have posted my code using "for" loop. I am getting quirky output. Do you think you could help me? I am fairly new to R, so I apologize for dumb questions. I'd appreciate any help you could offer. – watchtower Jul 16 '16 at 00:37

@watchtower This solution makes **heavy** use of `for` loops and other slow control flow statements. It just hides them within the source code of `recode` https://github.com/hadley/dplyr/blob/master/R/recode.R – Hack-R Jul 16 '16 at 00:49

@watchtower Honestly, the approach you have is going to take about 400 lines of code to write, and you'll probably make a typo at some point that will drive you crazy. You might start by checking out [an example of a lookup table that Hadley wrote](http://adv-r.had.co.nz/Subsetting.html#applications), and then see if you can sort out how the first example works, and what it would look like for your data (bigger, but smaller than `for` loops). The `dplyr` approach is a bit more high-level, but an interesting alternative. – alistaire Jul 16 '16 at 00:49
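
On why the loop in the question recodes only one A per row: an `if ... else if` chain stops at the first condition that is true, so once an "A" is found in one column, the remaining columns of that row are never checked. The NAs in the first attempt came from assigning a number into factor columns, which `stringsAsFactors = FALSE` avoids. A minimal sketch of a loop that tests every cell independently, using a made-up two-row sample rather than the real file:

```r
# Made-up sample; stringsAsFactors = FALSE keeps the columns as character
mygrades <- data.frame(
  final_exam   = c("C", "A"),
  quiz_avg     = c("A", "A-"),
  homework_avg = c("A", "A"),
  stringsAsFactors = FALSE
)

for (i in seq_len(nrow(mygrades))) {
  for (j in seq_len(ncol(mygrades))) {
    # Each cell gets its own test, so one match does not skip the others
    if (mygrades[i, j] == "A") {
      mygrades[i, j] <- "4.0"
    }
  }
}
mygrades
```

This still handles only "A"; covering every grade this way needs one branch per grade, which is the repetition a lookup table avoids.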
