In R, why does this call to gather() do this?

Question

Here's a reproducible example, with my explanation of why it does what it does.

library(tidyr)

data = read.csv(text="Email foo.final bar.final
abc@foo.com 100       200
cde@foo.com 101       201
xyz@foo.com 102       202
zzz@foo.com 103       103", header=TRUE, sep="")

a = gather(data, key, Grade, -Email)

This means: for every column except "Email", put the values into a single new column called "Grade", and add a new column called "key" that holds the name of the column each value came from. Given that we have 4 observations with two variables each, that should produce 8 observations. Result:

        Email       key Grade
1 abc@foo.com foo.final   100
2 cde@foo.com foo.final   101
3 xyz@foo.com foo.final   102
4 zzz@foo.com foo.final   103
5 abc@foo.com bar.final   200
6 cde@foo.com bar.final   201
7 xyz@foo.com bar.final   202
8 zzz@foo.com bar.final   103

b = gather(data, key, Grade)

Same meaning, but now Email is gathered as well. We have 4 observations with 3 variables each, so we should get 12 observations. Result:

         key       Grade
1      Email abc@foo.com
2      Email cde@foo.com
3      Email xyz@foo.com
4      Email zzz@foo.com
5  foo.final         100
6  foo.final         101
7  foo.final         102
8  foo.final         103
9  bar.final         200
10 bar.final         201
11 bar.final         202
12 bar.final         103

I am not surprised.
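
Both results can also be checked programmatically. The sketch below rebuilds the example data with `read.table()` and `stringsAsFactors = FALSE` (an assumption added here so that `Email` is read as character on any R version; the original call used `read.csv()`), confirms the two row counts, and looks at the type of `Grade`. When `Email` is gathered together with the numeric columns, `Grade` has to hold both kinds of value, so it becomes a character column.

```r
library(tidyr)

# Same data as above; stringsAsFactors = FALSE is an assumption added here
# so that Email is read as character regardless of R version.
data <- read.table(text = "Email foo.final bar.final
abc@foo.com 100 200
cde@foo.com 101 201
xyz@foo.com 102 202
zzz@foo.com 103 103", header = TRUE, stringsAsFactors = FALSE)

a <- gather(data, key, Grade, -Email)  # Email kept as an id column
b <- gather(data, key, Grade)          # every column gathered, Email included

nrow(a)         # 4 rows x 2 gathered columns
nrow(b)         # 4 rows x 3 gathered columns
class(a$Grade)  # only numeric columns gathered, so Grade stays integer
class(b$Grade)  # Email mixed with numbers, so Grade is coerced to character
```

If the grades need to stay numeric for later arithmetic, the first form (excluding `Email` from the gather) is the one that preserves the type.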

What were you expecting? Which part doesn't make sense to you? I'm not really sure what you are asking. Also, it's much better if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when asking a question. A `str()` is less helpful than a `dput()`. Even better use a built in data set or a minimal, simple example. — MrFlick, Jun 02 '17 at 13:49
This is very much expected right? You are gathering all variables (not escaping email) which coerces numeric/character values in a single column — timfaber, Jun 02 '17 at 13:51
Are all of the emails actually the same? If not then you might want to reconsider the example you're giving since it doesn't represent your use case. — Dason, Jun 02 '17 at 13:55
Fair enough, I am rewriting the whole question using a reproducible example and explaining what is surprising to me. — pitosalas, Jun 02 '17 at 15:42

score 0 · Answer 1 · answered Jun 02 '17 at 14:03

You may need to do something more like this:

library(dplyr)  # %>% and select()
library(tidyr)  # gather()

f2 <- f1 %>%
      gather(key = Assignment, value = Grade, COURSE.final:EXAM.final) %>%
      select(-email)
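
The names `f1`, `COURSE.final`, `EXAM.final` and `email` do not occur in the reproducible example in the question, so they appear to refer to different data. As a rough sketch only, the same pattern applied to the question's `data` might look like this; the result name `long` is hypothetical, and the data is rebuilt with `read.table()` so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Data from the question's example; stringsAsFactors = FALSE is an
# assumption added so the result does not depend on the R version.
data <- read.table(text = "Email foo.final bar.final
abc@foo.com 100 200
cde@foo.com 101 201
xyz@foo.com 102 202
zzz@foo.com 103 103", header = TRUE, stringsAsFactors = FALSE)

# Gather only the two grade columns, then drop the Email column.
long <- data %>%
  gather(key = Assignment, value = Grade, foo.final:bar.final) %>%
  select(-Email)

names(long)  # columns remaining after Email is dropped
nrow(long)   # 4 rows x 2 gathered columns
```

Because only the numeric columns are gathered, `Grade` keeps its numeric type here; dropping `Email` is optional and depends on whether the identifier is still needed.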

Matt Jewett
