
I printed the summary of a column variable like this:

summary(document$subject)

A, B, C, D, E, F, ... are the subjects in one column of a data.frame, where each subject appears many times. The summary above shows how many times (the frequency) each subject appears in the file. The entry "OTHER" refers to the subjects that appear only once in the file; I also need to assign "1" to these subjects.

There are so many different subjects that it is difficult to list them all by hand with c().

I want to create a new column (or data.frame) and assign these numbers (scores) to the corresponding subjects. Ideally, the file would then look like this:

A    198
B    113
C    96
D    69
A    198
E    65
F    62
A    198
C    96
BZ   21
BC    1
CJ    1

...

I wonder what command I should use to take the scores from the summary table and build a new column that assigns these values to the corresponding subjects in the file.

Also, since the summary table is only printed by R, I don't know how to save it as a table in a file or how to extract the values and subject names from it. I would also like to find the names of the subjects that appear only once in the file, which the summary table added up into "OTHER".

emilliman5
Susie

1 Answer


Your question is hard to interpret without a reproducible example. Please take a look at this thread for tips on how to do that:

How to make a great R reproducible example?

Having said that, here is how I interpret your question. You have two data frames: one with a score per subject, and another in which the subjects appear multiple times in a column:

Sum <- data.frame(subject=c("A","B"),score=c(1,2))
foo <- data.frame(subject=c("A","B","A"))

> Sum
  subject score
1       A     1
2       B     2
> foo
  subject
1       A
2       B
3       A

You can then use match() to match the subjects in one data frame to the other and create the new variable in the second data frame:

foo$score <- Sum$score[match(foo$subject, Sum$subject)]

> foo
  subject score
1       A     1
2       B     2
3       A     1
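
If the score table does not exist yet, it can probably be built from the data itself instead of typing the subjects by hand. The sketch below is only an illustration under assumptions: the toy `document` data frame and its values are made up, and it assumes the score is simply the number of times each subject occurs. As far as I know, summary() on a factor prints only the most frequent levels (controlled by its maxsum argument) and lumps the rest into "(Other)", whereas table() counts every level.

```r
# Toy data standing in for document$subject (hypothetical values)
document <- data.frame(subject = c("A", "B", "A", "C", "A", "B", "BC"),
                       stringsAsFactors = FALSE)

# table() counts every distinct subject, with no "(Other)" lumping
counts <- table(document$subject)

# Turn the counts into a two-column lookup data frame
Sum <- data.frame(subject = names(counts),
                  score   = as.vector(counts),
                  stringsAsFactors = FALSE)

# Attach each row's score by matching on subject, as with match() above
document$score <- Sum$score[match(document$subject, Sum$subject)]

# Subjects that occur exactly once in the column
singletons <- Sum$subject[Sum$score == 1]
```

Here `Sum` holds one row per subject, so it could be saved with something like write.csv(Sum, "subject_counts.csv", row.names = FALSE), where the file name is just a placeholder.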
Community
Sacha Epskamp
  • Thank you very much for the message :) It's very helpful, and you interpreted my question quite well. There is one more point I'm not sure about. There are so many subjects that they are difficult to list manually, and each subject appears repeatedly in the column, so I wonder how I can use a command to list these subjects as you showed above. Thank you! – Susie Jul 09 '11 at 00:23