dplyr to reference two data frames (summarize function) in R

Question

I created a data frame from a data set with unique marketing sources. Let's say I have 20 unique marketing sources in this new data frame D1. I want to add another column that has the count of times this marketing source was in my original data frame. I'm trying to use the dplyr package but not sure how to reference more than one data frame.

The original data has 16,000 observations; the new data frame has 20 observations, as there are only 20 unique marketing sources. How can I use summarize in dplyr to reference two data frames? My objective is to find the percentage of each marketing source.

My original data frame has two columns: NAME, MARKETING_SOURCE. This data frame has 16,000 observations and 20 distinct marketing sources (email, event, sales call, etc.). I created a new data frame with only the unique MARKETING_SOURCES and called that data frame D1. In my new data frame, I want to add another column that has the number of times each marketing source appeared in the original data frame. My new data frame should have two columns: MARKETING_SOURCE, COUNT.

How about creating a simple [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to give some idea of what your data looks like. Provide sample input and desired output. — MrFlick, Feb 26 '15 at 21:12
My original data frame has two columns: NAME, MARKETING_SOURCE. This data frame has 16,000 observations and 20 distinct marketing sources (email, event, sales call, etc.). I created a new data frame with only the unique MARKETING_SOURCES and called that data frame D1. In my new data frame, I want to add another column that has the number of times each marketing source appeared in the original data frame. My new data frame should have two columns: MARKETING_SOURCE, COUNT. — statsR, Feb 26 '15 at 21:36
Did you read the link I provided? You have not added any sample data. There is nothing for us to test out possible solutions with. It sounds like you just need a group_by() and a summarize() (and maybe a join?) but without a concrete example, it's not easy to tell. — MrFlick, Feb 26 '15 at 21:38
Your reproducible sample data does NOT have to equal your actual data or a subset of it. You make fictitious data that easily shows what you have and what you want. Doing this myself solves most of my problems. It makes you look differently at your problem. I would also down vote your question since you didn't provide code that shows what you have, what you tried and where you want to be. Besides words, you must provide code. — mtelesha, Mar 04 '15 at 20:36

Richard Border · Answer 1 · 2015-02-26T21:44:11.597

I don't know if you need to use dplyr for something like this...

First let's create some data.frames:

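# df1 stands in for the original data: 400 random draws from the 26 letters
df1 <- data.frame(source = letters[sample(1:26, 400, replace = TRUE)])
# df2 stands in for the data frame of unique sources, with an empty count column
df2 <- data.frame(source = letters, count = NA)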

Then we can use table() to get the frequencies:

counts <- table(df1$source)
# as.vector() drops the table class so that count is a plain integer column
df2$count <- as.vector(counts)
head(df2)
  source count
1      a    10
2      b    22
3      c    12
4      d    17
5      e    18
6      f    18

UPDATE:

In response to @MrFlick's wise comment below, you can take the names() of the output from table() to ensure the order is preserved:

df2$source <- names(counts)

This is certainly not as elegant, and it would be even less elegant if df2 had other columns, but it is sufficient for the simple case presented above.
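If df2 does have other columns, or its rows are in a different order, one option is to look up each count by name rather than overwriting df2$source. The following is only a sketch under assumptions that are not in the original post: the seed is arbitrary, df2 is deliberately built in reverse order to show the alignment, and sources that never occur are given a count of 0.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Stand-in for the original data: 400 random draws from the 26 letters
df1 <- data.frame(source = sample(letters, 400, replace = TRUE),
                  stringsAsFactors = FALSE)

# Stand-in for the unique-sources data frame, deliberately in reverse order
df2 <- data.frame(source = rev(letters), stringsAsFactors = FALSE)

counts <- table(df1$source)

# match() gives, for each row of df2, the position of its source in counts
idx <- match(df2$source, names(counts))
df2$count <- as.vector(counts)[idx]

# Sources that never occur in df1 have no entry in counts; treat them as 0
df2$count[is.na(df2$count)] <- 0L

head(df2)
```

Because the lookup is done by name, the existing rows and columns of df2 are left untouched.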

You should be careful here because the order returned by `table()` may not always be the same order as the values in `df2`. — MrFlick, Feb 26 '15 at 21:39
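For the dplyr route the question asks about, the group_by() plus summarize() plus join idea from the comments could look roughly like this. It is a minimal sketch, not a tested solution for the real data: the data frame name original, the three example sources, the seed, and the PERCENT column are all made up for illustration; only NAME, MARKETING_SOURCE, COUNT and D1 come from the question.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Hypothetical stand-in for the 16,000-row original data frame
original <- data.frame(
  NAME = paste0("person", 1:400),
  MARKETING_SOURCE = sample(c("email", "event", "sales call"), 400,
                            replace = TRUE),
  stringsAsFactors = FALSE
)

# D1 as described in the question: one row per unique marketing source
D1 <- data.frame(MARKETING_SOURCE = unique(original$MARKETING_SOURCE),
                 stringsAsFactors = FALSE)

# Count rows per source, then express each count as a share of the total
source_counts <- original %>%
  group_by(MARKETING_SOURCE) %>%
  summarise(COUNT = n()) %>%
  mutate(PERCENT = 100 * COUNT / sum(COUNT))

# The join matches on the source name, so row order in D1 does not matter
D1 <- left_join(D1, source_counts, by = "MARKETING_SOURCE")
D1
```

Since the summary already contains one row per source, D1 is only needed if it carries other columns you want to keep; otherwise source_counts is the result on its own.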
