Add a value to each row from another dataframe of unequal length

Question

I have the following datasets:

dataset1:

Class    Value
Yo       53
Save     13
Gold     72
Post     88

dataset2:

Class   Total_goals
Yo       9
Yo       9
Yo       9
Save     4
Save     4
Gold     7
Gold     7
Gold     7
Gold     7
Post     3
Post     3

I want to add the Total_goals value for each Class in dataset1, taken from dataset2.

The expected output is:

Class    Value     Total_goals
Yo       53        9
Save     13        4
Gold     72        7
Post     88        3

How can I do that?

`merge(x = dataset1, y = dataset2, all.x = TRUE)` is pretty standard. If you know all Classes from dataset1 are in dataset2, `merge(dataset1, dataset2)`. — Gregor Thomas, Jan 18 '19 at 16:25
Suggested duplicate: [How to join/merge data in R](https://stackoverflow.com/q/1299871/903061) — Gregor Thomas, Jan 18 '19 at 16:26
By `add` do you mean you want to add a new column? What is your expected output? — Ronak Shah, Jan 18 '19 at 16:27
Perhaps `merge(df1, unique(df2[c("Class", "Total_goals")]), all.x = TRUE)` in case there are additional `df2` columns OP isn't mentioning... — Gregor Thomas, Jan 18 '19 at 16:36

Devon Oliver · Answer 1 · 2019-01-18T18:19:37.317

0

Use cbind; it works whether or not the datasets share a matching variable. It does assume that both dataframes have the same number of rows and that the rows are in the same order.

Create your dataframes:

dataset1 = data.frame(c("yo","save","gold", "post"),c(53,13,72,88))
colnames(dataset1) = c("Class","Value")

dataset2 = data.frame(c("yo","save","gold", "post"),c(9,4,7,3))
colnames(dataset2) = c("Class","Total_goals")

Answer:

dataset1 = cbind(dataset1, dataset2$Total_goals)
colnames(dataset1) = c("Class","Value","Total_goals")
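To illustrate the row-order assumption, here is a minimal sketch showing that cbind pairs rows by position, not by Class. The shuffled copy `dataset2_shuffled` and the result names `wrong`, `aligned`, and `right` are hypothetical, introduced only for this example.

```r
dataset1 <- data.frame(Class = c("yo", "save", "gold", "post"),
                       Value = c(53, 13, 72, 88))

# Hypothetical: the same rows as dataset2, but in a different order
dataset2_shuffled <- data.frame(Class = c("post", "gold", "save", "yo"),
                                Total_goals = c(3, 7, 4, 9))

# cbind pairs rows by position, so "yo" is wrongly given 3 here
wrong <- cbind(dataset1, Total_goals = dataset2_shuffled$Total_goals)

# Reordering the second dataframe to follow dataset1$Class restores alignment
aligned <- dataset2_shuffled[match(dataset1$Class, dataset2_shuffled$Class), ]
right <- cbind(dataset1, Total_goals = aligned$Total_goals)
```

If the two dataframes cannot be guaranteed to share an order, aligning them on Class first, as in the last two lines, is the safer route.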

*Edited to reflect additional info (duplicate rows in the second dataframe); this approach requires a matching variable.*

Solution if the dataframes are of unequal length and one contains duplicate rows:

Create your dataframes:

dataset1 = data.frame(c("yo","save","gold","post"),c(53,13,72,88))
colnames(dataset1) = c("Class","Value")

dataset2 = data.frame(c("yo","save","gold","post","post","gold"),c(9,4,7,3,3,7))
colnames(dataset2) = c("Class","Total_goals")

Answer:

# match() returns, for each Class in dataset1, the first matching row of dataset2
dataset1$Total_goals = dataset2$Total_goals[match(dataset1$Class, dataset2$Class)]
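The comments on the question suggest merge() as an alternative. A minimal sketch of that idea, assuming each Class maps to a single Total_goals value in dataset2 (the names `lookup` and `merged` are hypothetical, introduced for illustration):

```r
dataset1 <- data.frame(Class = c("yo", "save", "gold", "post"),
                       Value = c(53, 13, 72, 88))
dataset2 <- data.frame(Class = c("yo", "save", "gold", "post", "post", "gold"),
                       Total_goals = c(9, 4, 7, 3, 3, 7))

# Drop duplicate rows so each Class appears once, then left-join on Class
lookup <- unique(dataset2[c("Class", "Total_goals")])
merged <- merge(dataset1, lookup, by = "Class", all.x = TRUE)

# merge() sorts by the key column by default; restore dataset1's row order
merged <- merged[match(dataset1$Class, merged$Class), ]
rownames(merged) <- NULL
```

Unlike the match() approach, which silently takes the first match, this would produce extra rows if a Class had conflicting Total_goals values in dataset2, which makes such inconsistencies easier to notice.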

edited Jan 18 '19 at 18:19

answered Jan 18 '19 at 16:17

Devon Oliver


Makes a very dangerous assumption that the rows are in the same order in both data sets. – Gregor Thomas Jan 18 '19 at 16:24
"dangerous" is a bit of an overstatement, but an assumption, yes. As long as the observation levels correspond in both data sets you could use the arrange function from dplyr to sort them into the same order. Edited answer to reflect assumption. – Devon Oliver Jan 18 '19 at 16:28
It doesn't work when there is duplication in the second dataset. I added more information to the question to explain. – Adam Amin Jan 18 '19 at 16:32
I appreciate you editing the assumption into your answer. I guess I would correct my earlier statement to be a "dangerous recommendation" rather than a "dangerous assumption": not stating a strong assumption is dangerous. Now that the assumption is stated, it's a solution that explains when it can be used, so no longer dangerous. – Gregor Thomas Jan 18 '19 at 16:38
@Adam Amin , I have edited the answer to include a solution that will fulfill the additional requirements resulting from duplication in the second dataset. Hopefully, this meets your needs. – Devon Oliver Jan 18 '19 at 18:16
@AdamAmin if the edited answer solves your problem, could you please accept it. – Devon Oliver Jan 23 '19 at 02:08
