-4

I have two datasets in R as you can see below, Dataset 1 and Dataset 2. Consider the column 'company' in Dataset 1. I want to iterate through every row in Dataset 2 to find if that company exists, and if it does, then copy the format from Dataset 2 into Dataset 1 format field (currently NA). Thanks.

Dataset 1

enter image description here

Dataset 2

enter image description here

dataset1_updated after implementing Miff's solution

enter image description here

Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
Programmer
  • 1,266
  • 5
  • 23
  • 44
  • 1
    You could make some dummy data that shows the problem without having to go through the process of censoring. Also you could make it so we can cut and paste to get your data for us. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Spacedman Jul 06 '17 at 11:33
  • @Spacedman sorry about that, I'll edit my question with some dummy data. – Programmer Jul 06 '17 at 11:35
  • Both my and @Miff 's answer offer a solution to your problem. Did you try either of them yet? – JAD Jul 06 '17 at 11:53
  • @Jarko Dubbeldam reimplementing them now. It didn't work earlier but most likely that's something wrong on my side. Thanks for your solution. And you too Miff. – Programmer Jul 06 '17 at 11:54
  • 1
    If an answer helped you, upvote and/or accept it. – Hong Ooi Jul 06 '17 at 12:09
  • @HongOoi Hi Hong, there's a few minor errors at the moment but yes I will accept one of them. – Programmer Jul 06 '17 at 12:12

2 Answers2

1

This seems to be a classic case for merge, for example:

#Get rid of the existing format column
dataset1 <- dataset1[,names(dataset1 != "format")

#merge
dataset1_updated <- merge(dataset1, dataset2[,c("company","format")], all.x=TRUE)

Not tested due to the lack of example data

Miff
  • 7,486
  • 20
  • 20
  • Hi Miff! A given company field in Dataset 1 may or may not exist in Dataset 2 and is in no particular order. For example, let's take the company in row 1 of Dataset1. I need to iterate through every row in Dataset2 to find if that company exists, and if so, then get its format (e.g. first.last) and enter that into the row 1 format field. – Programmer Jul 06 '17 at 11:34
  • @novice Yes, that's what merge does. the `all.x` argument says to keep all the rows of dataset1, even when there isn't a match in dataset2. Try it and see, or look at the help file with `?merge` (you may also want to try the `sort=FALSE` argument) – Miff Jul 06 '17 at 11:39
1

You want to join these two datasets together. data.table is really good at this.

library(data.table)

setDT(dataset1)
setDT(dataset2)

setkey(dataset1, company)
setkey(dataset2, company)

dataset2[dataset1[, .(fname, lname, email, company)]]

Left joins

Left outer join

The result of a left outer join (or simply left join) for tables A and B always contains all rows of the "left" table (A), even if the join-condition does not find any matching row in the "right" table (B). This means that if the ON clause matches 0 (zero) rows in B (for a given row in A), the join will still return a row in the result (for that row)—but with NULL in each column from B. A left outer join returns all the values from an inner join plus all values in the left table that do not match to the right table, including rows with NULL (empty) values in the link column.

Wikipedia.

Here we are left joining dataset2 on dataset1. We keep all rows from dataset1, and join dataset2 if company from dataset1 matches company from dataset2. If it matches, we keep dataset2's row and add it, including the format field. If it doesn't match, we get NA.

JAD
  • 2,035
  • 4
  • 21
  • 35
  • Hi Jarko! A given company field in Dataset 1 may or may not exist in Dataset 2 and is in no particular order. For example, let's take the company in row 1 of Dataset1. I need to iterate through every row in Dataset2 to find if that company exists, and if so, then get its format (e.g. first.last) and enter that into the row 1 format field. – Programmer Jul 06 '17 at 11:34
  • @novice I know. Thats what this does. Or at least is supposed to do. – JAD Jul 06 '17 at 11:36
  • @novice see my explanation of what is going on. – JAD Jul 06 '17 at 11:41
  • @Miff does the same thing in his answer, just using a different function. – JAD Jul 06 '17 at 11:41
  • Hi guys, I updated the question with dummy data to explain my problem more concisely. Sorry for the ambiguity earlier. – Programmer Jul 06 '17 at 11:50
  • @Miff Hi Miff, I tried your solution, but the dataset1_updated isn't working as expected. I've updated my question to include dataset1_updated, thanks. – Programmer Jul 06 '17 at 12:05