I want to create a new column in my database according to information from another database

Question

I have two separate databases (Database_1 and Database_2) I want to add the height profile from Database_2 to Database_1 as a new column in Database_1.

Database_1:

Horse_type
Stallion
Race_horse
Work_horse
Work_horse

Database_2:

Horse_type   Height_profile
Stallion     Large
Race_horse   Medium
Work_horse   Small
Pure_breed   Huge

So far I've only tried to do this using a for loop.

for (row in 1:nrow(Database_1)) {

if(Database_1$Horse_type == Database_2$Horse_type) {

Database_1$New_Column <- Database_2$height_profile
 }
}

I expect the output:

Database_1:

Horse_type   Height_profile
Stallion     Large
Race_horse   Medium
Work_horse   Small
Work_horse   Small

But the actual output is:

"There were 50 or more warnings (use warnings() to see the first 50)"

nycrefugee · Answer 1 · 2019-04-10T18:27:21.403

1

A loop seems like a complicated way to do this. You could just merge the two data frames by the common column Horse_type and the values will be added as a new column:

database_1 <- merge(database_1, database_2, by = "Horse_type")

edited Apr 10 '19 at 18:27

answered Apr 10 '19 at 18:21

nycrefugee

1,629
1
10
23

score 0 · Answer 2 · answered Apr 10 '19 at 18:00

Your loop has a couple issues. The first thing that jumps out at me is that you create row to represent each element in the list you are going to loop over, but never call it in your actual loop. That might be something to look into next time... Anyway this works:

   #create dataframes
    df1 <- as.data.frame(list(Horse_type= c("Stallion",
                                            "Race_horse",
                                            "Work_horse",
                                            "Work_horse")
                              ), stringsAsFactors = F,
                         )

    df2 <- as.data.frame(list(Horse_type= c("Stallion",
                                            "Race_horse",
                                            "Work_horse",
                                            "Pure_breed"),
                              Height_profile= c("Large",
                                                "Medium",
                                                "Small",
                                                "Huge")
                              ), stringsAsFactors = F,
                         )
    #initialize empty column to capture output of loop iteratively  
    New_column <- NULL
    for (i in 1:nrow(df1)) {
      New_column[i] <- df2$Height_profile[
        which(df1$Horse_type[i] == df2$Horse_type)
        ]
    }
    #attach output of loop as a variable to df1
    df1$height <- New_column

Note this was particularly easy because you don't have any repeat cases in `df1`, AND within each `Horse_type` you had only one invariant height_profile. Because of this matching consistency, you could improve the loop to enable it to at least handle duplicates of `df1$Horse_type` by adding a `[1]` at the end of the `which` search, to extract only one of the matches (since any of the matches will correspond to the correct `Height_profile`. — Dij, Apr 10 '19 at 18:05

Rushabh Patel · Answer 3 · 2019-04-10T18:27:57.233

You can use data.table package -

> setkey(database1,"Horse_type")
> setkey(database2,"Horse_type")
> setDT(database2)[setDT(database1),]

 Horse_type Height_profile
1: Race_horse         Medium
2:   Stallion          Large
3: Work_horse          Small
4: Work_horse          Small

OR

> merge(database1,database2)

  Horse_type Height_profile
1 Race_horse         Medium
2   Stallion          Large
3 Work_horse          Small
4 Work_horse          Small

I want to create a new column in my database according to information from another database

3 Answers3