Combining 2 columns in R prioriziting one of them

Question

I know nothing of R, and I have a data.frame with 2 columns, both of them are about the sex of the animals, but one of them have some corrections and the other doesn't.

My desired data.frame would be like this:

    id  sex  father mother birth.date  farm
    0    1    john   ray   05/06/94     1
    1    1    doug   ana   18/02/93     NA
    2    2    bryan  kim   21/03/00     3

But i got to this data.frame by using merge on 2 others data.frames

  id sex.x father mother birth.date sex.y farm
    0    2    john   ray    05/06/94    1     1
    1    1    doug   ana    18/02/93    NA    NA
    2    2    bryan  kim    21/03/00    2     3

data.frame 1 or Animals (Has the wrong sex for some animals)

 id sex father mother birth.date
  0  2   john   ray    05/06/94
  1  1   doug   ana    18/02/93
  2  2   bryan  kim    21/03/00

data.frame 2 or Farm (Has the correct sex):

 id farm sex
  0  1    1
  2  3    2

The code i used was: Animals_Farm <- merge(Animals , Farm, by="id", all.x=TRUE)

I need to combine the 2 sex columns into one, prioritizing sex.y. How do I do that?

Welcome to R. Check this highly tagged Q. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Then try to provide details of input data, what is your desired output. — user5249203, Oct 06 '16 at 20:11
This would be a good task for `coalesce` from package dplyr. — aosmith, Oct 06 '16 at 20:23
How i use this sex <- ? I tried like this ifelse(!is.na(Animais_Farm$sex.y), Animais_Farm$sex.y, Animais_Farm$sex.x) — Jean Nunes, Oct 07 '16 at 14:01
So you need this `merge(df, df1[,-3], by="id", all.x=TRUE)`? — Sotos, Oct 07 '16 at 14:12
@Sotos With that line, farm didnt appeared, but sex.x and sex.y yes — Jean Nunes, Oct 07 '16 at 14:20
oh ok. I get it... I think you simply need (given that your merged data frame is called `df2`), then `df2$sex.x <- ifelse(is.na(df2$sex.y), df2$sex.x, df2$sex.y); df2$sex.y <- NULL` — Sotos, Oct 07 '16 at 14:29

score 0 · Answer 1 · answered Oct 06 '16 at 20:33

If I correctly understand you example you have a situation similar to what I show below based on the example from the merge function.

> (authors <- data.frame(
      surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
      nationality = c("US", "Australia", "US", "UK", "Australia"),
      deceased = c("yes", rep("no", 3), "yes")))

   surname nationality deceased
1    Tukey          US      yes
2 Venables   Australia       no
3  Tierney          US       no
4   Ripley          UK       no
5   McNeil   Australia      yes

> (books <- data.frame(
      name = I(c("Tukey", "Venables", "Tierney",
                 "Ripley", "Ripley", "McNeil", "R Core")),
      title = c("Exploratory Data Analysis",
                "Modern Applied Statistics ...", "LISP-STAT",
                "Spatial Statistics", "Stochastic Simulation",
                "Interactive Data Analysis",
                "An Introduction to R"),
      deceased = c("yes", rep("no", 6))))

      name                         title deceased
1    Tukey     Exploratory Data Analysis      yes
2 Venables Modern Applied Statistics ...       no
3  Tierney                     LISP-STAT       no
4   Ripley            Spatial Statistics       no
5   Ripley         Stochastic Simulation       no
6   McNeil     Interactive Data Analysis       no
7   R Core          An Introduction to R       no

> (m1 <- merge(authors, books, by.x = "surname", by.y = "name"))

   surname nationality deceased.x                         title deceased.y
1   McNeil   Australia        yes     Interactive Data Analysis         no
2   Ripley          UK         no            Spatial Statistics         no
3   Ripley          UK         no         Stochastic Simulation         no
4  Tierney          US         no                     LISP-STAT         no
5    Tukey          US        yes     Exploratory Data Analysis        yes
6 Venables   Australia         no Modern Applied Statistics ...         no

Where authors might represent your first dataframe and books your second and deceased might be the value that is in both dataframe but only up to date in one of them (authors).

The easiest way to only include the correct value of deceased would be to simply exclude the incorrect one from the merge.

> (m2 <- merge(authors, books[names(books) != "deceased"], 
               by.x = "surname", by.y = "name"))

   surname nationality deceased                         title
1   McNeil   Australia      yes     Interactive Data Analysis
2   Ripley          UK       no            Spatial Statistics
3   Ripley          UK       no         Stochastic Simulation
4  Tierney          US       no                     LISP-STAT
5    Tukey          US      yes     Exploratory Data Analysis
6 Venables   Australia       no Modern Applied Statistics ...

The line of code books[names(books) != "deceased"] simply subsets the dataframe books to remove the deceased column leaving only the correct deceased column from authors in the final merge.

The problem is that the dataframe Animals has more animals than dataframe Farm, so i cant just remove sex.x — Jean Nunes, Oct 07 '16 at 13:34
This is why a reproducible example that demonstrates the challenges you are trying to overcome is important. In that case, I would recommend the solution in the comments below the question by @rawr — Barker, Oct 11 '16 at 00:31

Combining 2 columns in R prioriziting one of them

1 Answers1