I am new to R and I need help with data cleaning.

In my data set (called "Survey") I want to combine two columns into one (join, merge, whatever the right term is): the columns "Gender" and "Geschlecht" should become a single column called "Sex".

I used the following command:

Survey$Sex <- paste(Survey$Gender, "", Survey$Geschlecht)

And this is my outcome:

  Gender   Geschlecht        Sex 
1   NA          1           NA  1
2   NA          1           NA  1
3   NA          1           NA  1
4   NA          0           NA  0
5   NA          0           NA  0
6   NA          0           NA  0

I would like to remove/omit the NAs in the "Sex" column.

Like this (desired outcome):

  Gender   Geschlecht      Sex 
1   NA          1           1
2   NA          1           1
3   NA          1           1
4   NA          0           0
5   NA          0           0
6   NA          0           0

How do I do this? :-) Please help!

Konrad Rudolph
Lena

3 Answers

2

You can also use dplyr's coalesce() function. Taking the example from GKi's answer:

library(dplyr)

Survey <- data.frame(Gender = c(NA,NA,0,1,0), Geschlecht = c(0,1,1,NA,NA))

Survey %>%
    mutate(Sex = coalesce(Gender, Geschlecht))
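Note that a pipeline like the one above only prints its result. A minimal sketch of keeping the new column, assuming dplyr is installed and using the same example data, is to assign the result back to `Survey`:

```r
library(dplyr)

# Example data from GKi's answer (not the real survey).
Survey <- data.frame(Gender = c(NA, NA, 0, 1, 0), Geschlecht = c(0, 1, 1, NA, NA))

# coalesce() takes the first non-NA value per row, looking at Gender first.
# Assigning the result back to Survey stores the new column.
Survey <- Survey %>%
  mutate(Sex = coalesce(Gender, Geschlecht))

Survey
```

Without the assignment, `Survey` itself stays unchanged and has no "Sex" column.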
pedrostrusso
1

You can use ifelse to select between the column Geschlecht or Gender.

Survey <- data.frame(Gender = c(NA,NA,0,1,0), Geschlecht = c(0,1,1,NA,NA))
Survey$Sex <- ifelse(is.na(Survey$Gender), Survey$Geschlecht, Survey$Gender)
Survey
#  Gender Geschlecht Sex
#1     NA          0   0
#2     NA          1   1
#3      0          1   0
#4      1         NA   1
#5      0         NA   0
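As a small illustration of why the original paste() attempt showed "NA" in the result: paste() converts every value to text, including missing ones, whereas ifelse() picks one of the two values and keeps them numeric. The vectors below are made up for the example and are not the real survey data.

```r
# Made-up values for illustration only.
Gender     <- c(NA, NA, 0)
Geschlecht <- c(1, 0, NA)

# paste() turns NA into the text "NA" and glues the pieces together,
# so the result is a character vector, not a number.
pasted <- paste(Gender, "", Geschlecht)

# ifelse() takes Geschlecht where Gender is missing, otherwise Gender.
picked <- ifelse(is.na(Gender), Geschlecht, Gender)

print(pasted)
print(picked)
```

If both columns are missing in a row, the ifelse() version returns NA for that row.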
GKi
  • Hi GKi, thank you for your answer. It works in R how you suggested but now only the possible events as shown in your answer come up when I want to review my dataset "Survey". How do I apply it on my real dataset? – Lena Dec 10 '19 at 12:20
  • @Lena Reread in your data `Survey` and just do `Survey$Sex <- ifelse(is.na(Survey$Gender), Survey$Geschlecht, Survey$Gender)` – GKi Dec 10 '19 at 12:23
  • Sorry for asking again, I am a bit confused: how do I get back after your suggested commands to my data set in which it should be all applied by then? – Lena Dec 10 '19 at 12:38
  • @Lena How did you get your data set called `Survey`? – GKi Dec 10 '19 at 12:42
  • I just named it like that – Lena Dec 10 '19 at 12:42
  • @Lena Do this again and you should have `Survey` like you had. – GKi Dec 10 '19 at 12:44
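To make the exchange above concrete, here is a rough sketch of the whole round trip. It assumes the real data came from a CSV file; the temporary file and its contents are hypothetical stand-ins, so replace the read step with whatever was originally used to create `Survey`.

```r
# Hypothetical file standing in for the real survey export.
survey_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Gender = c(NA, NA, 0, 1, 0), Geschlecht = c(0, 1, 1, NA, NA)),
  survey_file,
  row.names = FALSE
)

# Step 1: read the real data in again, the same way as the first time.
Survey <- read.csv(survey_file)

# Step 2: add the combined column to that data set.
Survey$Sex <- ifelse(is.na(Survey$Gender), Survey$Geschlecht, Survey$Gender)

Survey
```

The point is that the `data.frame(...)` line in the answer is only example data; running it overwrites `Survey`, so it should be skipped when working with the real data set.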
0

Base R solutions:

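# Note: rowSums() adds the two columns, so it only gives the intended
# result when at most one of them is non-NA per row. Rows where both
# are NA become 0 rather than NA.

# 1. Keeping only the "Sex" vector:

Survey_clean <- within(Survey, {
  Sex <- rowSums(cbind(Gender, Geschlecht), na.rm = TRUE)
  rm(Gender, Geschlecht)
})

# 2. Keeping all vectors (summing only the two relevant columns):

Survey$Sex <- rowSums(Survey[c("Gender", "Geschlecht")], na.rm = TRUE)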
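One caveat worth checking before relying on a sum: it differs from the "first non-missing value" approach whenever both columns are filled in. A minimal sketch on the example data from GKi's answer (the names `sex_sum` and `sex_first` are only for illustration):

```r
# Example data from GKi's answer.
Survey <- data.frame(Gender = c(NA, NA, 0, 1, 0), Geschlecht = c(0, 1, 1, NA, NA))

# Sum of the two columns, ignoring NAs.
sex_sum <- rowSums(Survey[c("Gender", "Geschlecht")], na.rm = TRUE)

# First non-missing value, preferring Gender.
sex_first <- ifelse(is.na(Survey$Gender), Survey$Geschlecht, Survey$Gender)

# Rows where the two approaches disagree.
which(sex_sum != sex_first)
```

Here the two disagree in row 3, where Gender is 0 and Geschlecht is 1. If each respondent only ever answered one of the two columns, both approaches give the same values.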

Tidyverse solutions:

# Install packages if they are not already installed:

necessary_packages <- c("dplyr")

# Create a vector containing the names of any packages needing installation:

new_packages <- necessary_packages[!(necessary_packages %in% installed.packages()[, "Package"])]

# If the vector has more than 0 values, install the new packages
# (and their associated dependencies):

if (length(new_packages) > 0) {
  install.packages(new_packages, dependencies = TRUE)
}

# Attach the packages in the session:

lapply(necessary_packages, require, character.only = TRUE)


#1. Keeping only the sex vector as the others are now redundant: 

Survey %>%
  transmute(Sex = coalesce(Gender, Geschlecht))

#2. Keeping all vectors:

Survey %>% 
  mutate(Sex = coalesce(Gender, Geschlecht))

Data, thank you @GKi:

Survey <- data.frame(Gender = c(NA,NA,0,1,0), Geschlecht = c(0,1,1,NA,NA))
hello_friend