3

I have quite a large dataset that has multiple missing and duplicate values. My first aim is to created a new column (Name) that consists of three existing columns, e.g. FirstName, MiddleInitial, and LastName.

I have tried:

owners4$Name <- paste(owners4$FirstName, owners4$MiddleInitial, owners4$LastName)

but this results in NAs being pasted as characters instead of just NAs. After this I was then going to delete every row that has a NA in my new column.

Is anyone sure of a way I can achieve this?

zx8754
  • 52,746
  • 12
  • 114
  • 209
sall
  • 49
  • 4
  • 1
    it will be good if you can provide a reproducible example. your problem looks easy but we need to look at the data 1st. use dput() – Hunaidkhan Oct 11 '18 at 06:25
  • You might want to have a look at [this post](https://stackoverflow.com/questions/13673894/suppress-nas-in-paste) if you want to remove NA's. – Ronak Shah Oct 11 '18 at 06:31
  • put didn't give me any useable data I could give to you as my dataset consists of 15 variables and 65,000 observations. Is there a way to use dput to just view a small portion of the dataset?I am new to R and not really sure of how to tackle this – sall Oct 11 '18 at 06:56
  • @sall Yes, try `dput(head(owners4))`, and copy paste it into your post, click on "edit" to edit your post. – zx8754 Oct 11 '18 at 07:05

2 Answers2

5

Paste columns using na.omit, see example:

# reproducible example
owners4 <- data.frame(FirstName = c("Aa", "Bb", NA),
                      MiddleInitial = c("T", "U", NA),
                      LastName = c(NA, "Yyy", NA))

owners4$Name <- apply(owners4[, c("FirstName", "MiddleInitial", "LastName")], 1,
                      function(i){ paste(na.omit(i), collapse = " ") })

owners4
#   FirstName MiddleInitial LastName     Name
# 1        Aa             T     <NA>     Aa T
# 2        Bb             U      Yyy Bb U Yyy
# 3      <NA>          <NA>     <NA>         

Now filter out rows where Name is blank

result <- owners4[ owners4$Name != "", ]
result
#   FirstName MiddleInitial LastName     Name
# 1        Aa             T     <NA>     Aa T
# 2        Bb             U      Yyy Bb U Yyy
zx8754
  • 52,746
  • 12
  • 114
  • 209
  • Thank you, that worked perfectly. Now I just need to find a way to delete the rows that have NAs in that column – sall Oct 11 '18 at 07:50
  • 1
    All good, I found this code that worked: owners <- owners[!is.na(owners$Name),]. Thanks for the help :) – sall Oct 11 '18 at 08:12
1

Here is one approach using a vector of name components with anyNA():

FirstName <- "John"
MiddleInitial <- NA
LastName <- "Jones"
name <- c(FirstName, MiddleInitial, LastName)

x <- ifelse(anyNA(name), NA, paste(name, collapse=" "))
x
[1] NA

MiddleInitial <- "P"
name <- c(FirstName, MiddleInitial, LastName)
x <- ifelse(anyNA(name), NA, paste(name, collapse=" "))
x
[1] "John P Jones"

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360