Create col Z with values from col X or Y, after removing unwanted spaces

Question

I have col X and col Y.

Col X has values with organisation name and col Y has values with the ultimate parent name of the corresponding organisation in col X. The values of Col Y are more important for me, but it has many missing values.

I would like to create a col Z, which would contain the values of col Y; but if there are missing values in col Y, then I would like col Z to contain the values of col X.

Edit: I have realised that my columns X and Y have unwanted spaces, which I would like to get rid of, before I create column Z.

Apparently, cols X and Y have cells with spaces only (e.g. 20 spaces). I would like to make these cells blank, so I could use the code provided by akrun below. Also, some of the useful text values (names of an organisation) end with two or more spaces, and I would like to get rid of these unwanted spaces as well.

An example of what I have and want to have, where "s" represents a space:

Apologies for posting a pic as an example (still can't figure out how to make a more comfy data table).

Thanks...

With respect to my question about unwanted spaces, aromatic6tet referred me to a useful source (https://stackoverflow.com/questions/5992082/how-to-remove-all-whitespace-from-a-string), where I found that I can get rid of unwanted spaces by using the function str_squish. Thanks for your help, this thread can be closed. — Ken Lee, Apr 08 '21 at 18:37

akrun · Accepted Answer · 2021-04-08T17:00:58.193

2

If the column elements have NA, then we can use coalesce

library(dplyr)
df1 %>%
    mutate(Z = coalesce(Y, X))

If those blanks are "", then replace with NA and then use coalecse

df1 %>%
  mutate(Z = coalecse(na_if(Y, ""), na_if(X, "")))

edited Apr 08 '21 at 17:00

answered Apr 08 '21 at 17:00

akrun

874,273
37
540
662

2

Shouldn't it be coalesce(Y,X) ? – Karthik S Apr 08 '21 at 17:00
Thanks, this works as desired. However, I have realised that my column Y has trickier values than mere blanks. Apparently, col Y has cells with spaces only (e.g. 20 spaces). I would like to make these cells blank, so I could use your code. Also, some of the useful text values (names of an organisation) end with two or more spaces, and I would like to get rid of these unwanted spaces. Maybe you know how to achieve this? This is what I have and desire to have, where "s" represents a space: https://i.stack.imgur.com/jLMe0.png Many thanks! – Ken Lee Apr 08 '21 at 17:52
look at `?gsub` or [this SO post](https://stackoverflow.com/questions/5992082/how-to-remove-all-whitespace-from-a-string) @KenLee – aromatic6tet Apr 08 '21 at 18:13
Many thanks, str_squish function did the job! – Ken Lee Apr 08 '21 at 18:35

score 1 · Answer 2 · edited Apr 08 '21 at 17:48

I've accomplished this in the past using a combination of dplyr::mutate with ifelse. It gets tricky if there's going to be more than two options. In your case since the value to fill the space should come from either X or Y, this should work for you.


X <- list("", "Sand", "Paper", "", "Water", "Air Co Ltd")
Y <- list("", "", "", "Fire Corp", "Water Corp", "Air Corp")

df <- data.frame(cbind(X, Y))


df <- df %>% 
  dplyr::mutate(Z = ifelse(X=="", Y, X))

And if the blanks are NA then use:

X <- list(NA, "Sand", "Paper", NA, "Water", "Air Co Ltd")
Y <- list(NA, NA, NA, "Fire Corp", "Water Corp", "Air Corp")

df <- data.frame(cbind(X, Y))


df <- df %>% 
  dplyr::mutate(Z = ifelse(is.na(X), Y, X))

If you need to stick with base R, then use this:

# if the blanks are blank
df$Z <- ifelse(X=="", Y, X)

# if the blanks are NA
df$Z <- ifelse(is.na(X), Y, X)

Create col Z with values from col X or Y, after removing unwanted spaces

2 Answers2