Combine two or more columns in a dataframe into a new column with a new name

Question

For example if I have this:

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 
df = data.frame(n, s, b)

  n  s     b
1 2 aa  TRUE
2 3 bb FALSE
3 5 cc  TRUE

Then how do I combine the two columns n and s into a new column named x such that it looks like this:

  n  s     b     x
1 2 aa  TRUE  2 aa
2 3 bb FALSE  3 bb
3 5 cc  TRUE  5 cc

score 170 · Answer 1 · edited Aug 07 '13 at 23:46

170

Use paste.

 df$x <- paste(df$n,df$s)
 df
#   n  s     b    x
# 1 2 aa  TRUE 2 aa
# 2 3 bb FALSE 3 bb
# 3 5 cc  TRUE 5 cc

edited Aug 07 '13 at 23:46

thelatemail

91,185
12
128
188

answered Aug 07 '13 at 23:40

mnel

113,303
27
265
254

.@thelatemail - How to add a special character between data points using `paste()`? For above example, `x` column should have data as `2-aa`, then `3-bb` and `5-cc`. – Chetan Arvind Patil Oct 06 '17 at 02:28
10

.@thelatemail - This worked for me: `paste(df$n,df$s,sep="-")` – Chetan Arvind Patil Oct 06 '17 at 19:09
4

how can you omit NA if column `s` has NA value? (I don't like to see `3 NA` if `df$s[2]=NA`) – Cina Aug 10 '18 at 23:18

score 53 · Answer 2 · edited Mar 27 '19 at 07:31

53

For inserting a separator:

df$x <- paste(df$n, "-", df$s)

edited Mar 27 '19 at 07:31

zx8754

52,746
12
114
209

answered Feb 27 '17 at 21:10

Little Bee

1,175
2
13
22

1

.@LittleBee - This adds a space between two data. Final output for example is like: `A - B` instead of `A-B`. Is it possible to remove this extra space? – Chetan Arvind Patil Oct 06 '17 at 02:35
9

.@LittleBee - This worked for me: `paste(df$n,df$s,sep="-")` – Chetan Arvind Patil Oct 06 '17 at 19:09
5

use paste0 instead of paste – Ferroao Nov 02 '17 at 16:55
3

This won't give the desired output : OP asks for a space in between the elements, not another separator (which, by the way, would be better put as the `sep` argument...). The other answer, posted almost 4 years prior to yours, is however perfectly answering the question. – Cath Mar 27 '19 at 09:12

score 27 · Answer 3 · answered Apr 16 '18 at 14:58

27

As already mentioned in comments by Uwe and UseR, a general solution in the tidyverse format would be to use the command unite:

library(tidyverse)

n = c(2, 3, 5) 
s = c("aa", "bb", "cc") 
b = c(TRUE, FALSE, TRUE) 

df = data.frame(n, s, b) %>% 
  unite(x, c(n, s), sep = " ", remove = FALSE)

answered Apr 16 '18 at 14:58

Quentin Perrier

476
4
5

2

What is x in this example? – Levi Apr 03 '19 at 13:51
1

@Levi, that ``x`` represents the name of the new column that contains the combined values. Think of dplyr's ``mutate``: ``df %>% dplyr::mutate(x = "your operations")`` – Vesanen Aug 13 '20 at 18:33
Could you please explain why mutate is incorrect but unite is correct? I think this had been explained in comments by Uwe and UseR, but I can't find seem to find those comments--I think they were deleted. Thank you! – jdcode Feb 13 '22 at 15:49

score 20 · Answer 4 · answered Mar 10 '18 at 17:17

20

Using dplyr::mutate:

library(dplyr)
df <- mutate(df, x = paste(n, s)) 

df 
> df
  n  s     b    x
1 2 aa  TRUE 2 aa
2 3 bb FALSE 3 bb
3 5 cc  TRUE 5 cc

answered Mar 10 '18 at 17:17

sbha

9,802
2
74
62

2

No, as already existing answers, you are using *paste*, not *mutate*. – zx8754 Mar 27 '19 at 07:32
I thought I was demonstrating how columns could be combined as a part of a `dplyr::mutate()`. Sorry, just trying to be helpful - I won't pollute the site anymore and abstain from future postings. – sbha Mar 27 '19 at 22:44
Sorry, if it came out as rude. OP's problem is not solved by using *mutate*, question is not about how to use *dplyr*, but how to combine column values. I am simply pointing out that they need *paste* not *mutate*. If we want to demonstrate *dplyr* correct way is [using the function *unite*](https://stackoverflow.com/a/49860448/680068). – zx8754 Mar 28 '19 at 06:37
@zx8754, why mutate is incorrect but unite is correct? The answer you shared has referenced comments by Uwe and UseR, but it looks like those comments have been deleted. – jdcode Feb 13 '22 at 15:48

Ferroao · Answer 5 · 2019-03-11T20:32:19.620

16

Some examples with NAs and their removal using apply

n = c(2, NA, NA) 
s = c("aa", "bb", NA) 
b = c(TRUE, FALSE, NA) 
c = c(2, 3, 5) 
d = c("aa", NA, "cc") 
e = c(TRUE, NA, TRUE) 
df = data.frame(n, s, b, c, d, e)

paste_noNA <- function(x,sep=", ") {
gsub(", " ,sep, toString(x[!is.na(x) & x!="" & x!="NA"] ) ) }

sep=" "
df$x <- apply( df[ , c(1:6) ] , 1 , paste_noNA , sep=sep)
df

edited Mar 11 '19 at 20:32

answered Dec 06 '16 at 11:58

Ferroao

3,042
28
53

@Ferroao Thanks, you saved my life. pls move paste_noNA function before df$x <-apply. – malajisi Mar 11 '19 at 16:57

score 14 · Answer 6 · edited Mar 27 '19 at 07:33

14

We can use paste0:

df$combField <- paste0(df$x, df$y)

If you do not want any padding space introduced in the concatenated field. This is more useful if you are planning to use the combined field as a unique id that represents combinations of two fields.

edited Mar 27 '19 at 07:33

zx8754

52,746
12
114
209

answered Apr 08 '17 at 00:25

yanes

440
6
11

avallecam · Answer 7 · 2020-04-10T13:48:35.010

Instead of

paste (default spaces),
paste0 (force the inclusion of missing NA as character) or
unite (constrained to 2 columns and 1 separator),

I'd suggest an alternative as flexible as paste0 but more careful with NA: stringr::str_c

library(tidyverse)

# check the missing value!!
df <- tibble(
  n = c(2, 2, 8),
  s = c("aa", "aa", NA_character_),
  b = c(TRUE, FALSE, TRUE)
)

df %>% 
  mutate(
    paste = paste(n,"-",s,".",b),
    paste0 = paste0(n,"-",s,".",b),
    str_c = str_c(n,"-",s,".",b)
  ) %>% 

  # convert missing value to ""
  mutate(
    s_2=str_replace_na(s,replacement = "")
  ) %>% 
  mutate(
    str_c_2 = str_c(n,"-",s_2,".",b)
  )
#> # A tibble: 3 x 8
#>       n s     b     paste          paste0     str_c      s_2   str_c_2   
#>   <dbl> <chr> <lgl> <chr>          <chr>      <chr>      <chr> <chr>     
#> 1     2 aa    TRUE  2 - aa . TRUE  2-aa.TRUE  2-aa.TRUE  "aa"  2-aa.TRUE 
#> 2     2 aa    FALSE 2 - aa . FALSE 2-aa.FALSE 2-aa.FALSE "aa"  2-aa.FALSE
#> 3     8 <NA>  TRUE  8 - NA . TRUE  8-NA.TRUE  <NA>       ""    8-.TRUE

^{Created on 2020-04-10 by the reprex package (v0.3.0)}

extra note from str_c documentation

Like most other R functions, missing values are "infectious": whenever a missing value is combined with another string the result will always be missing. Use str_replace_na() to convert NA to "NA"

`paste0(n,"-",s,".",b)` and `str_c(n,"-",s,".",b)` are exactly the same, both use a default separator that is the empty string `''`. I also don't know why `paste` is "tidy", you mean you don't like spaces? — Axeman, Mar 27 '19 at 19:46
`paste0` and `str_c` are not exactly the same. take a look to these links: (1) https://www.rdocumentation.org/packages/stringr/versions/1.3.1/topics/str_c (2) https://stackoverflow.com/questions/53118271/difference-between-paste-str-c-str-join-stri-join-stri-c-stri-pa/53118273#53118273 — avallecam, Jan 20 '20 at 20:43
Ah I see! Thanks! How they are different would be a good addition to this answer (and the `str_c` documentation could be more explitic too!). — Axeman, Jan 20 '20 at 21:11
@Axeman thanks for your suggestion. I've simplified the answer plus added an extra note on the issue — avallecam, Apr 10 '20 at 13:39

score 5 · Answer 8 · answered Apr 15 '20 at 03:28

There are other great answers, but in the case where you don't know the column names or the number of columns you want to concatenate beforehand, the following is useful.

df = data.frame(x = letters[1:5], y = letters[6:10], z = letters[11:15])
colNames = colnames(df) # could be any number of column names here
df$newColumn = apply(df[, colNames, drop = F], MARGIN = 1, FUN = function(i) paste(i, collapse = ""))

score 1 · Answer 9 · answered Jul 10 '22 at 08:26

I'd like to also propose a method for concatenating a large/unknown number of columns. The solution proposed by Ben Ernest can be pretty slow on large datasets.

Below is my proposed solution:

# setup data.frame - Making it large for the time benchmarking
n = rep(c(2, 3, 5), 1000000)
s = rep(c("aa", "bb", "cc"), 1000000)
b = rep(c(TRUE, FALSE, TRUE), 1000000) 
df = data.frame(n, s, b)

# The proposed solution:
colNames = c("n", "s") # could be any number of column names here
df$x <- do.call(paste0, c(df[,colNames], sep=" "))

# running system.time on this yields:
# user  system elapsed 
# 1.861   0.005   1.865 

# compare with alternative method:
df$x <- apply(df[, colNames, drop = F], MARGIN = 1, 
                         FUN = function(i) paste(i, collapse = ""))
# running system.time on this yields:
# user  system elapsed 
#  16.127   0.147  16.304

Combine two or more columns in a dataframe into a new column with a new name

9 Answers9

Linked

Related