How to combine columns based on other columns in r

Question

I would like to combine the product and price columns so that I copy the data from the country and date columns.

My data:

df1 = data.frame(country = c("USA", "USA", "ITALY"), date = c("10/19", "10/19" , "11/19"),
                 product=c("A","B","A"), price=c( 100, 200, 150), product2 =c("B","A", NA), 
                 price2 = c( 50, 100,NA),product3 =c("B","A", "C"), price3 = c(30,NA,10) )

> df1
  country  date product price product2 price2 product3 price3
1     USA 10/19       A   100        B     50        B     30
2     USA 10/19       B   200        A    100        A     NA
3   ITALY 11/19       A   150     <NA>     NA        C     10

I expected:

 country  date product price
1     USA 10/19       A   100
2     USA 10/19       B   200
3   ITALY 11/19       A   150
4     USA 10/19       B    50
5     USA 10/19       A   100
6     USA 10/19       B    30
7     USA 10/19       A    NA
8   ITALY 11/19       C    10

score 3 · Answer 1 · answered Jan 27 '20 at 11:26

3

You could use pivot_longer from tidyr.

library(dplyr)
library(tidyr)

pivot_longer(df1, 
              cols = -c(country, date), 
              names_to = c(".value", "num"), 
              names_sep = "\\d") %>%
 select(-num) %>%
 na.omit

# A tibble: 5 x 4
#  country date  product price
#  <fct>   <fct> <fct>   <dbl>
#1 USA     10/19 A         100
#2 USA     10/19 B          50
#3 USA     10/19 B         200
#4 USA     10/19 A         100
#5 ITALY   11/19 A         150

answered Jan 27 '20 at 11:26

Ronak Shah

377,200
20
156
213

In my data there are many columns with products and prices. I'm trying to use your code and get a message: Warning message: Expected 2 pieces. Additional pieces discarded in 15 rows – Zizou Jan 27 '20 at 11:47
@Zizou Yes, that is because `product` column does not have a number ahead of it like `product2`, so it is safe to ignore the warning. Does it give you expected output though? – Ronak Shah Jan 27 '20 at 11:54
1

You can avoid the warning by being a little more verbose and changing `names_sep = "\\d"` to `names_pattern = '(\\w)($|\\d)'` – IceCreamToucan Jan 27 '20 at 12:07
I added next columns to my data. – Zizou Jan 27 '20 at 12:14
Ok but i don't want to skip all NA as seen in the example above – Zizou Jan 27 '20 at 12:17
Then returns NA for the product I don't want – Zizou Jan 27 '20 at 12:20
@Zizou Can you try `pivot_longer(df1, cols = -c(country, date), names_to = c(".value", "num"), names_pattern = '(product|price)($|\\d)')` ? – Ronak Shah Jan 27 '20 at 12:43

Anderson Arroyo · Answer 2 · 2020-01-27T14:56:20.903

This question is related to yours: enter link description here

Here is a couple of solutions for your question:

Easy one:

bind_rows(
  df1 %>% select(country, date, product = product, price = price),
  df1 %>% select(country, date, product = product2, price = price2)
) %>%
  na.omit()

More scalable:

nm1 <- names(df1)[-(1:2)] 
split(nm1, sub("\\D+", "", nm1)) %>% 
  purrr::map_df(~ df1 %>% select(country, date, .x) %>%
           rename_at(3:4, ~ c("product", "price"))) %>%
  na.omit()

For your edited question:

library(dplyr)
library(purrr)
df1 = data.frame(country = c("USA", "USA", "ITALY"), date = c("10/19", "10/19" , "11/19"),
                 product=c("A","B","A"), price=c( 100, 200, 150), product2 =c("B","A", NA), 
                 price2 = c( 50, 100,NA),product3 =c("B","A", "C"), price3 = c(30,NA,10), stringsAsFactors = FALSE)

nm1 <- names(df1)[-(1:2)] 
split(nm1, sub("\\D+", "", nm1)) %>% 
  purrr::map_df(~ df1 %>% select(country, date, .x) %>%
                  rename_at(3:4, ~ c("product", "price"))) %>%
  filter(!is.na(product))

Results:

     country  date product price
1     USA 10/19       A   100
2     USA 10/19       B   200
3   ITALY 11/19       A   150
4     USA 10/19       B    50
5     USA 10/19       A   100
6     USA 10/19       B    30
7     USA 10/19       A    NA
8   ITALY 11/19       C    10

I have factor and num type and I get error: Only strings can be converted to symbols. — Zizou, Jan 27 '20 at 14:24
When you create the dataframe, use the option stringsAsFactors = FALSE. — Anderson Arroyo, Jan 27 '20 at 14:39
I updtaded the answer, but it should work. I executed it and the result is what you are looking for. Try to execute line per line. — Anderson Arroyo, Jan 27 '20 at 14:57

How to combine columns based on other columns in r

2 Answers2