0

I would like to combine the product and price columns so that I copy the data from the country and date columns.

My data:

df1 = data.frame(country = c("USA", "USA", "ITALY"), date = c("10/19", "10/19" , "11/19"),
                 product=c("A","B","A"), price=c( 100, 200, 150), product2 =c("B","A", NA), 
                 price2 = c( 50, 100,NA),product3 =c("B","A", "C"), price3 = c(30,NA,10) )

> df1
  country  date product price product2 price2 product3 price3
1     USA 10/19       A   100        B     50        B     30
2     USA 10/19       B   200        A    100        A     NA
3   ITALY 11/19       A   150     <NA>     NA        C     10

I expected:

 country  date product price
1     USA 10/19       A   100
2     USA 10/19       B   200
3   ITALY 11/19       A   150
4     USA 10/19       B    50
5     USA 10/19       A   100
6     USA 10/19       B    30
7     USA 10/19       A    NA
8   ITALY 11/19       C    10
Leigh
  • 188
  • 2
  • 13
Zizou
  • 503
  • 5
  • 18

2 Answers2

3

You could use pivot_longer from tidyr.

library(dplyr)
library(tidyr)

pivot_longer(df1, 
              cols = -c(country, date), 
              names_to = c(".value", "num"), 
              names_sep = "\\d") %>%
 select(-num) %>%
 na.omit

# A tibble: 5 x 4
#  country date  product price
#  <fct>   <fct> <fct>   <dbl>
#1 USA     10/19 A         100
#2 USA     10/19 B          50
#3 USA     10/19 B         200
#4 USA     10/19 A         100
#5 ITALY   11/19 A         150
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • In my data there are many columns with products and prices. I'm trying to use your code and get a message: Warning message: Expected 2 pieces. Additional pieces discarded in 15 rows – Zizou Jan 27 '20 at 11:47
  • @Zizou Yes, that is because `product` column does not have a number ahead of it like `product2`, so it is safe to ignore the warning. Does it give you expected output though? – Ronak Shah Jan 27 '20 at 11:54
  • 1
    You can avoid the warning by being a little more verbose and changing `names_sep = "\\d"` to `names_pattern = '(\\w)($|\\d)'` – IceCreamToucan Jan 27 '20 at 12:07
  • I added next columns to my data. – Zizou Jan 27 '20 at 12:14
  • Ok but i don't want to skip all NA as seen in the example above – Zizou Jan 27 '20 at 12:17
  • Then returns NA for the product I don't want – Zizou Jan 27 '20 at 12:20
  • @Zizou Can you try `pivot_longer(df1, cols = -c(country, date), names_to = c(".value", "num"), names_pattern = '(product|price)($|\\d)')` ? – Ronak Shah Jan 27 '20 at 12:43
0

This question is related to yours: enter link description here

Here is a couple of solutions for your question:

Easy one:

bind_rows(
  df1 %>% select(country, date, product = product, price = price),
  df1 %>% select(country, date, product = product2, price = price2)
) %>%
  na.omit()

More scalable:

nm1 <- names(df1)[-(1:2)] 
split(nm1, sub("\\D+", "", nm1)) %>% 
  purrr::map_df(~ df1 %>% select(country, date, .x) %>%
           rename_at(3:4, ~ c("product", "price"))) %>%
  na.omit()

For your edited question:

library(dplyr)
library(purrr)
df1 = data.frame(country = c("USA", "USA", "ITALY"), date = c("10/19", "10/19" , "11/19"),
                 product=c("A","B","A"), price=c( 100, 200, 150), product2 =c("B","A", NA), 
                 price2 = c( 50, 100,NA),product3 =c("B","A", "C"), price3 = c(30,NA,10), stringsAsFactors = FALSE)

nm1 <- names(df1)[-(1:2)] 
split(nm1, sub("\\D+", "", nm1)) %>% 
  purrr::map_df(~ df1 %>% select(country, date, .x) %>%
                  rename_at(3:4, ~ c("product", "price"))) %>%
  filter(!is.na(product))

Results:

     country  date product price
1     USA 10/19       A   100
2     USA 10/19       B   200
3   ITALY 11/19       A   150
4     USA 10/19       B    50
5     USA 10/19       A   100
6     USA 10/19       B    30
7     USA 10/19       A    NA
8   ITALY 11/19       C    10
Anderson Arroyo
  • 327
  • 2
  • 13