-2

photo of current data

The data shows NA for some points but the information is right below it. It is the same UPC, Store, and Week. How do I group my data to avoid redundancy and the NA data?

This is my code so far:

`library(tidyverse)
RD <- read.csv("Raw Soft Drinks Sales Data.csv")
U  <- read.csv("UPC Soft Drinks.csv") %>%
  mutate(UPC   = as.factor(UPC),
         BRAND = as.factor(BRAND),
         CLASS = as.factor(CLASS))
RDX <- RD %>%
  filter(UPC != "Total") %>%
  select (-c(Total.Q1,Total.Q2,Total.Q3,Total.Q4))


RDXL <- RDX %>%
  pivot_longer(
    cols = starts_with("Week"),
#    cols = X1:X52,
#    cols = !c("STORE","UPC"),
    names_to = "WEEK", 
    names_prefix = "Week",
    values_to = "UNITS",
    values_drop_na = TRUE)

RDW <- pivot_wider(RDXL, names_from = "ITEM", values_from = "UNITS")%>%
  select(-TOTAL)

`

This is what the original data set looks like: original data

I need Store, UPC, Dollars, Units, Feat, Deal, and Week to be their own columns.

  • Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some sample and target data in a clear and ready-to-use format, e.g. with `dput()` and use the [reprex-package](https://reprex.tidyverse.org/). – mnist Sep 19 '21 at 21:45
  • Looks like a fairly simple merge/join process: `merge(df[1:2] , df[c(1,3)])` as a start. `merge` is base R. Look up the help pages and examples here on the various `*_join` functions in the tidyverse. – IRTFM Sep 19 '21 at 22:23
  • 3
    Note the instructions at the top of the [tag:r] tag page and in particular the one NOT to use images because no one else can easily use them without retyping it all. – G. Grothendieck Sep 19 '21 at 22:53
  • In addition to G. Grothendieck's comment, here is why: https://meta.stackoverflow.com/a/285557/11374827 – teunbrand Sep 22 '21 at 14:08

2 Answers2

0

I think you may want to get all your data in the wide format. If your current data has one value for every data column (DOLLARS:DEAL) for every week, and all other data columns are NAs, you may want to coalesce the data like this:

minimal example data

tibble(store=c(1,1,1), UPC=rep(1200000044, 3), week=c(1, 1, 1), DOLLARS=c(10.1, NA, NA), UNITS=c(NA, 30, NA), FEAT=c(NA, NA, 'unknown'))

# A tibble: 3 x 6
  store        UPC  week DOLLARS UNITS FEAT   
  <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>  
1     1 1200000044     1    10.1    NA NA     
2     1 1200000044     1    NA      30 NA     
3     1 1200000044     1    NA      NA unknown

A solution

You can group_by the ID columns (UPC, week), and then sort with na.last=TRUE :

library(dplyr)

 df %>% group_by(UPC, week) %>%
         mutate(across(DOLLARS:FEAT, sort, na.last = TRUE)) %>%
         filter(!if_all(DOLLARS:FEAT, is.na))

# A tibble: 1 x 6
# Groups:   UPC, week [1]
  store        UPC  week DOLLARS UNITS FEAT   
  <dbl>      <dbl> <dbl>   <dbl> <dbl> <chr>  
1     1 1200000044     1    10.1    30 unknown

Minimal reprex for the original data

df2<-tibble(store=c(1, 1), UPC=rep(1200000044, 2), ITEM=c('DOLLARS', 'UNITS'), Week.1=c(58.4, 29), Week.2=c(118.8, 55))

df2

# A tibble: 2 x 5
  store        UPC ITEM    Week.1 Week.2
  <dbl>      <dbl> <chr>    <dbl>  <dbl>
1     1 1200000044 DOLLARS   58.4   119.
2     1 1200000044 UNITS     29      55 

solution with original data

Starting with the original data, you can sequentially pivot_longer %>% pivot_wider

df2 %>% pivot_longer(cols = starts_with('Week'), names_to = 'week', values_to = 'value')%>%
        pivot_wider(names_from = ITEM)

# A tibble: 2 x 5
  store        UPC week   DOLLARS UNITS
  <dbl>      <dbl> <chr>    <dbl> <dbl>
1     1 1200000044 Week.1    58.4    29
2     1 1200000044 Week.2   119.     55
GuedesBF
  • 8,409
  • 5
  • 19
  • 37
0

We could also do

library(dplyr)
df %>%
    group_by(UPC, week) %>%
    mutate(across(DOLLARS:FEAT, ~ .[order(is.na(.))])) %>%
    filter(if_all(DOLLARS:FEAT, Negate(is.na)))
akrun
  • 874,273
  • 37
  • 540
  • 662