I am quite new to R, and this post might be a duplicate of https://stackoverflow.com/questions/58837773/pivot-wider-issue-values-in-values-from-are-not-uniquely-identified-output-wpost, which however does not resolve my doubts (I am not entirely sure it is exactly the same problem).

My tibble looks like this:

library(tidyverse)  # provides tibble(), pivot_longer(), pivot_wider(), map() and %>%

tibs <- tibble(Brand = c("Brand1","Brand2","Brand1", "Brand2","Brand3","Brand4","Brand3","Brand4"),
              Category = c("Cat1", "Cat1", "Cat1", "Cat1","Cat2", "Cat2","Cat2", "Cat2"),
              share_1  = c(0.2, 0.8, 0.21, 0.79, 0.5, 0.5, NA, NA),
              share_2 = c(0.3, 0.7, 0.3, 0.7, NA, NA, 0.6, 0.4),
              share_3 = c(NA, NA, 0.21, 0.79, 0.6, 0.4,NA,NA),
              mktsize_1 = c(100, 100,200, 200, 100, 100, NA, NA),
              mktsize_2 = c(200,200,NA,NA,NA,NA,200,200),
              mktsize_3 = c(NA,NA,300,300,300,300,NA,NA),
              Type = c("Q", "Q", "P", "P", "Q", "Q", "P","P")
              )

And the output that I want is exactly the Bobby tibble below:

Bobby <- tibs %>%
         pivot_longer(cols = share_1:mktsize_3,
                      names_to = c(".value", "year"),
                      names_sep = "_") %>%
         pivot_wider(names_from = Type,
                     values_from = c(share, mktsize)) 

Problem: when I run similar code (the same idea) on the real dataset, I get the following warning:


pivoted <- renamed %>%
           pivot_longer(cols = c(Share_2012:Share_2021, Unit_2012:Unit_2021),
                        names_to = c(".value", "Year"),
                        names_sep = "_"
                        )                                    %>%
           rename(Market_Size = Unit)                        %>%
           pivot_wider(names_from = Currency_Conversion,
                       values_from = c(Share, Market_Size)
                       )

Warning message:
Values from `Market_Size` and `Share` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list` to suppress this warning.
* Use `values_fn = {summary_fun}` to summarise duplicates.
* Use the following dplyr code to identify duplicates.
  {data} %>%
  dplyr::group_by(Geography, Industry, Category, Subcategory, NACE_mapping, Hierarchy_level, Data_Type, GBO_BI,
  Current_Constant, Measure, Year, Currency_Conversion) %>%
  dplyr::summarise(n = dplyr::n(), .groups = "drop") %>%
  dplyr::filter(n > 1L) 
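To make the diagnostic suggested by the warning concrete, here is a minimal sketch on made-up data. The tibble `long_toy` and its values are hypothetical and only mimic the shape of the data after `pivot_longer()`; the real dataset would use the grouping columns listed in the warning instead.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format data: the key Brand1 / year 1 / Type "P"
# appears twice, which is the situation the warning describes.
long_toy <- tibble(
  Brand = c("Brand1", "Brand1", "Brand1", "Brand2"),
  year  = c("1", "1", "1", "1"),
  Type  = c("Q", "P", "P", "Q"),
  share = c(0.20, 0.21, 0.25, 0.80)
)

# Count rows per key combination and keep only the duplicated keys.
dupes <- long_toy %>%
  group_by(Brand, year, Type) %>%
  summarise(n = n(), .groups = "drop") %>%
  filter(n > 1L)

print(dupes)
```

Any row returned by this check is a combination of identifying columns that maps to more than one value, so `pivot_wider()` cannot place a single number in the corresponding cell.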

Is it because several of the market shares are equal to 100? And, most importantly, is this problematic? Some entries end up as NULL objects, which I could replace with missing values as follows:

pivoted <- renamed %>%
           pivot_longer(cols = c(Share_2012:Share_2021, Unit_2012:Unit_2021),
                        names_to = c(".value", "Year"),
                        names_sep = "_"
                        )                                    %>%
           rename(Market_Size = Unit)                        %>%
           pivot_wider(names_from = Currency_Conversion,
                       values_from = c(Share, Market_Size),
                       values_fn = list)                     %>%
           select(-c(Share_NA, Market_Size_NA))              %>%
           mutate(across(where(is.list), map, `%||%`, NA))

However, I do not understand why the tibble Bobby does not give me this type of problem, whereas the real one does...
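One possible explanation, sketched below on hypothetical data, is that in `tibs` every combination of the identifying columns and `Type` occurs only once, whereas the real data may contain repeated combinations. The tibbles `unique_keys` and `dup_keys` are invented for illustration, and `mean` is only one of several summary functions that could be passed to `values_fn`.

```r
library(tidyr)
library(tibble)

# Each Brand / year / Type combination occurs once: no warning expected.
unique_keys <- tibble(
  Brand = c("Brand1", "Brand1"),
  year  = c("1", "1"),
  Type  = c("Q", "P"),
  share = c(0.20, 0.21)
)
wide_ok <- pivot_wider(unique_keys, names_from = Type, values_from = share)

# Add a second "P" row for the same Brand and year: the key is now duplicated.
dup_keys <- add_row(unique_keys,
                    Brand = "Brand1", year = "1", Type = "P", share = 0.25)

# Without values_fn, the duplicated values are kept in list-columns
# (the warning is suppressed here only to keep the output quiet).
wide_list <- suppressWarnings(
  pivot_wider(dup_keys, names_from = Type, values_from = share)
)

# With a summary function, duplicates are collapsed to a single value.
wide_mean <- pivot_wider(dup_keys, names_from = Type, values_from = share,
                         values_fn = mean)
```

Whether summarising is appropriate depends on what the duplicates mean. If they correspond to different data types, adding the distinguishing column to the identifying columns may be a better fit than averaging.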

I tried to read through the warning and check the post above, but I am still a bit confused about what I should do (I am not even sure I should do anything!).

  • "In your code, you showed duplicate column names (c(Share_2012:Share_2021, Unit_2012:Unit_2021)) which is not supported in tibbles " What do you mean? – Matteo Bulgarelli Apr 13 '23 at 14:02
  • It will be more helpful to show an example that reproduces your problem. It takes more time to do but it's much more likely that you will have a fast answer if you provide one. If you can't reproduce the issue with fake data, maybe you could share a subset of your real data with `dput()`? – bretauv Apr 13 '23 at 14:36
  • I can't tell if you tried the diagnostic code that the warning message suggested. If so, what was the result? – Jon Spring Apr 13 '23 at 17:41
  • @bretauv I'll do this immediately, unfortunately I did not know of the existence of `dput()` – Matteo Bulgarelli Apr 14 '23 at 07:28
  • Sorry guys, I simplified too much, it's of course more complex than this. Thanks anyway for the answers, they were useful. – Matteo Bulgarelli Apr 14 '23 at 10:49
  • I note that you are pivoting columns "Unit..." to long, then trying to pivot "Marketshare..." back to wide. Note that in the working example "mktsize" is consistent. I think your pivot wider should include `values_from = c(Share, Unit)` instead of `values_from = c(Share, Market_Size)`. – Paul Stafford Allen Apr 14 '23 at 10:54
  • No unfortunately it's more complex than that, I have several cases with more than 1 Data Type for what I call "Q" or "P" in the `Bobby` tibble... I need first to make things a bit more consistent... – Matteo Bulgarelli Apr 14 '23 at 11:03
  • P.S. Note that `Unit` has been renamed `Market_Size` – Matteo Bulgarelli Apr 14 '23 at 11:09
