0

I have a data set that I've subset which looks like this:

Item Code Percentage
10000 123 0.2
10001 134 0.98
10001 152 0.02
10002 123 0.68
10003 123 1
10002 178 0.32
10004 189 1

I want to find a way to transpose in a way where I only retain unique values from column A, Column B is dispersed into different columns according to unique values and the Percentage populates in those values. Please see the example of the data I'm looking to finalize with:

Item 123 134 152 178 189
10000 0.2 0 0 0 0
10001 0 0.98 0.02 0 0
10002 0.68 0 0 0.3 0
10003 1 0 0 0 0
10004 0 0 0 0 1

I am currently using a format the follows this "skeleton":

       df <-df %>%
       group_by(Item) %>%
       mutate(n = row_number()) %>%
       spread(Code, Percentage)

Following this structure I still get values in Column A repeating (not unique). I did load library(plyr) library(dplyr) library(tidyr) in that order. Reason why I mentioned is I read somewhere if you switch the order it works, but ended up messing up the results.

If you need more information please let me know. Thank you!

guirgo__
  • 3
  • 2
  • plyr has been retired for a few years now. You can do this with tidyr, which also has updated syntax from `spread` to `pivot_wider`. There are many SO questions on this same task; I'll dig up a few to flag – camille Feb 25 '21 at 17:11
  • Does this answer your question? [Why does pivot\_wider either read single values as duplicates or create a wide-and-long tibble (without merging rows)?](https://stackoverflow.com/questions/64627034/why-does-pivot-wider-either-read-single-values-as-duplicates-or-create-a-wide-an) – camille Feb 25 '21 at 17:39
  • Also see https://stackoverflow.com/q/64169332/5325862 – camille Feb 25 '21 at 17:48
  • Hey All! thank you so much for your help! Both options worked out for me easily! Really appreciate it! :) – guirgo__ Feb 25 '21 at 20:28

2 Answers2

1

using tidyverse

library(tidyverse)

df <- read.table(text = "Item   Code    Percentage
10000   123 0.2
10001   134 0.98
10001   152 0.02
10002   123 0.68
10003   123 1
10002   178 0.32
10004   189 1", header = T)

pivot_wider(
  data = df, 
  id_cols = Item, 
  names_from = Code, 
  values_from = Percentage, 
  values_fill = 0
)
#> # A tibble: 5 x 6
#>    Item `123` `134` `152` `178` `189`
#>   <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 10000  0.2   0     0     0        0
#> 2 10001  0     0.98  0.02  0        0
#> 3 10002  0.68  0     0     0.32     0
#> 4 10003  1     0     0     0        0
#> 5 10004  0     0     0     0        1

Created on 2021-02-25 by the reprex package (v1.0.0)

using data.table

library(data.table)
setDT(df)
dcast(data = df, formula = Item ~ Code, value.var = "Percentage", fill = 0)
#>     Item  123  134  152  178 189
#> 1: 10000 0.20 0.00 0.00 0.00   0
#> 2: 10001 0.00 0.98 0.02 0.00   0
#> 3: 10002 0.68 0.00 0.00 0.32   0
#> 4: 10003 1.00 0.00 0.00 0.00   0
#> 5: 10004 0.00 0.00 0.00 0.00   1

Created on 2021-02-25 by the reprex package (v1.0.0)

Yuriy Saraykin
  • 8,390
  • 1
  • 7
  • 14
1

The issue is that it needs to do both grouping by 'Code' and 'Item'

library(dplyr)
library(tidyr)
df %>%
   group_by(Code, Item) %>%
   mutate(n = row_number()) %>%
   ungroup %>%
   spread(Code, Percentage, fill = 0) %>%
   select(-n)

-output

# A tibble: 5 x 6
#   Item `123` `134` `152` `178` `189`
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 10000  0.2   0     0     0        0
#2 10001  0     0.98  0.02  0        0
#3 10002  0.68  0     0     0.32     0
#4 10003  1     0     0     0        0
#5 10004  0     0     0     0        1
akrun
  • 874,273
  • 37
  • 540
  • 662