This might be solved one of two ways. I think I am using the aggregate() function incorrectly - although the results are close to what I want.

I am using:

WS_PART = aggregate(PART_WS_Pct$niin_id~PART_WS_Pct$ws_category,  PART_WS_Pct, FUN=function(x) unique(x))  

The result is almost what I want. It contains a main category and then all of the parts under that category, except that the second column of the data is an actual list.

I basically want to make a list for each ws_category containing all the parts.

Right now the data looks like this:

MY_CAT1, c("000245290", "000763050", "001218656", "001506526")
MY_CAT2, c("2343","2366")

I only have a few categories so I was thinking this could be good as a cross tab. The categories as headers and the PART #'s as rows with each column containing some kind of indicator like TRUE/FALSE or 0/1.
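
A minimal sketch of that cross tab, assuming the original PART_WS_Pct data is in long form with one row per part and category. The sample data frame here is hypothetical and only reuses the part numbers shown above; ws_xtab and ws_flag are illustrative names.

```r
# Hypothetical sample in the long shape that aggregate() was given;
# the part numbers come from the question, the layout is an assumption.
PART_WS_Pct <- data.frame(
  ws_category = c("MY_CAT1", "MY_CAT1", "MY_CAT1", "MY_CAT1",
                  "MY_CAT2", "MY_CAT2"),
  niin_id     = c("000245290", "000763050", "001218656", "001506526",
                  "2343", "2366"),
  stringsAsFactors = FALSE
)

# count each part/category pair: parts as rows, categories as columns
ws_xtab <- table(PART_WS_Pct$niin_id, PART_WS_Pct$ws_category)

# collapse the counts to a 0/1 indicator
ws_flag <- (ws_xtab > 0) * 1L

# view results
ws_flag
```

Because table() works on the long data directly, this route would not need the aggregate() step at all.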

I'm open to more suggestions, but those are the two I can think of. Worst case, I can convert the list to characters and do some manipulations that way?

Any advice?

DevGin
  • Please add a reproducible example along with an expected output. – Ronak Shah May 24 '18 at 01:33
  • Potential duplicate of [Unlist data frame column preserving information from other column](https://stackoverflow.com/questions/26194298/unlist-data-frame-column-preserving-information-from-other-column). – Cristian E. Nuno May 24 '18 at 11:52
  • Please see [this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa) and fix your question – David Arenburg May 24 '18 at 14:56

1 Answer

Overview

I think what's happening is that the number of parts varies from one ws_category to the next, so aggregate() stores them in a list column. To remedy this, you'll have to transform your data from one row for each ws_category to one row for each combination of ws_category and part.

To do this, humor me with a baseball reference. Lots of great players never play for teams that win a World Series, while some seem to find themselves earning multiple rings over their careers.

In the code below, df contains three rows, one each for Ron Santo, Henry Blanco, and John Lester. Neither Santo nor Blanco ever played for a team that won the World Series; Lester, however, was part of two championship teams.

To expand df so that it has one row per baseball player and their corresponding World Series championship year(s), two solutions come to mind:

  1. Use tidyr::unnest(); or

  2. Use both the base and utils packages to stack() the unlisted objects within the World Series column.

Code

# load necessary packages
library( tidyverse )

# make data
df <-
  data.frame( Name = c("Ron Santo", "Henry Blanco", "John Lester") )

# add WS Championship Years
df$WS_Champion <-
  list( NA, NA, c(2013, 2016) )

# view results
df
#           Name WS_Champion
# 1    Ron Santo          NA
# 2 Henry Blanco          NA
# 3  John Lester  2013, 2016

# base R solution

# name the objects within the list column
# with their corresponding `Name` value
names( df$WS_Champion ) <- df$Name

# unlist each object within the list column
# and stack the vectors into a data frame
df.stacked <-
  utils::stack( x = lapply( X = df$WS_Champion, FUN = unlist ) )

# rename the columns
colnames( df.stacked ) <- c("WS_Champion", "Name")

# view results
df.stacked
#   WS_Champion         Name
# 1          NA    Ron Santo
# 2          NA Henry Blanco
# 3        2013  John Lester
# 4        2016  John Lester

# tidyverse solution

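# unnest df so that 'Name' repeats for every value in 'WS_Champion'
# (naming the list column keeps this working on newer tidyr releases,
#  which warn when no column is given)
df <-
  unnest( df, WS_Champion )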

# view results
df
#           Name WS_Champion
# 1    Ron Santo          NA
# 2 Henry Blanco          NA
# 3  John Lester        2013
# 4  John Lester        2016

# end of script # 
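
As a rough sketch, the same base R approach can be applied to data shaped like the output in the question. The WS_PART object below is a hypothetical stand-in built from the part numbers shown there. The column names ws_category and niin_id are assumptions, since the real aggregate() result may label its columns differently, and ws_long is an illustrative name.

```r
# Hypothetical stand-in for the aggregate() result in the question:
# one row per category, with the parts held in a list column.
WS_PART <- data.frame(
  ws_category = c("MY_CAT1", "MY_CAT2"),
  stringsAsFactors = FALSE
)
WS_PART$niin_id <- list(
  c("000245290", "000763050", "001218656", "001506526"),
  c("2343", "2366")
)

# name each list element after its category,
# then stack the vectors into a long data frame
ws_long <-
  utils::stack( setNames( WS_PART$niin_id, WS_PART$ws_category ) )

# rename the columns
colnames( ws_long ) <- c("niin_id", "ws_category")

# view results: one row per part, with its category repeated
ws_long
```

Note that stack() returns the category column as a factor; as.character() converts it back if plain strings are needed.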

Session Info

R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4     purrr_0.2.4    
[5] readr_1.1.1     tidyr_0.8.0     tibble_1.4.2    ggplot2_2.2.1  
[9] tidyverse_1.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16     cellranger_1.1.0 pillar_1.2.1    
 [4] compiler_3.4.4   plyr_1.8.4       bindr_0.1.1     
 [7] tools_3.4.4      lubridate_1.7.3  jsonlite_1.5    
[10] nlme_3.1-131.1   gtable_0.2.0     lattice_0.20-35 
[13] pkgconfig_2.0.1  rlang_0.2.0      psych_1.7.8     
[16] cli_1.0.0        rstudioapi_0.7   yaml_2.1.18     
[19] parallel_3.4.4   haven_1.1.1      bindrcpp_0.2    
[22] xml2_1.2.0       httr_1.3.1       hms_0.4.2       
[25] grid_3.4.4       glue_1.2.0       R6_2.2.2        
[28] readxl_1.0.0     foreign_0.8-69   modelr_0.1.1    
[31] reshape2_1.4.3   magrittr_1.5     scales_0.5.0    
[34] rvest_0.3.2      assertthat_0.2.0 mnormt_1.5-5    
[37] colorspace_1.3-2 stringi_1.1.7    lazyeval_0.2.1  
[40] munsell_0.4.3    broom_0.4.3      crayon_1.3.4 
Cristian E. Nuno
  • This looks like the perfect solution, unfortunately, I can't get tidyverse to install. Could be due to older version of R and RStudio. Any chance of another solution like this without tidyverse? – DevGin May 24 '18 at 02:26
  • @MarkGingrass there usually is a `base` R solution! See my updated answer. – Cristian E. Nuno May 24 '18 at 03:22