0

This is obviously an entry level question but just seems to indicate I might be missing some fundamentals of how R handles data transposition on a basic level...

Note, I have been reading all other messages/replies but I am still confused over why I couldn't perform this very simple task below .

Could someone at least explain to me what's happening here, also how to use those options in the warning message? (fairly straightforward to do in normal circumstances with 3 columns or more so something is missing here..)

My intention is to first create a data set of 2 columns with 100 records and 10 levels per fn variable, then transpose it into a wide dataset with 10 columns.

fn <- seq(1,10) 
val <- rnorm(100,0,1)

datout <- data.frame(fn,val)


##no duplicate
> which(duplicated(datout))
integer(0)

##pivot_wider gave me a listing with warning message:

datwd <- datout %>% 
  mutate(fn=paste0("type",fn)) %>%
  pivot_wider(names_from=fn, values_from=val)

##only getting a listing with warning message:

A tibble: 1 x 10

type1 type2 type3 type4 type5 type6 type7 type8 type9 type10

1 <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> <dbl [10]> Warning message: Values are not uniquely identified; output will contain list-cols.

  • Use values_fn = list to suppress this warning.
  • Use values_fn = length to identify where the duplicates ari**se
  • Use values_fn = {summary_fun} to summarise duplicates
##spread() won't work at all

> datout %>% 
+  spread("fn", "val")
Error: Each row of output must be identified by a unique combination of keys.

dwstat7
  • 1
  • 1

1 Answers1

1

You almost got there. Your error message says: Values are not uniquely identified. If you look at the examples for pivot_wider, you will see that you need a unique identifier column to help guide the values into their correct position.

datout %>%
  
  dplyr::mutate(
    # Create an ID col
    ID = rep(1:10,times = 10) %>% sort,
    fn=paste0("type",fn)) %>% 
  
  tidyr::pivot_wider(
    # Use ID col here
    id_cols = ID,
    names_from=fn, values_from=val)

Output:

# A tibble: 10 x 11
      ID  type1   type2  type3  type4   type5  type6  type7   type8  type9 type10
   <int>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
 1     1  0.805 -1.81   -0.235 -0.986  1.05   -2.11   0.951  1.24    0.401  0.517
 2     2 -1.33  -0.433   0.939  0.263  0.412   1.58   0.864  0.223   0.541  0.530
 3     3 -0.785  0.0519 -0.581  0.527  0.0339  0.440  0.980 -1.75   -0.346  2.49 
 4     4  0.279  0.261  -1.21  -0.232 -1.00   -0.455  1.22  -0.434   0.262 -2.36 
 5     5 -0.559  0.684  -1.56   0.747 -0.195  -3.22   0.383  0.735  -1.30   0.539
 6     6  0.618  0.544  -2.45   0.936  0.0550  1.49   1.83  -1.19    0.257  0.875
 7     7  0.308 -1.94   -0.440 -0.855 -0.712   0.475  0.418  0.718   1.60  -1.44 
 8     8  0.877  1.60    1.34   1.90   0.381  -0.440 -0.266 -0.0980  1.82   0.538
 9     9  1.38   0.199  -0.165 -1.09   0.350   0.966 -1.58  -0.888   0.551  0.493
10    10  0.143 -0.476   0.118 -1.16  -1.57   -1.73   0.479 -0.401  -0.932 -0.469
Anurag N. Sharma
  • 362
  • 2
  • 10
  • Okay, so basically I had to retain another column artificially in order for the code to work...it's more of a record # here. I guess one thing to note is that all those type categories need to have the same number of records for this type of transpose to work reasonably, so maybe there is some sense underneath this practice. Thanks. – dwstat7 Jun 21 '21 at 15:56
  • And this is a complaint to stackoverflow, part of my question was not answered, nor in any responses to other "similar" questions: also how to use those options in the warning message ... – dwstat7 Jun 21 '21 at 16:02
  • You're right. You do need to have the same number of records per category. – Anurag N. Sharma Jun 22 '21 at 11:13