TL;DR: I have a problem with tidyr::pivot_wider() and suspect it may be a bug in tidyr. I need an equivalent of my code below written with tidyr::spread() to find out whether it is a tidyr bug or some other problem. Or, if you have an idea of what could be wrong, please provide a solution that uses tidyr::pivot_wider().


I cannot provide data for my code. My data contains 1694 rows, 11 variables, and missing values.

Here is my code:

temp_data_tibb <- temp_data_tibb %>% 
  pivot_wider(names_from = Month, id_cols = ID, values_from = c("var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8", "var9"))

The problem is that if I subset part of my data, then my code works fine. With the full data it gives me errors like:

Error: Invalid type returned by `vec_proxy_compare()`.
Call `rlang::last_error()` to see a backtrace.
In addition: Warning messages:
1: Values in `var1` are not uniquely identified; output will contain list-cols.
* Use `values_fn = list(var1 = list)` to suppress this warning.
* Use `values_fn = list(var1 = length)` to identify where the duplicates arise
* Use `values_fn = list(var1 = summary_fun)` to summarise duplicates 

and the last 5 lines of the message repeat for every variable in values_from. I tried to locate the problematic row by gradually subsetting and testing, but could not find it because the problem is not in one particular row.

Please provide an alternative solution with tidyr::spread(), or another solution if you have one.

vasili111
  • Possible duplicate of [Issue with pivot\_longer and pivot\_wider](https://stackoverflow.com/questions/58331771/issue-with-pivot-longer-and-pivot-wider) or [tidyrpivot-wider-replace-values-with-data-type](https://stackoverflow.com/questions/58106809/tidyrpivot-wider-replace-values-with-data-type) – phiver Oct 26 '19 at 07:16
  • @phiver See my answer. It is not duplicate. – vasili111 Oct 26 '19 at 16:59
  • Actually it is. The dupes are because of non-unique row identifiers when spreading the data, just like your answer shows. Most solutions add a row number or something like that to fix it. – phiver Oct 26 '19 at 17:17
  • Even if you can't provide this exact data, there are plenty of ways you can still make a reproducible example—make a small subset of data that you *can* post, make dummy data that replicates the issue, find a commonly-available dataset that is similar enough. Right now we can't see what you're working with and what's going on. If you think it's a bug with `tidyr`, the package authors would need to see a working example to debug as well. – camille Oct 28 '19 at 00:23
  • @camille It is solved, please see my answer (cannot accept because of timer). In this case, I cannot post a subset of the data. Generating data that reproduces the issue is, I think, very hard if you do not know that the problem actually was "several rows with same `ID` and `Month`". If you do not know the cause, you cannot make similar fake data. – vasili111 Oct 28 '19 at 00:50
  • So how could we have possibly helped debug the code, and how would this help other users? – camille Oct 28 '19 at 01:07
  • @camille I agree with you that it is very hard to help without a reproducible example. If you look at my other questions you will see that I nearly always provide one. But as I already said, it is very hard to work out how to create a reproducible example without knowing what causes the problem in this case. Even so, it is hard but possible to help even in this case. Here people were able to help: https://www.reddit.com/r/Rlanguage/comments/dna39g/alternative_code_with_tidyr_spread_of_my_code/f5ajtjs/?context=10000 – vasili111 Oct 28 '19 at 01:13
  • and https://www.reddit.com/r/rstats/comments/dna34c/alternative_code_with_tidyr_spread_of_my_code/f5ajt15/?context=10000 . Other people can google error message and find my question and benefit from it. – vasili111 Oct 28 '19 at 01:14

1 Answer

There were several rows with the same ID and Month.
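
To illustrate what "several rows with the same ID and Month" looks like, here is a minimal base R sketch on made-up data. The `toy` data frame and its values are hypothetical and are not the real data set. It flags every row whose (ID, Month) pair occurs more than once, which is the situation where pivot_wider() has more than one value for a single output cell:

```r
# Hypothetical toy data: ID 1 has two rows for Month "M1".
toy <- data.frame(
  ID    = c(1, 1, 1, 2, 2),
  Month = c("M1", "M1", "M2", "M1", "M2"),
  var1  = c(10, 11, 12, 20, 21),
  stringsAsFactors = FALSE
)

# Look only at the columns that should identify a row uniquely.
key <- toy[, c("ID", "Month")]

# duplicated() marks the second and later occurrences; combining it with
# fromLast = TRUE also marks the first occurrence of each repeated pair.
is_dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# All rows that share their (ID, Month) pair with at least one other row.
dup_rows <- toy[is_dup, ]
print(dup_rows)
```

Unlike a per-ID frequency count, this check does not depend on how many months are in the subset, so it can be run on the full data.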

I used this code to find the problem:

library(tidyverse)

# Load data from csv file.
orig_data <- read.csv(file="D:/Arch/data.csv", header=TRUE, sep=",")
temp_data <- orig_data


# Subset 3 months.
temp_data_first3M <- temp_data[temp_data$Month == "M1" | temp_data$Month == "M2" | temp_data$Month == "M3",]


# Replace "" with NA.
temp_data_first3M[temp_data_first3M == ""] <- NA

# Get frequency of all IDs (count of every similar value in ID column).
table_results <- as.data.frame(table(temp_data_first3M$ID))



names(table_results) <- c("ID", "Freq")

# Subset the rows that have Freq > 3
table_results_more_than_three <- table_results[table_results$Freq > 3,]

# View results.
View(table_results_more_than_three)
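
Once the duplicated (ID, Month) pairs are known, one possible way to get pivot_wider() to run is to make each pair unique before reshaping. The sketch below uses made-up data (the `toy` tibble and its values are hypothetical) and assumes that keeping only the first row of each pair is acceptable. Whether that is true depends on the data, and summarising the duplicates or fixing them at the source may be more appropriate:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: ID 1 has two rows for Month "M1".
toy <- tibble(
  ID    = c(1, 1, 1, 2, 2),
  Month = c("M1", "M1", "M2", "M1", "M2"),
  var1  = c(10, 11, 12, 20, 21),
  var2  = c("a", "b", "c", "d", "e")
)

# Keep only the first row for each (ID, Month) pair.
# Assumption: dropping the later duplicates is acceptable for the analysis.
toy_unique <- toy %>%
  distinct(ID, Month, .keep_all = TRUE)

# With unique (ID, Month) pairs, each output cell receives exactly one value,
# so no list-columns are created.
wide <- toy_unique %>%
  pivot_wider(
    id_cols     = ID,
    names_from  = Month,
    values_from = c(var1, var2)
  )
print(wide)
```

The result has one row per ID and one column per variable and month combination (for example `var1_M1`).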
vasili111