How to fix "Each row of output must be identified by a unique combination of keys" error in R

Question

I'm new to R. I have uBiome data (as a csv) that I want to convert to a phyloseq object. I've been trying to use an R package called Actino, but whenever I call the actino::experiment_to_phyloseq() function, I get "Error: Each row of output must be identified by a unique combination of keys". It also says "Keys are shared for 2956 rows", followed by a list of row pairs.

I have two files: the csv file (taxannotation.csv) and the mapfile (mapfile.csv). My csv file contains the columns ssr, tax_name, tax_rank, count, and percent.

The mapfile has the ssr values in its first column, matching those in the csv file, along with other sample attributes.

I use the code
taxannotation.ps <- experiment_to_phyloseq(taxannotation, mapfile)

While the ssrs in my csv file repeat in different rows, I believe that the other columns such as tax_name, tax_rank, count, and percent all give a different identity to each row.

Already tried searching for an answer, but never really found one that's informative or helpful.

You might try `dplyr::count(taxannotation, ssr, tax_name, tax_rank, count, percent, sort = TRUE)` to see if they really are unique keys. — Jon Spring, May 22 '19 at 03:37
I think you are using the wrong csv file. What does `taxannotation.csv` look like? — Amar, May 22 '19 at 04:48
@JonSpring I tried this, and it showed `# A tibble: 94,336 x 6 ssr tax_name tax_rank count percent n`. I'm assuming that the n tells me the number of observations of the combination of keys for that row. There are rows that give me n values of 2. — Marie Francisco, May 22 '19 at 09:08
@Amar The column names are ssr, tax_name, tax_rank, count, and percent. My ssr column just shows the sample IDs per row. — Marie Francisco, May 22 '19 at 09:14
Yes, this would explain your error -- the rows of `taxannotation` are not unique, but they need to be for the function to run. Are you missing another column with distinguishing information? Could you drop duplicates? — Jon Spring, May 22 '19 at 20:52
@MarieFrancisco Either you are using the wrong file, or the `.csv` from uBiome has changed format. @JonSpring and I are attacking this from different sides. I think the package needs to be updated to handle the new long format of the data. Have a look at the github package, there's an example of the data. It's different from what you have described. Can you update your original post to include an example of what the data looks like? Make sure to format it correctly. — Amar, May 23 '19 at 01:20
@JonSpring Those are all the columns there are in the data uBiome gave me. I'll try dropping duplicates. — Marie Francisco, May 23 '19 at 23:25
@Amar Yes, I think uBiome has changed its format recently. I have already edited my `.csv` file to match the headers in the github package. I'm afraid that the values in my ssr column are not the same as those in the sample data though. I have edited my original post and added a sample of my data. — Marie Francisco, May 23 '19 at 23:29
@JonSpring Tried dropping the duplicates. Still says there are keys shared for 2956 rows. I checked the list, compared it with my `.csv` file, but the rows are unique (i.e. different values for each column). — Marie Francisco, May 24 '19 at 00:10
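
The checks discussed in the comments can be sketched as follows. The error wording matches the one `tidyr::spread()` raises when several input rows map to the same output cell, so the duplicates that matter are probably duplicates in the key columns only, not in the full row. That would explain why dropping fully identical rows did not help. This is a minimal sketch on made-up data: the values are invented, and the choice of `ssr`, `tax_name`, and `tax_rank` as the key is an assumption, since the thread does not confirm which columns `experiment_to_phyloseq()` actually keys on.

```r
library(dplyr)

# Toy data with the same column names as taxannotation.csv (values are made up).
taxannotation <- data.frame(
  ssr      = c(1001, 1001, 1001, 1002),
  tax_name = c("Bacteroides", "Bacteroides", "Firmicutes", "Bacteroides"),
  tax_rank = c("genus", "genus", "phylum", "genus"),
  count    = c(120, 30, 500, 80),
  percent  = c(12, 3, 50, 8)
)

# Rows 1 and 2 differ in count and percent, so removing fully
# identical rows leaves all four rows in place.
n_after_distinct <- nrow(distinct(taxannotation))

# ASSUMPTION: these are the key columns; adjust if the function uses others.
key_cols <- c("ssr", "tax_name", "tax_rank")

# Key combinations that occur more than once.
dup_keys <- taxannotation %>%
  count(across(all_of(key_cols)), name = "n_rows") %>%
  filter(n_rows > 1)

# One possible fix: collapse each key to a single row by summing.
# Whether summing is appropriate for this data is a judgment call.
taxannotation_unique <- taxannotation %>%
  group_by(across(all_of(key_cols))) %>%
  summarise(count = sum(count), percent = sum(percent), .groups = "drop")
```

If `dup_keys` is non-empty on the real data, inspecting those rows in the original csv should show whether they are genuine repeats to be summed or a sign that a distinguishing column is missing from the export.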
