I am trying to reshape my data into a format that the R package BAT can use. My data is currently in long format:

Cover, ScientificName, PinTag 
3, Elymus.virginicus, AA16
4, EUONYMUS.FORTUNEI, AA16
5, GLECHOMA.HEDERACEA, AA16
3, EUONYMUS.FORTUNEI, AA17
2, GLECHOMA.HEDERACEA, AA17
2, Acer.negundo, AA19
4, Elymus.virginicus, AA19

I am trying to reshape it into the following wide format, with one row per PinTag and one column per species:

PinTag, Acer.negundo, Elymus.virginicus, EUONYMUS.FORTUNEI, GLECHOMA.HEDERACEA
AA16, 0, 3, 4, 5
AA17, 0, 0, 3, 2
AA19, 2, 4, 0, 0

I tried the following based on an earlier post:

library(reshape2)
MyData <- dcast(QuadratCoverBAT, PinTag~ScientificName, value.var="Cover")

This gave the message "Aggregation function missing: defaulting to length" (a message rather than an error) and, on my full dataset (1821 obs), produced counts instead of cover values:

PinTag, Acer.negundo, Elymus.virginicus, EUONYMUS.FORTUNEI, GLECHOMA.HEDERACEA
AA16, 0, 1, 1, 1
AA17, 0, 0, 1, 1
AA19, 1, 1, 0, 0    
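
For context, `dcast` prints that message when at least one PinTag/ScientificName combination occurs more than once, because it then has to aggregate and falls back to `length`. Below is a minimal sketch on made-up rows; the data frame `toy` and its values are hypothetical, not taken from the real dataset, and it assumes `reshape2` is installed.

```r
library(reshape2)

# Hypothetical toy data: the AA16 / Elymus.virginicus pair appears twice.
toy <- data.frame(
  Cover = c(3, 3, 4, 2),
  ScientificName = c("Elymus.virginicus", "Elymus.virginicus",
                     "EUONYMUS.FORTUNEI", "Acer.negundo"),
  PinTag = c("AA16", "AA16", "AA16", "AA19")
)

# With a repeated key pair, dcast cannot put a single value in each cell,
# so it defaults to length() and returns counts instead of Cover.
counts <- dcast(toy, PinTag ~ ScientificName, value.var = "Cover")

# Flag rows whose PinTag/ScientificName pair has already been seen.
dup_rows <- duplicated(toy[, c("PinTag", "ScientificName")])

# After dropping the repeats, each cell holds the original Cover value;
# fill = 0 puts 0 where a species was not recorded at a pin.
wide <- dcast(toy[!dup_rows, ], PinTag ~ ScientificName,
              value.var = "Cover", fill = 0)
```

Checking duplicates on the two key columns only, rather than on whole rows, also catches pairs that were entered twice with different Cover values.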

There should be no duplicated data in the dataset, but R tells me I have 208 duplicates when I run

anyDuplicated(QuadratCoverBAT)

If the results were 1 or 2, I could believe I had made a mistake and missed a duplicate in the data, but there is no way I've missed over 200...

Any help would be greatly appreciated!

EDIT: Here is a link to the full data file - [deleted link]

Danelle
  • I'm getting the correct result. Please provide an example that reproduces the error. – Jaap Oct 21 '15 at 05:18
  • I added a link to the dataset above. I found one additional duplicate, which I corrected, but I still get the same error. – Danelle Oct 21 '15 at 05:51
  • Please consider making your question self-contained. Once you pull the file from Dropbox, this question may be rendered useless for future visitors. Distill the number of data points down to a minimum which still demonstrates the problem. – Roman Luštrik Oct 21 '15 at 06:00
  • After deleting duplicates, I tried "anyDuplicated(QuadratCoverBAT)" again. Now it tells me there are 459 duplicates. How do I get more duplicates when I delete data?! I looked at the data restructuring link provided. Their issue was that they needed to include two of their four fields to get unique values to insert in the matrix (no duplication). I only have three fields, which become the row labels, column labels, and the data matrix. Thanks for your thoughts. – Danelle Oct 21 '15 at 06:04
  • So, in case anyone else is fighting with this later... The 459 is not the number of duplicates, but rather the row number of a duplicate. I got this working now. Thanks all for your patience and ideas! – Danelle Oct 21 '15 at 06:15
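
The point in the last comment can be sketched in base R: `anyDuplicated` returns the index of the first duplicated row (or 0 if there is none), while `duplicated` returns one logical value per row. This would also fit the number changing after rows were removed, since a different row then becomes the first duplicate. The data frame `toy` below is hypothetical.

```r
# Hypothetical toy data: row 4 repeats row 2 exactly.
toy <- data.frame(
  Cover = c(3, 4, 5, 4),
  ScientificName = c("Elymus.virginicus", "EUONYMUS.FORTUNEI",
                     "GLECHOMA.HEDERACEA", "EUONYMUS.FORTUNEI"),
  PinTag = c("AA16", "AA16", "AA16", "AA16")
)

# Index of the first row that repeats an earlier row (0 if none); not a count.
first_dup <- anyDuplicated(toy)

# Number of rows that repeat an earlier row.
n_dup <- sum(duplicated(toy))

# Positions of all such rows, useful for inspecting them before deleting.
dup_idx <- which(duplicated(toy))
```

Here `first_dup` is 4 even though only one row is duplicated, which is the same kind of mismatch as reading 208 or 459 as a count.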

0 Answers