In discussions of how to organize workflows and projects in R, it is often recommended that a package be written to document & share work. I was wondering: is there any precedent for using an R package to publish & make publicly available data (as well as associated materials such as metadata, custom data processing tools, etc.), either via CRAN or some other outlet? I work with data that requires multiple stages of cleaning, e.g., basic removal of typos, rudimentary record matching & custom imputation of missing data, followed by various forms of reshaping & aggregation for specific analyses. An R package seems like a useful way to document and present the data & the methods used to produce it. The main downside is the investment of time. The upsides appear numerous: high standards of documentation for future students in our lab, my future self, and other potential users; full reproducibility; and a platform for updating the data as more is collected.
Some context: Publishing data as flat files + metadata is increasingly common in my field, via online appendices hosted by journals; third-party websites are also popular. Reproduction of figures & analyses is usually possible, but data are sometimes highly "massaged" & processing steps cannot always be reproduced, which can limit the ability to carry out alternative analyses. My adviser & I would like to publish data from the first 10 years of a 15-year longitudinal study. I already need to clean up my data processing scripts to pass on to future students/co-authors, which alone might make a package for in-house use worthwhile.