I'm working on an R package for AT/Bluesky, and I'm trying out a somewhat unconventional structure. Since the Bluesky methods are documented in very precise detail via the JSON lexicon defined here, I've copied these files into inst/lexicons, so my package structure looks like this:

├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   ├── bluesky.R
│   ├── helpers.R
│   ├── lexicon.R
│   └── types.R
├── README.Rmd
├── README.md
├── blueRsky.Rproj
├── inst
│   └── lexicons
│       ├── app
│       │   └── bsky
│       │       ├── actor
│       │       │   ├── defs.json
│       │       │   ├── getProfile.json
│       │       │   ├── getProfiles.json
│       │       │   ├── getSuggestions.json
│       │       │   ├── profile.json
│       │       │   ├── searchActors.json
│       │       │   └── searchActorsTypeahead.json
│       │       ├── embed
│       │       │   ├── external.json
│       │       │   ├── images.json
│       │       │   ├── record.json
│       │       │   └── recordWithMedia.json
│       │       ├── feed
│       │       │   ├── defs.json
│       │       │   ├── getAuthorFeed.json
│       │       │   ├── getLikes.json
│       │       │   ├── getPostThread.json
│       │       │   ├── getPosts.json
│       │       │   ├── getRepostedBy.json
│       │       │   ├── getTimeline.json
│       │       │   ├── like.json
│       │       │   ├── post.json
│       │       │   └── repost.json
│       │       ├── graph
│       │       │   ├── follow.json
│       │       │   ├── getFollowers.json
│       │       │   ├── getFollows.json
│       │       │   ├── getMutes.json
│       │       │   ├── muteActor.json
│       │       │   └── unmuteActor.json
│       │       ├── notification
│       │       │   ├── getUnreadCount.json
│       │       │   ├── listNotifications.json
│       │       │   └── updateSeen.json
│       │       ├── richtext
│       │       │   └── facet.json
│       │       └── unspecced
│       │           └── getPopular.json
│       └── com
├── man
│   ├── bsky_get_profile.Rd
│   ├── bsky_get_session.Rd
│   ├── bsky_get_timeline.Rd
│   └── bsky_search_actors.Rd
└── tests
    ├── testthat
    │   └── test-lexicon.R
    └── testthat.R

I have functions that then look up the Lexicon schemas, for example:

load_schema <- function(id) {
  # Map a lexicon id such as "app.bsky.actor.getProfile" to
  # "lexicons/app/bsky/actor/getProfile.json"
  loc <- c("lexicons", strsplit(id, ".", fixed = TRUE)[[1]]) |>
    paste0(collapse = "/") |>
    paste0(".json")

  # system.file() returns "" when the file is not in the installed package
  file_path <- system.file(loc, package = "blueRsky", mustWork = FALSE)
  if (!nzchar(file_path)) {
    stop("Schema ", id, " not found")
  }

  jsonlite::read_json(file_path)
}

This is very nice, because it lets me abstract away a lot of the boilerplate for constructing the requests.

This works great locally: I can install the package, and all my tests pass. However, it fails R CMD check. I believe the reason is that the build step is removing part of the lexicons directory. The output says:

── R CMD build ──────────────────────────────────────────────────────────────────
✔  checking for file ‘/Users/colinfraser/projects/blueRsky/DESCRIPTION’ ...
─  preparing ‘blueRsky’: (1.1s)
✔  checking DESCRIPTION meta-information ...
─  checking for LF line-endings in source and make files and shell scripts (346ms)
─  checking for empty or unneeded directories
   Removed empty directory ‘blueRsky/inst/lexicons/com’
─  building ‘blueRsky_0.0.0.9000.tar.gz’

Removed empty directory ‘blueRsky/inst/lexicons/com’.

That's not good, because that directory is not actually empty; it just has nested folders in it. Later on, the check fails because it can't load the package code.

Is there anything I can do to stop R CMD check from deleting this directory? Or could there be something else that's wrong here? I know that this is a bit of an unusual package structure, but it seems like it would work really well if I could get it to work.

crf
  • Just put a zero-length file in there: `touch com/.do.not.remove` should do (if you have a `touch` binary on your OS; otherwise just write a few bytes to it). As soon as the directory is non-empty, R will keep it for you. – Dirk Eddelbuettel Apr 29 '23 at 02:15
  • Thank you @DirkEddelbuettel! If you post this as an answer I'm happy to accept it as it correctly answers the question that I posed. However, I've realized that R CMD BUILD deleting the directory for me was not actually the root cause of my issue :( I have top-level calls in the package code that refer to code in /inst, which is apparently not available at build-time. I'm not sure how to fix this, I might have to completely rethink my package architecture here. – crf Apr 29 '23 at 18:59
  • As a total aside, if you happen to have a spare invite I'd most gladly take it :) Otherwise lingering on the waitlist. (And once you have JSON data, look at [RcppSimdJson](https://cran.r-project.org/package=RcppSimdJson), which is faster than the other parsers, in case that matters to you :) ) – Dirk Eddelbuettel Apr 29 '23 at 19:13
  • If they ever get around to giving me some invite codes you'll be at the top of my list! – crf Apr 29 '23 at 19:48
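
On the build-time issue raised in the comments above: one common way around top-level calls that read from /inst is to defer the lookup until a function is actually called at run time, and to cache the result. What follows is only a minimal sketch under that assumption, not the package's actual code. The names `get_schema`, `.schema_cache` and the `loader` argument are hypothetical; `loader` stands in for the question's `load_schema`.

```r
# Cache environment. It is created when the package code is evaluated,
# but it holds no file contents at that point.
.schema_cache <- new.env(parent = emptyenv())

# Look up a schema lazily: nothing under inst/ is read until the first
# call for a given id. `loader` is any function(id) returning the schema.
get_schema <- function(id, loader) {
  if (!exists(id, envir = .schema_cache, inherits = FALSE)) {
    assign(id, loader(id), envir = .schema_cache)
  }
  get(id, envir = .schema_cache, inherits = FALSE)
}
```

Inside the package this might be called as `get_schema("app.bsky.actor.getProfile", load_schema)` from within the request functions, rather than at the top level of a file in R/.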

2 Answers

R removed the directory inst/lexicons/com/ because, at the time of package building, it was seen to be empty. R cannot know you plan to store files there later; it just sees an empty directory and cleans it up to make a neater package without any fluff.

The fix suggests itself: put a file into the directory, maybe inst/lexicons/com/README.md, or, more minimally, a zero-byte placeholder via

$ touch inst/lexicons/com/.keep.this.directory

assuming your OS does have a touch binary.
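
If no `touch` binary is available (for example on Windows), the same placeholder can be created from R itself. The following is a minimal sketch, assuming the directory from the question; `ensure_placeholder` and the default file name `.keep` are hypothetical names chosen for illustration.

```r
# Create a zero-byte placeholder file so the directory is no longer empty
# at build time. `ensure_placeholder` and ".keep" are hypothetical names.
ensure_placeholder <- function(dir, name = ".keep") {
  # Make sure the directory itself exists (no warning if it already does)
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, name)
  # file.create() would truncate an existing file, so only create if missing
  if (!file.exists(path)) {
    file.create(path)
  }
  invisible(path)
}

# Run from the package root:
# ensure_placeholder("inst/lexicons/com")
```

R CMD check may emit a NOTE about hidden files, so a non-hidden name such as README.md, as suggested above, might be the safer choice.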

Dirk Eddelbuettel

While a suitable answer has been provided in the comments by Dirk E., I would like to suggest that you not include all the data in your package. Rather, you could upload the data to a GitHub repository, and provide a function for downloading this at a later stage.

If the data is essential for your package to function, consider checking for it in your zzz.R (that is, on package load) and giving the user instructions for downloading it.
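
A check of this kind could look roughly like the sketch below, which would live in R/zzz.R. It is only an illustration: `lexicons_available` is a hypothetical helper, the message text is made up, and the package name `blueRsky` and the `lexicons` directory are taken from the question.

```r
# Hypothetical helper: TRUE if a "lexicons" directory can be found in the
# installed package. system.file() returns "" when nothing is found.
lexicons_available <- function(pkg = "blueRsky") {
  nzchar(system.file("lexicons", package = pkg))
}

# Runs when the package is attached with library(); startup messages
# conventionally go here rather than in .onLoad().
.onAttach <- function(libname, pkgname) {
  if (!lexicons_available(pkgname)) {
    packageStartupMessage(
      "Lexicon files not found. See the package README for download instructions."
    )
  }
}
```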

mhovd