
I have a GitHub repo of a pipeline that requires very large input files (even basic test datasets would be around 1-2 GB).

I thought about circumventing this by running CI/CD locally, but then the CI/CD won't run when other people want to contribute to the repo, right?

Is there any workflow that allows for complex CI/CD with large datasets, while also enabling CI/CD on pull requests?

    Is this a public repo? Can you store the large files in a cloud bucket or otherwise on a server somewhere? – bk2204 Mar 04 '21 at 22:56
  • It is a public repo. And I can host the files on OneDrive through my university account (I'm a PhD student). But can I download files as big as a couple of GB from OneDrive during GitHub Actions? I thought there would be some hard limit on that... even though the repos themselves can go up to 5 GB – João Sequeira Mar 05 '21 at 09:46
  • Can you use a subset of the data? e.g. if these are human genome FASTQs, do your testing just on chr22 or some other subset. Remember that the CI/CD test machines are going to have to download the data even if it's on GitHub, so it's little to no extra work to download it from some other publicly accessible site. – Tom Morris Mar 06 '21 at 17:33
  • Yes, I'll try using smaller files. Also it seems GitHub actions actually handle large files okay! – João Sequeira Mar 09 '21 at 14:10
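
The approach suggested in the comments, hosting the test data externally and fetching it inside the workflow, can be sketched as a GitHub Actions config. This is a hypothetical sketch, not from the original thread: the dataset URL, file name, and test script are placeholders, and `actions/cache` is used so the download only happens when the cache misses.

```yaml
# Hypothetical sketch: URL, file names, and test command are placeholders.
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Restore the dataset from the Actions cache if a previous run stored it.
      - name: Cache test dataset
        uses: actions/cache@v3
        with:
          path: testdata
          key: testdata-v1

      # Only download from the external host on a cache miss.
      - name: Download test dataset if not cached
        run: |
          mkdir -p testdata
          if [ ! -f testdata/sample.fastq.gz ]; then
            curl -L -o testdata/sample.fastq.gz "https://example.com/datasets/sample.fastq.gz"
          fi

      - name: Run pipeline tests
        run: ./run_tests.sh testdata
```

Keying the cache on a version string (`testdata-v1`) means bumping the key invalidates the cache when the dataset changes; note that GitHub's Actions cache has its own per-repository size limits, so very large datasets may still need to be downloaded on each run or trimmed to a subset as suggested above.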

0 Answers