For some time I have been struggling to build a workflow with many inputs and a single output, like the one shown below. The code works to an extent, but once there are too many input files the concatenate step invariably fails:
rule generate_text:
    input:
        "data/{name}.csv"
    output:
        "text_files/{name}.txt"
    shell:
        "somecommand {input} -o {output}"

rule concatenate_text:
    input:
        expand("text_files/{name}.txt", name=names)
    output:
        "summaries/summary.txt"
    shell:
        "cat {input} > {output}"
After some digging I found that this is due to the operating system's limit on the length of a single command line (the argument list that can be passed to a command). I am working with increasingly large numbers of inputs, so the approach above does not scale.
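For reference, on POSIX systems this limit can be inspected with getconf (the exact value varies from system to system), e.g.:

# Maximum combined length, in bytes, of the argument list passed to exec()
getconf ARG_MAX

Once the expanded {input} list for the concatenate rule exceeds this, the cat command fails.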
Can anybody suggest a solution to this issue? I haven't been able to find one online.
Ideally the solution wouldn't be limited to cat or other shell commands, and could be used within the structure of a rule in cases where --use-conda is employed. My current fix uses an onsuccess handler, shown below, but that doesn't allow --use-conda or rule-specific conda environments. One handy thing about the shell command is that you can feed it Snakemake variables, but it's not quite flexible enough for my purposes due to the aforementioned conda issue.
onsuccess:
    shell("cat text_files/*.txt > summaries/summary.txt")