
I am writing some scripts in Python for fun. The particular project I am working on involves factoring large numbers. Even after some optimization based on this algorithm, factoring will always be a heavy-duty operation for large enough inputs.

In order to work with the factorizations manually later and avoid re-computing them, I intend to write them to files in a systematic way. My original motivation was simply to avoid re-calculating factorizations, but a collaborator pointed out to me that file I/O is itself computationally expensive! My question is this: how expensive does an operation (such as decomposing a large number into prime factors) need to be before it becomes worthwhile to open a file and read a stored result rather than re-computing it on the fly?
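To make this concrete, here is a minimal sketch of the kind of file cache I have in mind; the JSON format, the `factor_cache.json` filename, and the helper names are only placeholders, not a settled design:

```python
import json
import os

CACHE_PATH = "factor_cache.json"  # placeholder filename

def load_cache(path=CACHE_PATH):
    """Read previously stored factorizations from disk, or start empty."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        # JSON object keys are strings, so convert them back to integers.
        return {int(k): v for k, v in json.load(f).items()}

def save_cache(cache, path=CACHE_PATH):
    """Write every known factorization (number -> list of prime factors) to disk."""
    with open(path, "w") as f:
        json.dump({str(k): v for k, v in cache.items()}, f)
```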

More generally, what factors affect the threshold at which an output database becomes cost-efficient? For context: my algorithm tests divisibility by 2 and 3 and then by numbers of the form 6k ± 1 starting from k = 1, so the time taken to factorize an input n is currently O(n**0.5), with the worst case occurring when n is prime.
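For reference, a stripped-down sketch of that trial-division scheme (not the optimized version) looks something like this:

```python
def factorize(n):
    """Return the prime factors of n (with multiplicity) by trial division:
    divide out 2 and 3, then candidates of the form 6k - 1 and 6k + 1.
    This takes O(sqrt(n)) division tests; the worst case is when n is prime."""
    if n < 2:
        return []
    factors = []
    for p in (2, 3):
        while n % p == 0:
            factors.append(p)
            n //= p
    k = 1
    while (6 * k - 1) ** 2 <= n:
        for p in (6 * k - 1, 6 * k + 1):
            while n % p == 0:
                factors.append(p)
                n //= p
        k += 1
    if n > 1:  # whatever is left over is itself prime
        factors.append(n)
    return factors
```

For example, `factorize(582)` returns `[2, 3, 97]`.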

  • Is in-memory memoization not appropriate? – Carcigenicate Sep 12 '20 at 16:16
  • Seems simple enough: When the time it takes to do the I/O is longer than what it would take to recompute the values. – martineau Sep 12 '20 at 16:22
  • 1
    Why don't you just test it for yourself and see what happens? The exact answer is always going to system-dependent, so it doesn't seem worth spending any time theoriziing about it. – ekhumoro Sep 12 '20 at 16:28
  • @Carcigenicate I could set it up with a *lot* of variables and write to output at the end of the process, but once the question had occurred to me I got curious. – Morgan Rogers Sep 12 '20 at 16:30
  • @ekhumoro any pointers on how to run such a comparison test? And I still would like to know if there are any generally known factors that could affect the outcome. – Morgan Rogers Sep 12 '20 at 16:30
  • 1
    I'll note that `functools.lru_cache` is an existing implementation of function memoization, if writing to file doesn't end up being feasible. – Carcigenicate Sep 12 '20 at 16:35
  • @MorganRogers Put some pre-computed results in a file or db (or whatever) and time how long it takes to read them vs. how long it takes to calculate them from scratch. – ekhumoro Sep 12 '20 at 16:43
  • 1
    Modern processors can do a **lot** of calculations in the time it takes to do a single disk read, especially if you need to open the file first. For really large numbers, how often will you be required to factor the same number twice? There's nothing wrong with benchmarking to find out for yourself, but my intuition tells me it won't be worth it. – Mark Ransom Sep 12 '20 at 16:54
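Following the suggestions in the comments (timing both paths, with `functools.lru_cache` as the in-memory baseline), a rough benchmark might look like the sketch below. It assumes the `factorize`, `load_cache`, and `save_cache` helpers from the sketches above, and the test value and timing setup are only illustrative:

```python
import timeit
from functools import lru_cache

# In-memory memoization of the factorize() sketch above.
memoized_factorize = lru_cache(maxsize=None)(factorize)

n = 2147483647  # a known prime (2**31 - 1), i.e. the worst case for trial division

save_cache({n: factorize(n)})   # make sure there is something on disk to read
memoized_factorize(n)           # warm the in-memory cache once

candidates = [
    ("fresh computation", lambda: factorize(n)),
    ("memoized repeat call", lambda: memoized_factorize(n)),
    ("read cache file from disk", lambda: load_cache()),
]
for label, fn in candidates:
    best = min(timeit.repeat(fn, number=1, repeat=5))
    print(f"{label:28s} {best * 1e6:10.1f} microseconds")
```

Note that the disk-read timing depends heavily on whether the file is already in the operating system's page cache, so it is worth measuring it both cold and warm.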

0 Answers