Questions tagged [on-disk]
15 questions
41 votes, 4 answers
Disk-backed STL container classes?
I enjoy developing algorithms using the STL; however, I have a recurring problem where my data sets are too large for the heap.
I have been searching for drop-in replacements for STL containers and algorithms which are disk-backed, i.e. the data…

oz10
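As a rough illustration of the idea, and not a drop-in STL replacement, a disk-backed sequence can keep its elements in a file and hold only a file handle in memory. The sketch below is Python using only standard-library file I/O; the class name `DiskVector` and the fixed 64-bit integer element type are assumptions made for the example.

```python
import os
import struct


class DiskVector:
    """Append-only vector of 64-bit signed ints stored in a file (hypothetical sketch).

    Only the file handle lives in memory, so memory use stays constant
    no matter how many elements are stored.
    """

    _REC = struct.Struct("<q")  # little-endian int64: 8 bytes, no padding

    def __init__(self, path):
        # Create the file if needed, then open it for random-access read/write.
        if not os.path.exists(path):
            open(path, "wb").close()
        self._f = open(path, "r+b")

    def __len__(self):
        self._f.seek(0, os.SEEK_END)
        return self._f.tell() // self._REC.size

    def append(self, value):
        self._f.seek(0, os.SEEK_END)
        self._f.write(self._REC.pack(value))

    def __getitem__(self, i):
        n = len(self)
        if i < 0:
            i += n
        if not 0 <= i < n:
            raise IndexError("DiskVector index out of range")
        # Element i starts at byte offset i * 8.
        self._f.seek(i * self._REC.size)
        return self._REC.unpack(self._f.read(self._REC.size))[0]

    def close(self):
        self._f.close()
```

Each `v[i]` costs a seek and a small read, so access locality matters far more than it does for an in-memory container; practical disk-backed libraries typically add page caching on top of this.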
11 votes, 3 answers
What is the best approach when working with on-disk data structures?
I would like to know how best to work with on-disk data structures, given that the storage layout needs to match the logical design exactly. I find that structure alignment and packing do not really help much when you need a specific layout for…

DeLorean
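One common way around compiler-dependent alignment is to skip in-memory structs entirely and serialize each field explicitly in a byte-exact format. A minimal Python sketch using the standard `struct` module, whose `<` prefix means little-endian, standard sizes, and no padding; the 14-byte header layout and the `b"DEMO"` magic value are hypothetical:

```python
import struct

# Hypothetical on-disk header: 4-byte magic, 2-byte version, 8-byte length.
# "<" = little-endian, standard sizes, no alignment padding, so the packed
# size is exactly 4 + 2 + 8 = 14 bytes on every platform.
HEADER = struct.Struct("<4sHQ")


def write_header(f, version, length):
    f.write(HEADER.pack(b"DEMO", version, length))


def read_header(f):
    magic, version, length = HEADER.unpack(f.read(HEADER.size))
    if magic != b"DEMO":
        raise ValueError("bad magic")
    return version, length
```

The logical record can then be any convenient in-memory object; only these two functions need to know the on-disk layout.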
8 votes, 3 answers
B+Tree on-disk implementation in Java
Does anyone know where to find an on-disk B+Tree implementation? I went through Google forward and backward and unfortunately couldn't find anything sensible. Other threads have suggested taking the tree from SQLite, SQLJet, or BDB, but these…

mkn
6 votes, 3 answers
Fast key-value disk storage for Python
I'm wondering if there is a fast on-disk key-value store with Python bindings that supports millions of read/write calls to separate keys. My problem involves counting word co-occurrences in a very large corpus (Wikipedia), and continually…

Henrik Andersson
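A frequently used pattern for millions of small counter updates is to accumulate counts in memory and merge them into the on-disk store in batches, so each key is written once per batch rather than once per occurrence. A sketch using Python's standard-library `dbm` module as a stand-in for whichever store is chosen; treating a whole sentence as the co-occurrence window and the `flush_every` threshold are assumptions:

```python
import dbm
from collections import Counter


def flush(db, pending):
    """Merge in-memory counts into the on-disk store, then clear them."""
    for key, delta in pending.items():
        k = key.encode("utf-8")
        old = int(db[k]) if k in db else 0
        db[k] = str(old + delta).encode("ascii")
    pending.clear()


def count_pairs(sentences, db_path, flush_every=100_000):
    """Count unordered word pairs per sentence (assumed window), batching writes."""
    pending = Counter()
    with dbm.open(db_path, "c") as db:
        for words in sentences:
            for i, a in enumerate(words):
                for b in words[i + 1:]:
                    # Order the pair so (a, b) and (b, a) share one key.
                    x, y = (a, b) if a <= b else (b, a)
                    pending[f"{x}\t{y}"] += 1
            if len(pending) >= flush_every:
                flush(db, pending)
        flush(db, pending)
```

`dbm` backends differ in speed and size limits, so this only illustrates the batching idea and makes no performance claim about `dbm` itself.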
5 votes, 2 answers
On-disk structure for storing a large set of 128-bit integers?
I have about 500 million 128-bit integers, with about 100M more added per year. Nothing is ever deleted. The numbers are uniformly distributed, both in value and over time.
Basically, all I need is an add operation that also returns whether the number…

itsadok
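Because the values are uniformly distributed, the value itself can serve as its hash, which makes a fixed-size open-addressing hash table stored directly in a file a simple option. A hypothetical Python sketch, assuming that `add` should report whether the value was already present (one reading of the truncated requirement), that 0 is never inserted (it marks an empty slot), and that the capacity is sized up front with headroom for future growth; resizing is not implemented:

```python
import os

SLOT = 16  # bytes per 128-bit integer


class DiskHashSet128:
    """Fixed-capacity open-addressing hash set of 128-bit ints in a file.

    Hypothetical sketch: uniform values are used directly as hashes,
    0 is reserved as the empty-slot marker, and there is no resizing.
    """

    def __init__(self, path, capacity):
        self.capacity = capacity
        if not os.path.exists(path):
            with open(path, "wb") as f:
                # Extend to full size; new bytes are zero-filled on most platforms.
                f.truncate(capacity * SLOT)
        self._f = open(path, "r+b")

    def _read(self, i):
        self._f.seek(i * SLOT)
        return int.from_bytes(self._f.read(SLOT), "little")

    def add(self, value):
        """Insert value; return True if it was already present."""
        if not 0 < value < 1 << 128:
            raise ValueError("value must be in 1 .. 2**128 - 1")
        i = value % self.capacity
        for _ in range(self.capacity):
            current = self._read(i)
            if current == value:
                return True
            if current == 0:
                self._f.seek(i * SLOT)
                self._f.write(value.to_bytes(SLOT, "little"))
                return False
            i = (i + 1) % self.capacity  # linear probing to the next slot
        raise RuntimeError("set is full")

    def close(self):
        self._f.close()
```

While the table stays sparsely filled, a probe sequence usually stays within one disk block, so each `add` typically costs about one random read; this is why the capacity should be chosen well above the count expected after the planned years of growth.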
5 votes, 1 answer
Small subset of huge matrix-like structure from disk transparently
A simplified version of the question
I have a huge matrix-like dataset that, for now, we can pretend is actually an n-by-n matrix stored on disk as n^2 IEEE-754 doubles (see the details below the line on how this is a simplification; it probably…

gspr
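Under the simplification stated in the question (an n-by-n matrix of n^2 doubles), element (i, j) sits at a computable byte offset, so a rectangular block can be read with one seek and one read per row. A Python sketch assuming row-major order and little-endian doubles; `read_block` is a hypothetical helper name:

```python
import struct


def read_block(path, n, r0, r1, c0, c1):
    """Return rows r0..r1-1, columns c0..c1-1 of an n-by-n matrix on disk.

    Assumes n*n little-endian IEEE-754 doubles in row-major order, so
    element (i, j) starts at byte offset (i * n + j) * 8. Each row of the
    block is one contiguous run: one seek plus one read per row.
    """
    width = c1 - c0
    row_fmt = struct.Struct(f"<{width}d")
    block = []
    with open(path, "rb") as f:
        for i in range(r0, r1):
            f.seek((i * n + c0) * 8)
            block.append(list(row_fmt.unpack(f.read(row_fmt.size))))
    return block
```

For more transparent, array-like access to the same layout, `numpy.memmap` can map such a file without reading it all into memory.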
2 votes, 0 answers
How to increase ephemeral storage size in Kubernetes
What are good ways to provide big "scratch" storage to Kubernetes jobs/deployments that require a lot of disk?
I read about ephemeral volumes, and it seems that the ideal option for this case is to mount emptyDir volumes. My problem is that emptyDir…

Michele Piccolini
2 votes, 4 answers
Scalable STL set-like container for C++
I need to store a large number of integers. There can be duplicates in the input stream of integers; I only need to store the distinct ones.
I was using an STL set initially, but it ran out of memory when the number of input integers grew too large.
I am…

Pqr
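When even the distinct values do not fit in memory, a common alternative to an in-memory set is external sort-based deduplication: sort bounded-size chunks, spill them to temporary files, then merge the sorted runs and skip repeats. A Python sketch of that technique (a swapped-in approach, not an STL container), with `chunk_size` as an assumed memory budget:

```python
import heapq
import tempfile


def _write_run(sorted_values):
    """Spill one sorted chunk to a temporary file, one integer per line."""
    run = tempfile.TemporaryFile(mode="w+")
    for v in sorted_values:
        run.write(f"{v}\n")
    run.seek(0)
    return run


def distinct_sorted(stream, chunk_size=1_000_000):
    """Yield the distinct integers from `stream` in ascending order.

    At most `chunk_size` values are held in memory while reading; each chunk
    is de-duplicated, sorted, and spilled to disk, then the runs are k-way
    merged and adjacent duplicates are dropped.
    """
    runs, chunk = [], set()
    for v in stream:
        chunk.add(v)
        if len(chunk) >= chunk_size:
            runs.append(_write_run(sorted(chunk)))
            chunk = set()
    if chunk:
        runs.append(_write_run(sorted(chunk)))

    iters = [(int(line) for line in run) for run in runs]
    try:
        previous = None
        for v in heapq.merge(*iters):
            if v != previous:
                yield v
                previous = v
    finally:
        for run in runs:
            run.close()
```

The result comes out sorted as a side effect, and during the merge only one pending value per run is held in memory.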
2 votes, 3 answers
Is there a Java equivalent of GetCompressedFileSize?
I am looking to get accurate measurements of sparse files in Java (i.e., the real size on disk, not the apparent size that includes all the zeros).
In C++ on Windows one would use GetCompressedFileSize. I have yet to come across how one would go about…

J C
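For comparison, on POSIX systems the allocated size of a sparse file is exposed by `stat` as `st_blocks`, counted in 512-byte units. The sketch is Python rather than Java and does not work on Windows, where `GetCompressedFileSize` remains the native call; `allocated_size` is a hypothetical helper name:

```python
import os


def allocated_size(path):
    """Bytes actually allocated on disk for `path` (POSIX only).

    st_blocks counts 512-byte units regardless of the filesystem's block
    size, so for a sparse file this is typically far smaller than st_size.
    """
    return os.stat(path).st_blocks * 512
```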
1 vote, 0 answers
Elegantly writing B-tree objects to disk while maintaining the linked structure, in a simple programming language
I was going through the B-Tree topic in Introduction to Algorithms by Cormen et al., and I was having difficulty implementing the disk operations of the pseudocode in a real program. This might be because few descriptions of the…

Abhishek Ghosh
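The key step in turning the CLRS pseudocode into real code is replacing in-memory child pointers with page numbers, so that DISK-WRITE(x) and DISK-READ(x) become fixed-size page writes and reads at offset `page_no * PAGE_SIZE`. A Python sketch; the minimum degree `T = 3`, the 4 KiB page size, the int64 keys, and the page layout are all assumptions:

```python
import struct

PAGE_SIZE = 4096
T = 3                  # hypothetical minimum degree (CLRS "t")
MAX_KEYS = 2 * T - 1   # a node holds at most 2t - 1 keys
MAX_CHILDREN = 2 * T   # ... and at most 2t children

# Hypothetical page layout: key count (uint16), leaf flag (uint8), then
# MAX_KEYS int64 keys and MAX_CHILDREN uint32 child *page numbers*.
NODE = struct.Struct(f"<HB{MAX_KEYS}q{MAX_CHILDREN}I")
assert NODE.size <= PAGE_SIZE


class Node:
    def __init__(self, keys, children, leaf):
        self.keys, self.children, self.leaf = keys, children, leaf


def encode(node):
    if len(node.keys) > MAX_KEYS:
        raise ValueError("too many keys for one node")
    keys = node.keys + [0] * (MAX_KEYS - len(node.keys))
    kids = node.children + [0] * (MAX_CHILDREN - len(node.children))
    raw = NODE.pack(len(node.keys), int(node.leaf), *keys, *kids)
    return raw.ljust(PAGE_SIZE, b"\0")  # pad to a full page


def decode(page):
    fields = NODE.unpack_from(page)
    n, leaf = fields[0], bool(fields[1])
    keys = list(fields[2:2 + n])
    all_kids = fields[2 + MAX_KEYS:]
    kids = [] if leaf else list(all_kids[:n + 1])  # internal node: n + 1 children
    return Node(keys, kids, leaf)


def disk_write(f, page_no, node):  # corresponds to CLRS DISK-WRITE(x)
    f.seek(page_no * PAGE_SIZE)
    f.write(encode(node))


def disk_read(f, page_no):  # corresponds to CLRS DISK-READ(x)
    f.seek(page_no * PAGE_SIZE)
    return decode(f.read(PAGE_SIZE))
```

Following a child link is then `disk_read(f, node.children[i])`, which is the on-disk counterpart of dereferencing a pointer.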
0 votes, 0 answers
on-disk B-tree: defer appending new pages to file
I am implementing an on-disk B-tree, and I have a question about creating new pages. According to the little information I found, when I need to add a new page, I should append a new block to the B-tree file and then read it through the buffer manager. I…

aetern
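The allocation scheme the question describes (append a new block, then access it by page number) reduces to a few lines when pages have a fixed size. A Python sketch without a buffer manager; `PAGE_SIZE` and the helper names are assumptions:

```python
import os

PAGE_SIZE = 4096


def allocate_page(f):
    """Append a zeroed page to the B-tree file and return its page number."""
    f.seek(0, os.SEEK_END)
    size = f.tell()
    if size % PAGE_SIZE:
        raise ValueError("file must contain whole pages")
    f.write(b"\0" * PAGE_SIZE)
    return size // PAGE_SIZE


def read_page(f, page_no):
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


def write_page(f, page_no, data):
    if len(data) > PAGE_SIZE:
        raise ValueError("data larger than a page")
    f.seek(page_no * PAGE_SIZE)
    f.write(data.ljust(PAGE_SIZE, b"\0"))
```

A buffer manager would sit between these calls and the B-tree code, caching pages and deciding when dirty ones are written back to the file.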
0 votes, 4 answers
On-disk substring index
I have a file (a FASTA file, to be specific) that I would like to index so that I can quickly locate any substring within the file and then find its location within the original FASTA file.
This would be easy to do in many cases, using a Trie or…

emeryc
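One standard structure for this is a suffix array: the sorted start offsets of all suffixes, which can be stored on disk as a flat array of integers and binary-searched for any substring. A minimal in-memory Python sketch; it treats the input as a single plain string, so mapping hits back to positions in a FASTA file with headers and line breaks is left out:

```python
def build_suffix_array(text):
    """Naive suffix array: start offsets of all suffixes in sorted order.

    O(n^2 log n) time and O(n^2) temporary memory in the worst case; fine
    for a sketch, but genome-scale input would need a linear-time
    construction (e.g. SA-IS) with the offset array written to disk.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return every offset where `pattern` occurs, via binary search on sa."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

All matches occupy one contiguous range of the suffix array, so each query needs only two binary searches, i.e. O(log n) probes into the on-disk array plus the text comparisons.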
0 votes, 1 answer
Use HashMap to store file positions and access these randomly using RandomAccessFile
Initial problem:
I have the following issue: I am joining 2 CSVs using Java. While I can "stream" one of the CSVs (read in, process, write out line-by-line), the smaller one resides in memory (a HashMap to be precise), as I need to look up the keys…

dotwin
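In Python, the same idea (an in-memory map from key to file position, then random access into the file) uses a `dict` plus `seek`. A sketch assuming a simple CSV without quoted separators or embedded newlines, and unique keys; `build_offset_index` and `lookup` are hypothetical names:

```python
def build_offset_index(path, key_column=0, sep=b","):
    """Map each row's key to the byte offset where that row starts.

    The file is read in binary mode so offsets are plain byte positions.
    Assumes no quoted separators or embedded newlines; a later duplicate
    key overwrites an earlier one.
    """
    index = {}
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            key = line.rstrip(b"\r\n").split(sep)[key_column]
            index[key.decode("utf-8")] = offset
            offset += len(line)  # the next row starts right after this one
    return index


def lookup(f, index, key):
    """Seek to the stored offset and read back that single row."""
    f.seek(index[key])
    return f.readline().rstrip(b"\r\n").decode("utf-8")
```

Only keys and integer offsets stay in memory, which is usually much smaller than holding the full rows.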
0 votes, 1 answer
Neo4j - On-disk Representation of Edges
I noticed a performance difference when querying via incoming and outgoing relationships for a given node. In this case, outgoing was much faster.
The input file that generates the graph is sorted by the start node for each edge.
Does the order of…

Jay
-1 votes, 1 answer
iOS On-Disk Encryption. What if user disabled passcode after the file was encrypted?
I'm currently trying to understand how iOS On-Disk Encryption works. I've read Protecting Data Using On-Disk Encryption in the iOS App Programming Guide. It says that the user must have an active passcode lock set for the device.
But some things…

Alexander