Questions tagged [on-disk]

15 questions
41 votes, 4 answers

Disk-backed STL container classes?

I enjoy developing algorithms using the STL; however, I have a recurring problem where my data sets are too large for the heap. I have been searching for drop-in replacements for STL containers and algorithms that are disk-backed, i.e. the data…
oz10
11 votes, 3 answers

What is the best approach when working with on-disk data structures

I would like to know how best to work with on-disk data structures, given that the storage layout needs to match the logical design exactly. I find that structure alignment and packing do not help much when you need a specific layout for…
DeLorean
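A common way around the layout problem is to stop relying on the compiler's struct layout and serialize each field explicitly at a fixed offset and byte order. A minimal sketch of that idea using Python's standard `struct` module; the three-field record below is a hypothetical layout chosen for illustration, not one from the question:

```python
import struct

# Hypothetical record layout, for illustration only:
#   bytes 0-3   record id   uint32, little-endian
#   bytes 4-5   flags       uint16, little-endian
#   bytes 6-13  name        8 bytes, NUL-padded
# The "<" prefix selects little-endian byte order with standard sizes and
# no alignment padding, so the bytes on disk match this table exactly.
RECORD = struct.Struct("<IH8s")


def pack_record(record_id, flags, name):
    """Serialize one record into exactly RECORD.size (14) bytes."""
    # "8s" NUL-pads short names and truncates long ones to 8 bytes.
    return RECORD.pack(record_id, flags, name)


def unpack_record(data):
    """Parse one record and strip the NUL padding from the name."""
    record_id, flags, name = RECORD.unpack(data)
    return record_id, flags, name.rstrip(b"\0")


def read_record(f, index):
    """Read record number `index` from a binary file of packed records."""
    f.seek(index * RECORD.size)
    return unpack_record(f.read(RECORD.size))
```

In C or C++ the same approach means reading and writing each field individually, with explicit byte-order conversion, instead of writing a whole struct's memory image with `fwrite`.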
8 votes, 3 answers

B+Tree on-disk implementation in Java

Does anyone know where to find an on-disk B+Tree implementation? I searched Google thoroughly and unfortunately couldn't find anything sensible. Other threads have suggested taking the tree from SQLite, SQLJet, or BDB, but these…
mkn
6 votes, 3 answers

Fast key-value disk storage for Python

I'm wondering whether there is a fast on-disk key-value store with Python bindings that supports millions of read/write calls to separate keys. My problem involves counting word co-occurrences in a very large corpus (Wikipedia), and continually…
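Python's standard library already includes one on-disk option, `sqlite3`. It is a general SQL database rather than a dedicated key-value engine, so whether it meets the throughput target is not established here. A minimal sketch, assuming counts keyed by a word pair; the table name `cooc` and the helper functions are hypothetical:

```python
import sqlite3
from collections import Counter


def open_store(path):
    """Open (or create) an SQLite file holding word-pair counts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cooc ("
        " w1 TEXT NOT NULL, w2 TEXT NOT NULL, n INTEGER NOT NULL,"
        " PRIMARY KEY (w1, w2))"
    )
    return conn


def add_counts(conn, pairs):
    """Merge a batch of (w1, w2) occurrences into the store.

    The batch is first aggregated in memory, then applied in a single
    transaction: create any missing rows with n = 0, then add the counts.
    """
    batch = Counter(pairs)
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT OR IGNORE INTO cooc (w1, w2, n) VALUES (?, ?, 0)",
            batch.keys(),
        )
        conn.executemany(
            "UPDATE cooc SET n = n + ? WHERE w1 = ? AND w2 = ?",
            [(n, w1, w2) for (w1, w2), n in batch.items()],
        )


def get_count(conn, w1, w2):
    """Return the stored count for (w1, w2), or 0 if it was never seen."""
    row = conn.execute(
        "SELECT n FROM cooc WHERE w1 = ? AND w2 = ?", (w1, w2)
    ).fetchone()
    return row[0] if row else 0
```

Pre-aggregating each chunk of the corpus with `Counter` and writing it in one transaction avoids a separate commit per increment, which is typically what makes per-key writes slow. `open_store("cooc.db")` creates the database file on first use.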
5 votes, 2 answers

On-disk structure for storing a large set of 128-bit integers?

I have about 500 million 128-bit integers, with about 100 million more added per year. Nothing is ever deleted. The numbers are uniformly distributed, both in value and over time. Basically, all I need is an add operation that also returns whether the number…
itsadok
5 votes, 1 answer

Reading a small subset of a huge matrix-like structure from disk transparently

A simplified version of the question: I have a huge matrix-like dataset that, for now, we can pretend is an n-by-n matrix stored on disk as n^2 IEEE-754 doubles (see the details below the line on how this is a simplification; it probably…
gspr
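If the data really is a flat file of n^2 doubles, memory-mapping it lets the OS read only the pages that are actually touched. A minimal sketch using the standard `mmap` and `struct` modules, assuming row-major order and little-endian doubles (neither is stated in the excerpt); `DiskMatrix` is a hypothetical name:

```python
import mmap
import struct

DOUBLE = struct.Struct("<d")  # assumption: little-endian IEEE-754 doubles


class DiskMatrix:
    """Read-only view of an n-by-n row-major matrix of doubles in a file.

    Nothing is loaded up front; the OS pages data in on first access and
    keeps it in its page cache.
    """

    def __init__(self, path, n):
        self.n = n
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        if len(self._mm) != n * n * DOUBLE.size:
            self.close()
            raise ValueError("file size does not match n*n doubles")

    def get(self, i, j):
        """Return element (i, j) by computing its byte offset directly."""
        if not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError((i, j))
        offset = (i * self.n + j) * DOUBLE.size
        return DOUBLE.unpack_from(self._mm, offset)[0]

    def row_slice(self, i, j0, j1):
        """Return elements (i, j0) .. (i, j1 - 1) from one contiguous span."""
        if not (0 <= i < self.n and 0 <= j0 <= j1 <= self.n):
            raise IndexError((i, j0, j1))
        start = (i * self.n + j0) * DOUBLE.size
        return list(struct.unpack_from(f"<{j1 - j0}d", self._mm, start))

    def close(self):
        self._mm.close()
        self._f.close()
```

Reading along a row touches one contiguous span of the file, while reading down a column touches a different page per element once a row spans more than a page, so access patterns that follow the storage order are much cheaper.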
2 votes, 0 answers

How to increase ephemeral storage size in Kubernetes

What are good ways to provide large "scratch" storage to Kubernetes jobs/deployments that require a lot of disk? I read about ephemeral volumes, and it seems that the ideal solution for this case is to mount emptyDir volumes. My problem is that emptyDir…
Michele Piccolini
2 votes, 4 answers

Scalable STL set-like container for C++

I need to store a large number of integers. There can be duplicates in the input stream; I only need to store the distinct values. I was initially using an STL set, but it ran out of memory when the number of input integers grew too large. I am…
Pqr
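One way to bound memory is to swap the in-memory set for an external sort with de-duplication: de-duplicate and sort fixed-size chunks in memory, spill each chunk to a temporary file, then k-way merge the files while skipping repeats. A minimal sketch in Python (the question itself is about C++); `distinct_sorted` and the text-based run format are illustrative choices, and a binary run format would be more compact:

```python
import heapq
import itertools
import os
import tempfile


def _write_run(sorted_values, directory):
    """Spill one sorted, de-duplicated chunk to a temp file; return its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for v in sorted_values:
            f.write(f"{v}\n")
    return path


def _read_run(path):
    """Stream the integers of one run back from disk, in order."""
    with open(path) as f:
        for line in f:
            yield int(line)


def distinct_sorted(stream, chunk_size=1_000_000, directory=None):
    """Yield the distinct integers of `stream` in ascending order.

    Only about `chunk_size` input values are held in memory at a time:
    each chunk is de-duplicated and sorted, written to disk as a run,
    and the runs are then k-way merged with repeats dropped.
    """
    runs, readers = [], []
    try:
        it = iter(stream)
        while True:
            chunk = set(itertools.islice(it, chunk_size))
            if not chunk:
                break
            runs.append(_write_run(sorted(chunk), directory))
        readers = [_read_run(p) for p in runs]
        previous = None
        for v in heapq.merge(*readers):
            if v != previous:  # equal values arrive adjacent after merging
                yield v
                previous = v
    finally:
        for r in readers:
            r.close()  # close run files before deleting them
        for p in runs:
            os.remove(p)
```

Unlike `std::set`, this produces the distinct values once, in sorted order, rather than supporting interleaved inserts and lookups.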
2 votes, 3 answers

Is there a Java equivalent of GetCompressedFileSize?

I am looking to get accurate measurements of sparse files in Java (i.e., the real size on disk, not the nominal size that includes all the zeros). In C++ on Windows, one would use GetCompressedFileSize. I have yet to come across how one would go about…
J C
1 vote, 0 answers

Elegantly writing B-tree objects to disk while maintaining the linked structure, in a simple programming language

I was going through the B-tree chapter in Introduction to Algorithms by Cormen et al., and I was having difficulty implementing the disk operations of the pseudocode in a real program. This might be because few descriptions of the…
Abhishek Ghosh
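The usual way to keep the linked structure on disk is to replace child pointers with page numbers, so that CLRS's DISK-READ and DISK-WRITE become a seek to `page_id * PAGE_SIZE` followed by a fixed-size read or write. A minimal sketch in Python, assuming one node per fixed-size page, 64-bit integer keys, and a small minimum degree `T`; the page layout and all names are hypothetical:

```python
import struct

PAGE_SIZE = 4096        # assumption: one node per fixed-size page
T = 3                   # assumption: minimum degree t, as in CLRS
MAX_KEYS = 2 * T - 1
MAX_CHILDREN = 2 * T

# Hypothetical page layout: is_leaf (uint8), key count n (uint16),
# MAX_KEYS int64 keys, MAX_CHILDREN uint32 child page numbers.
# Unused slots are zero-filled so every node has the same encoded size.
NODE = struct.Struct(f"<BH{MAX_KEYS}q{MAX_CHILDREN}I")
assert NODE.size <= PAGE_SIZE


class Node:
    def __init__(self, page_id, leaf=True, keys=None, children=None):
        self.page_id = page_id          # where this node lives on disk
        self.leaf = leaf
        self.keys = keys or []
        self.children = children or []  # page numbers, not Python objects


def disk_write(f, node):
    """DISK-WRITE: serialize `node` into its own page of file `f`."""
    keys = node.keys + [0] * (MAX_KEYS - len(node.keys))
    kids = node.children + [0] * (MAX_CHILDREN - len(node.children))
    data = NODE.pack(int(node.leaf), len(node.keys), *keys, *kids)
    f.seek(node.page_id * PAGE_SIZE)
    f.write(data.ljust(PAGE_SIZE, b"\0"))


def disk_read(f, page_id):
    """DISK-READ: rebuild the node stored in page `page_id`."""
    f.seek(page_id * PAGE_SIZE)
    fields = NODE.unpack(f.read(NODE.size))
    leaf, n = bool(fields[0]), fields[1]
    keys = list(fields[2:2 + n])
    kids_start = 2 + MAX_KEYS
    # An internal node with n keys has n + 1 children.
    kids = [] if leaf else list(fields[kids_start:kids_start + n + 1])
    return Node(page_id, leaf, keys, kids)
```

Following a child "pointer" is then just `disk_read(f, node.children[i])`, which keeps the pseudocode's structure intact.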
0 votes, 0 answers

on-disk B-tree: defer appending new pages to file

I am implementing an on-disk B-tree, and I have a question about creating new pages. According to the little information I found, when I need to add a new page I should append a new block to the B-tree file and then read it through the buffer manager. I…
aetern
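One way to defer the append is to let the buffer manager hand out page numbers from a counter and keep new pages only in memory, marked dirty, until the next flush. A minimal sketch in Python, assuming fixed-size pages and no eviction policy; `Pager` and its methods are hypothetical names, not an existing library API:

```python
import os

PAGE_SIZE = 4096  # assumption: fixed-size pages


class Pager:
    """Tiny buffer manager that defers appending new pages to the file.

    allocate() only reserves a page number and creates a dirty in-memory
    buffer; nothing reaches the file until flush().
    """

    def __init__(self, path):
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._f = open(path, mode)
        self._f.seek(0, os.SEEK_END)
        self._num_pages = self._f.tell() // PAGE_SIZE  # on disk + reserved
        self._buffers = {}   # page_id -> bytearray (never evicted here)
        self._dirty = set()

    def allocate(self):
        """Reserve the next page number without touching the file."""
        page_id = self._num_pages
        self._num_pages += 1
        self._buffers[page_id] = bytearray(PAGE_SIZE)
        self._dirty.add(page_id)
        return page_id

    def read(self, page_id):
        """Return the page's buffer, loading it from disk if needed."""
        if page_id not in self._buffers:
            if page_id >= self._num_pages:
                raise IndexError(page_id)
            self._f.seek(page_id * PAGE_SIZE)
            self._buffers[page_id] = bytearray(self._f.read(PAGE_SIZE))
        return self._buffers[page_id]

    def mark_dirty(self, page_id):
        """Call after modifying a buffer obtained from read()."""
        self._dirty.add(page_id)

    def flush(self):
        """Write dirty pages in page-number order, then flush to the OS."""
        for page_id in sorted(self._dirty):
            self._f.seek(page_id * PAGE_SIZE)
            self._f.write(self._buffers[page_id])
        self._f.flush()
        self._dirty.clear()

    def close(self):
        self.flush()
        self._f.close()
```

Because reserved page numbers are contiguous and dirty pages are written in page-number order, the file still grows without holes. A real buffer manager would also need eviction and crash safety, which this sketch omits.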
0 votes, 4 answers

On-disk substring index

I have a file (a FASTA file, to be specific) that I would like to index so that I can quickly locate any substring and find its location within the original FASTA file. This would be easy to do in many cases using a trie or…
emeryc
0 votes, 1 answer

Use HashMap to store file positions and access these randomly using RandomAccessFile

Initial problem: I am joining two CSVs using Java. While I can "stream" one of the CSVs (read in, process, and write out line by line), the smaller one resides in memory (a HashMap, to be precise), as I need to look up the keys…
dotwin
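The approach in the title amounts to an index from key to byte offset, followed by one seek and one line read per lookup. A minimal sketch in Python (in Java the counterparts would be a `HashMap<String, Long>` and `RandomAccessFile.seek`), assuming the key sits in a fixed column and no field contains quoted commas or embedded newlines; `build_offset_index` and `lookup` are hypothetical names:

```python
def build_offset_index(path, key_column=0, delimiter=",", skip_header=False):
    """Map each row's key to the byte offset where that row starts.

    The file is read in binary mode so offsets are exact byte positions
    usable with seek(). Assumption: simple CSV with no quoted delimiters
    or embedded newlines; a later duplicate key overwrites an earlier one.
    """
    sep = delimiter.encode()
    index = {}
    with open(path, "rb") as f:
        offset = 0
        if skip_header:
            offset += len(f.readline())
        for line in f:
            fields = line.rstrip(b"\r\n").split(sep)
            index[fields[key_column].decode()] = offset
            offset += len(line)
    return index


def lookup(f, index, key):
    """Fetch the row for `key` with one seek and one readline, or None.

    `f` must be the same file, opened in binary mode.
    """
    offset = index.get(key)
    if offset is None:
        return None
    f.seek(offset)
    return f.readline().rstrip(b"\r\n").decode()
```

Only the keys and integer offsets stay in memory, so the smaller CSV no longer has to fit in the heap; the cost is one random read per lookup.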
0 votes, 1 answer

Neo4j - On-disk Representation of Edges

I noticed a performance difference when querying via incoming and outgoing relationships for a given node. In this case, outgoing was much faster. The input file that generates the graph is sorted by the start node for each edge. Does the order of…
Jay
-1 votes, 1 answer

iOS On-Disk Encryption: what if the user disabled the passcode after the file was encrypted?

I'm currently trying to understand how iOS On-Disk Encryption works. I've read "Protecting Data Using On-Disk Encryption" in the iOS App Programming Guide. It says that the user must have an active passcode lock set for the device. But some things…
Alexander