Questions tagged [large-data]

Large data is data that is difficult to process and manage because its size is usually beyond the limits of the software being used to perform the analysis.

A large amount of data. There is no exact number that defines "large", because "large" depends on the situation: on the web, 1 MB or 2 MB might be large, while for an application meant to clone hard drives, 5 TB might be. A specific number is unnecessary, since this tag is for questions about problems caused by too much data, whatever that amount is.

2088 questions
1178 votes • 16 answers

"Large data" workflows using pandas

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work, and it is great for its out-of-core support. However, SAS is horrible as a piece of software for numerous other…
Zelazny7 • 39,946 • 18 • 70 • 84
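One commonly suggested pattern for this kind of workflow is out-of-core storage: ingest the file in chunks into an on-disk, queryable store, then pull back only the rows needed. A minimal sketch using pandas' HDFStore (requires the PyTables package; the file names and the column name field_1 are hypothetical):

```python
import pandas as pd

# Ingest a file too big for memory in chunks, appending each chunk
# to an on-disk HDF5 store. "big_file.csv" and "field_1" are
# hypothetical placeholders.
with pd.HDFStore("store.h5", mode="w") as store:
    for chunk in pd.read_csv("big_file.csv", chunksize=500_000):
        store.append("df", chunk, data_columns=["field_1"])

# Later, query only the rows you need back into memory.
with pd.HDFStore("store.h5") as store:
    subset = store.select("df", where="field_1 > 0")
```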
117 votes • 8 answers

What causes a Python segmentation fault?

I am implementing Kosaraju's strongly connected components (SCC) graph algorithm in Python. The program runs great on small data sets, but when I run it on a super-large graph (more than 800,000 nodes), it says "Segmentation Fault". What might be…
xiaolong • 3,396 • 4 • 31 • 46
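A frequent cause of this symptom is deep recursion overflowing the C stack: CPython's recursion limit normally raises RecursionError first, but raising that limit without also enlarging the stack can segfault. Whether that is the asker's issue is an assumption; a minimal sketch of the usual workaround:

```python
import sys
import threading

sys.setrecursionlimit(2 ** 20)  # permit very deep recursion

def main():
    # deep recursive work (e.g. a DFS over a ~800k-node graph) goes here
    pass

# Run the work in a thread with a 64 MB stack, so the deep recursion
# has room instead of overflowing the default stack and segfaulting.
threading.stack_size(64 * 1024 * 1024)
t = threading.Thread(target=main)
t.start()
t.join()
```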
104 votes • 5 answers

Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers: l1 = [bitarray 1, bitarray 2, ..., bitarray n], l2 = [array 1, array 2, ..., array n], l3 = [array 1, array 2, ..., array n]. These…
FableBlaze • 1,785 • 3 • 16 • 21
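For plain integer arrays (the bitarray case is more involved), Python 3.8+ offers multiprocessing.shared_memory, so worker processes can read one copy of the data instead of receiving a pickled copy each. A minimal sketch, assuming NumPy int64 data rather than the asker's exact types:

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory
import numpy as np

def worker(shm_name, shape, dtype):
    # Attach to the existing shared block; no data is copied.
    shm = SharedMemory(name=shm_name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print(arr[:5].sum())
    shm.close()

if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.int64)  # stand-in for a large list
    shm = SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data  # the single copy into shared memory

    p = Process(target=worker, args=(shm.name, data.shape, data.dtype))
    p.start()
    p.join()

    shm.close()
    shm.unlink()  # free the shared block
```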
75 votes • 4 answers

Append lines to a file

I'm new to R. I'm trying to add (append) new lines to a file with my existing data in R. The problem is that my data has about 30000 rows and 13000 cols. I already tried to add a line with the writeLines function, but the resulting file contains…
Sergio Vela • 751 • 1 • 5 • 3
73 votes • 4 answers

Parallel.ForEach can cause an "Out Of Memory" exception if working with an enumerable with a large object

I am trying to migrate a database where images were stored in the database to records in the database pointing at files on the hard drive. I was trying to use Parallel.ForEach to speed up the process, using this method to query out the…
Scott Chamberlain • 124,994 • 33 • 282 • 431
58 votes • 3 answers

Is there any JSON viewer to open large JSON files (Windows)?

I have a very large JSON file, several GB in size. I am looking for an efficient JSON viewer in which we are also able to view the JSON in tree format. I understand such a huge file can't be loaded in one go. I wonder whether there is any software to view JSON…
Anwar Shaikh • 1,591 • 3 • 22 • 43
50 votes • 2 answers

Red-Black Tree versus B-Tree

I have a project in which I have to achieve fast search, insert, and delete operations on data ranging from megabytes to terabytes. I have been studying data structures of late and analyzing them. To be specific, I want to introduce 3 cases and ask…
swanar • 635 • 1 • 6 • 10
50 votes • 8 answers

What is the difference between the Laravel cursor and Laravel chunk methods?

I would like to know the difference between the Laravel chunk and cursor methods. Which method is more suitable to use? What would be the use cases for each of them? I know that you should use cursor to save memory, but how it actually…
Suraj • 2,181 • 2 • 17 • 25
45 votes • 3 answers

How to efficiently write large files to disk on background thread (Swift)

Update: I have resolved and removed the distracting error. Please read the entire post and feel free to leave comments if any questions remain. Background: I am attempting to write relatively large files (video) to disk on iOS using Swift 2.0, GCD,…
Tommie C. • 12,895 • 5 • 82 • 100
41 votes • 3 answers

Writing large pandas DataFrames to a CSV file in chunks

How do I write out large data files to a CSV file in chunks? I have a set of large data files (1M rows x 20 cols). However, only 5 or so columns of the data files are of interest to me. I want to make things easier by making copies of these files…
Korean_Of_the_Mountain • 1,428 • 3 • 16 • 40
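One way to approach this with pandas alone is to read only the interesting columns in chunks and append each chunk to the output file. A minimal sketch (the column names and file paths are hypothetical):

```python
import pandas as pd

usecols = ["col1", "col2", "col3", "col4", "col5"]  # hypothetical names

# Stream the big file 100k rows at a time, keeping only 5 columns.
reader = pd.read_csv("big_input.csv", usecols=usecols, chunksize=100_000)

for i, chunk in enumerate(reader):
    # Write the header with the first chunk, then append without it.
    chunk.to_csv("subset.csv",
                 mode="w" if i == 0 else "a",
                 header=(i == 0),
                 index=False)
```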
39 votes • 2 answers

How to plot with a PNG as background?

I made a plot with 3 million points and saved it as a PNG. It took a few hours, and I would like to avoid re-drawing all the points. How can I generate a new plot that has this PNG as a background?
Aleksandr Levchuk • 3,751 • 4 • 35 • 47
32 votes • 5 answers

How to read only lines that fulfil a condition from a CSV into R?

I am trying to read a large CSV file into R. I only want to read and work with some of the rows that fulfil a particular condition (e.g. Variable2 >= 3), which is a much smaller dataset. I want to read these lines directly into a dataframe, rather…
Hernan • 471 • 1 • 4 • 8
31 votes • 2 answers

D3: How to show a large dataset

I have a large dataset comprising 10^5 data points, and now I'm considering the following question related to large datasets: is there any efficient way to visualize a very large dataset? In my case I have a user set, and each user has 10^3 items. There…
SolessChong • 3,370 • 8 • 40 • 67
25 votes • 5 answers

Repeat NumPy array without replicating data?

I'd like to create a 1D NumPy array that would consist of 1000 back-to-back repetitions of another 1D array, without replicating the data 1000 times. Is it possible? If it helps, I intend to treat both arrays as immutable.
NPE • 486,780 • 108 • 951 • 1,012
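One zero-copy answer is a broadcast view: NumPy can present the same buffer 1000 times along a new axis, as long as the result stays read-only and is not flattened (reshaping to 1D forces a copy, so this only approximates the asker's exact request). A minimal sketch:

```python
import numpy as np

base = np.arange(4)

# Broadcast a new leading axis of length 1000 over the same buffer;
# no data is replicated and the result is a read-only view.
tiled = np.broadcast_to(base, (1000, base.size))

print(np.shares_memory(tiled, base))  # True: still one copy of the data
print(tiled.shape)                    # (1000, 4)

# Caveat: tiled.ravel() or tiled.reshape(-1) would materialize a real
# 1-D copy, so the memory savings only hold while the view stays 2-D.
```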
25 votes • 5 answers

Mean value and standard deviation of a very large data set

I am wondering if there is an algorithm that calculates the mean value and standard deviation of an unbounded data set. For example, I am monitoring a measurement value, say, electric current. I would like to have the mean value of all historical…
Alfred Zhong • 6,773 • 11 • 47 • 59
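Yes: Welford's online algorithm maintains the mean and the sum of squared deviations in a single pass with O(1) memory, which suits an unbounded stream of measurements. A minimal sketch (class and variable names are my own):

```python
class RunningStats:
    """Welford's online algorithm: one pass, O(1) memory,
    numerically stable for unbounded streams."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; divide by (n - 1) for the
        # sample version instead.
        return (self.m2 / self.n) ** 0.5 if self.n else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.std)  # 5.0 2.0
```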