Highest Voted 'data-partitioning' Questions

80 votes, 14 answers

python equivalent of filter() getting two output lists (i.e. partition of a list)

Let's say I have a list and a filtering function. Using something like filter(lambda x: x > 10, [1,4,12,7,42]), which returns [12, 42], I can get the elements matching the criterion. Is there a function I could use that would output two lists, one of elements…

python filter data-partitioning

asked Jan 02 '11 at 13:34

F'x
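A minimal sketch of the kind of two-way split this question describes, assuming a single pass and plain lists are acceptable. The name `partition` is a hypothetical helper, not a Python built-in.

```python
def partition(pred, items):
    """Split items into (matching, rest) in one pass, preserving order."""
    matching, rest = [], []
    for item in items:
        # Append to whichever list the predicate selects.
        (matching if pred(item) else rest).append(item)
    return matching, rest


big, small = partition(lambda x: x > 10, [1, 4, 12, 7, 42])
print(big, small)
```

Calling the predicate once per element matters when the predicate is expensive or has side effects; running filter() twice would evaluate it twice.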


71 votes, 3 answers

Difference between df.repartition and DataFrameWriter partitionBy?

What is the difference between the DataFrame repartition() and DataFrameWriter partitionBy() methods? I assume both are used to "partition data based on a DataFrame column". Or is there a difference?

apache-spark-sql data-partitioning

asked Nov 04 '16 at 06:10

Shankar


49 votes, 11 answers

C# - elegant way of partitioning a list?

I'd like to partition a list into a list of lists, by specifying the number of elements in each partition. For instance, suppose I have the list {1, 2, ... 11}, and would like to partition it such that each set has 4 elements, with the last set…

c# list data-partitioning

asked Sep 08 '09 at 20:06

David Hodgson
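The question is about C#, but the idea can be sketched in Python. This assumes the final partition simply holds whatever elements are left over (the excerpt is cut off before it says so), and `chunked` is a hypothetical helper name.

```python
def chunked(items, size):
    """Return consecutive slices of at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Step through the list in strides of `size`; the last slice may be shorter.
    return [items[i:i + size] for i in range(0, len(items), size)]


print(chunked(list(range(1, 12)), 4))
```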


35 votes, 6 answers

What is the best way to divide a collection into 2 different collections?

I have a Set of numbers: Set mySet = [1,2,3,4,5,6,7,8,9]. I want to divide it into 2 sets of odds and evens. My way was to use filter twice: Set set1 = mySet.stream().filter(y -> y % 2 ==…

java filter java-8 java-stream data-partitioning

asked Feb 06 '18 at 17:02

user1386966


21 votes, 5 answers

Create grouping variable for consecutive sequences and split vector

I have a vector, such as c(1, 3, 4, 5, 9, 10, 17, 29, 30) and I would like to group together the 'neighboring' elements that form a regular, consecutive sequence, i.e. an increase by 1, in a ragged vector resulting in: L1: 1 L2: 3,4,5 L3: 9,10 L4:…

r vector sequence data-partitioning

asked Mar 07 '11 at 16:18

letsrock
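The question is about R, but the grouping rule it describes (start a new group whenever the step from the previous element is not exactly 1) can be sketched in Python. `consecutive_runs` is a hypothetical name, and the sketch assumes integer input.

```python
def consecutive_runs(values):
    """Group values into runs where each element is the previous one plus 1."""
    runs = []
    for v in values:
        if runs and v == runs[-1][-1] + 1:
            runs[-1].append(v)   # extends the current run
        else:
            runs.append([v])     # starts a new run
    return runs


print(consecutive_runs([1, 3, 4, 5, 9, 10, 17, 29, 30]))
```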


21 votes, 2 answers

Using jq how can I split a very large JSON file into multiple files, each a specific quantity of objects?

I have a large JSON file with, I'm guessing, 4 million objects. Each top-level object has a few levels nested inside. I want to split that into multiple files of 10000 top-level objects each (retaining the structure inside each). jq should be able to do…

json jq data-partitioning

asked Apr 13 '18 at 02:52

Chaz


18 votes, 7 answers

QuickSort and Hoare Partition

I have a hard time translating QuickSort with Hoare partitioning into C code, and can't find out why. The code I'm using is shown below: void QuickSort(int a[],int start,int end) { int q=HoarePartition(a,start,end); if (end<=start) return; …

c algorithm sorting quicksort data-partitioning

asked Aug 25 '11 at 22:51

Ofek Ron
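For reference, a Python sketch of quicksort with a Hoare-style partition. It is not a repair of the asker's C code, most of which is cut off in the excerpt. It assumes a middle-element pivot, and it runs the base-case check before the partition call, whereas the visible part of the excerpt calls HoarePartition first.

```python
def hoare_partition(a, lo, hi):
    """Rearrange a[lo..hi] around a pivot value; return the split index j."""
    pivot = a[(lo + hi) // 2]    # assumption: middle element as pivot
    i, j = lo - 1, hi + 1
    while True:
        i += 1
        while a[i] < pivot:      # skip elements already on the left side
            i += 1
        j -= 1
        while a[j] > pivot:      # skip elements already on the right side
            j -= 1
        if i >= j:
            return j             # a[lo..j] <= pivot <= a[j+1..hi]
        a[i], a[j] = a[j], a[i]


def quicksort(a, lo=0, hi=None):
    """Sort list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:                 # base case comes before partitioning
        return
    p = hoare_partition(a, lo, hi)
    quicksort(a, lo, p)          # with Hoare's scheme the left part includes p
    quicksort(a, p + 1, hi)


data = [5, 2, 9, 1, 5, 6, 0]
quicksort(data)
print(data)
```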


17 votes, 2 answers

Querying Windows Azure Table Storage with multiple query criteria

I'm trying to query a table in Windows Azure storage and was initially using the TableQuery.CombineFilters in the TableQuery().Where function as follows: TableQuery.CombineFilters( TableQuery.GenerateFilterCondition("PartitionKey",…

azure azure-table-storage data-partitioning

asked Jan 16 '14 at 16:42

Captain John


13 votes, 5 answers

How to sort an integer array into negative, zero, positive part without changing relative position?

Give an O(n) algorithm which takes as input an array S, then divides S into three sets: negatives, zeros, and positives. Show how to implement this in place, that is, without allocating new memory. And you have to keep the numbers' relative…

arrays algorithm data-partitioning

asked Mar 18 '11 at 03:04

Gin


11 votes, 1 answer

What is the difference between partitioning and bucketing in Spark?

I am trying to optimize a join query between two Spark dataframes, let's call them df1 and df2 (joined on the common column "SaleId"). df1 is very small (5M) so I broadcast it among the nodes of the Spark cluster. df2 is very large (200M rows) so I tried to…

python apache-spark bucket data-partitioning

asked Jul 02 '19 at 17:28

nofar mishraki


11 votes, 4 answers

How to write SQL query that selects distinct pair values for specific criteria?

I'm having trouble formulating a query for the following problem: for pair values that have a certain score, how do you group them in a way that will only return distinct pair values with the best respective scores? For example, let's say I have a…

sql postgresql group-by data-partitioning

asked Nov 01 '16 at 17:17

Stephen Tableau


10 votes, 5 answers

3D clustering Algorithm

Problem statement: There are more than a billion points in 3D space. The goal is to find the top N points which have the largest number of neighbors within a given distance R. Another condition is that the distance between any…

algorithm 3d cluster-analysis spatial data-partitioning

asked Aug 14 '10 at 05:30

Teng Lin


10 votes, 2 answers

Hashing VS Indexing

Both hashing and indexing are used to partition data according to some predefined formula, but I am unable to understand the key difference between the two. In hashing we divide the data on the basis of some key-value pair; similarly, in indexing…

hash indexing data-partitioning consistent-hashing

asked Dec 16 '13 at 21:26

coolDude


10 votes, 2 answers

Partitioning a float array into similar segments (clustering)

I have an array of floats like this: [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200] Now, I want to partition the array like this: [[1.91, 2.87, 3.61] , [10.91, 11.91, 12.82] , [100.73, 100.71, 101.89] , [200]] // [200] will…

java c++ algorithm cluster-analysis data-partitioning

asked Jul 05 '13 at 01:33

alessandro
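One simple way to get the segmentation shown in the excerpt is to start a new segment whenever the jump from the previous value exceeds a threshold. This is only a sketch: the question's actual similarity criterion is cut off, and both the `max_gap` parameter and its value of 5.0 are assumptions made for illustration.

```python
def split_by_gap(values, max_gap):
    """Start a new segment whenever |current - previous| exceeds max_gap."""
    segments = []
    for v in values:
        if segments and abs(v - segments[-1][-1]) <= max_gap:
            segments[-1].append(v)   # close enough to the previous value
        else:
            segments.append([v])     # gap too large (or first value)
    return segments


data = [1.91, 2.87, 3.61, 10.91, 11.91, 12.82, 100.73, 100.71, 101.89, 200]
print(split_by_gap(data, max_gap=5.0))   # 5.0 is an arbitrary, hypothetical threshold
```

The absolute difference is used because the input is not strictly increasing (100.73 is followed by 100.71).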


9 votes, 4 answers

python: Generating integer partitions

I need to generate all the partitions of a given integer. I found this algorithm by Jerome Kelleher, which is stated to be the most efficient one: def accelAsc(n): a = [0 for i in range(n + 1)] k = 1 a[0] = 0 y = n - 1 …

python combinatorics performance data-partitioning

asked Apr 20 '12 at 10:08

etuardu
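For comparison, a plain recursive generator of integer partitions. This is not Kelleher's accelAsc (whose code is cut off in the excerpt) and makes no claim to match its efficiency; `integer_partitions` and `max_part` are hypothetical names. Parts are emitted in non-increasing order.

```python
def integer_partitions(n, max_part=None):
    """Yield each partition of n as a list of parts in non-increasing order."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []                 # the empty partition completes a branch
        return
    # Choose the largest part first, then partition the remainder with
    # parts no bigger than it, so every partition is produced exactly once.
    for first in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield [first] + rest


print(list(integer_partitions(4)))
```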

