Highest Voted 'scalding' Questions

98

votes

4 answers

Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)?

Why do Scala and frameworks like Spark and Scalding have both reduce and foldLeft? What, then, is the difference between reduce and fold?

asked Aug 06 '14 at 11:07

samthebest

30,803
25
102
142
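As an illustration of the question above, here is a minimal sketch using only the Scala standard library (not Spark or Scalding). The object name ReduceVsFold and the sample values are made up for this example. It shows the main practical differences: reduce takes no initial value, foldLeft takes one and may produce a different result type, and fold takes one but keeps a result type compatible with the element type.

```scala
object ReduceVsFold {
  // Hypothetical sample data, chosen only for illustration.
  val nums: List[Int] = List(1, 2, 3, 4)

  // reduce: no initial value; combines elements pairwise.
  val sum: Int = nums.reduce(_ + _)

  // foldLeft: explicit initial value; the accumulator type (String)
  // may differ from the element type (Int).
  val joined: String = nums.foldLeft("")((acc, n) => acc + n.toString)

  // fold: explicit initial value, but the result type must be a
  // supertype of the element type.
  val product: Int = nums.fold(1)(_ * _)

  // On an empty collection, foldLeft returns the initial value,
  // while reduce would throw; reduceOption returns None instead.
  val emptySum: Int = List.empty[Int].foldLeft(0)(_ + _)
  val emptyReduce: Option[Int] = List.empty[Int].reduceOption(_ + _)
}
```

In distributed settings the distinction matters more, because an operation that can be applied in any grouping needs to be associative; the sketch above only covers the sequential, in-memory case.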

42

votes

5 answers

Cascading examples failed to compile?

In shell I typed gradle cleanJar in the Impatient/part1 directory. The output is below. The error is "class file for org.apache.hadoop.mapred.JobConf not found". Why did it fail to compile? :clean UP-TO-DATE :compileJava Download…

java hadoop gradle cascading scalding

asked Sep 20 '12 at 10:53

Treper

3,539
2
26
48

10

votes

1 answer

uncompress and read gzip file in scala

In Scala, how does one uncompress the text contained in file.gz so that it can be processed? I would be happy with either having the contents of the file stored in a variable, or saving it as a local file so that it can be read in by the program…

scala gzip scalding

asked Jul 02 '13 at 22:00

EthanP

1,663
3
22
27
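One possible approach to the question above, sketched with the JDK's GZIPInputStream and scala.io.Source. The object name GzipRead, the method name readGzipLines, and the assumption of UTF-8 text are illustrative choices, not taken from the original question or its answers.

```scala
import java.io.FileInputStream
import java.util.zip.GZIPInputStream
import scala.io.Source

object GzipRead {
  // Reads a gzip-compressed text file and returns its lines.
  // Assumes the decompressed content is UTF-8 text that fits in memory.
  def readGzipLines(path: String): List[String] = {
    val in = new GZIPInputStream(new FileInputStream(path))
    try Source.fromInputStream(in, "UTF-8").getLines().toList
    finally in.close() // closing the gzip stream also closes the file stream
  }
}
```

For large files, it would likely be better to process the line iterator lazily inside the try block rather than materializing a List.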

9

votes

3 answers

Unresolved dependency: com.hadoop.gplcompression#hadoop-lzo;0.4.16 when "sbt update" in scalding

After getting code from git using clone https://github.com/twitter/scalding.git and doing ./sbt update I get: :::::::::::::::::::::::::::::::::::::::::::::: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] …

scala sbt scalding

asked Feb 14 '14 at 13:45

Anton Ashanin

1,817
5
30
43

7

votes

2 answers

Why does a for comprehension expand to a `withFilter`

I'm working on a DSL for relational (SQL-like) operators. I have a Rep[Table] type with an .apply: ((Symbol, ...)) => Obj method that returns an object Obj which defines .flatMap: T1 => T2 and .map: T1 => T3 functions. As the type Rep[Table] does…

scala for-comprehension scalding

asked May 22 '15 at 16:45

Pyetras

1,492
16
21
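To illustrate the behaviour the question above asks about, here is a small sketch on a plain List rather than the asker's Rep[Table] DSL. The object name ForDesugar and the sample data are hypothetical. A for comprehension with a guard is rewritten by the compiler into a withFilter call followed by map, so the two values below should be equal.

```scala
object ForDesugar {
  // Hypothetical sample data.
  val pairs: List[(Int, String)] = List((1, "a"), (2, "b"))

  // For comprehension with a guard.
  val viaFor: List[String] =
    for ((n, s) <- pairs if n > 1) yield s

  // Roughly what the compiler generates: the guard becomes withFilter,
  // and the yield becomes map.
  val viaCalls: List[String] =
    pairs.withFilter { case (n, _) => n > 1 }.map { case (_, s) => s }
}
```

A type that defines only map and flatMap, as in the question, will therefore fail to compile once a guard (or a refutable pattern) appears in the generator.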

7

votes

2 answers

Can I output a collection instead of a tuple in Scalding map method?

If you want to create a pipe with more than 22 fields from a smaller one in Scalding you are limited by Scala tuples, which cannot have more than 22 items. Is there a way to use collections instead of tuples? I imagine something like in the…

scala scalding

asked Oct 25 '13 at 14:53

Calin-Andrei Burloiu

1,481
2
13
25

6

votes

3 answers

Write to multiple outputs by key Scalding Hadoop, one MapReduce Job

How can you write to multiple outputs depending on the key using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter for all the possible keys, but that is a horrible hack, which will fire up many jobs.

scala hadoop mapreduce cascading scalding

asked Jun 02 '14 at 12:16

samthebest

30,803
25
102
142

6

votes

1 answer

scala filename too long

I'm using Scala 2.10 and Gradle 1.11. My problem is that the compiled jar throws an error when I try to run it on the Hadoop cluster. I want to run on Hadoop because I'm using Scalding. The exception is: Exception in thread "main"…

scala hadoop scalding

asked Apr 15 '14 at 20:56

user3537713

65
5

6

votes

1 answer

Scalding: How to retain the other field, after a groupBy('field){.size}?

So my input data has two fields/columns: id1 & id2, and my code is the following: TextLine(args("input")) .read .mapTo('line->('id1,'id2)) {line: String => val fields = line.split("\t") …

twitter cascading scalding

asked Jul 06 '13 at 22:02

jeremy.ting

155
1
1
7

5

votes

3 answers

Recommended way to access HBase using Scala

Now that SpyGlass is no longer being maintained, what is the recommended way to access HBase using Scala/Scalding? A similar question was asked in 2013, but most of the suggested links are either dead or to defunct projects. The only link that seems…

scala apache-spark hbase apache-flink scalding

asked May 18 '18 at 17:20

Ellen Spertus

6,576
9
50
101

5

votes

4 answers

(Scalding) groupBy foldLeft using the group by value in the fold

I have data like this (pid, recom-pid): 1 1; 1 2; 1 3; 2 1; 2 2; 2 4; 2 5. I need to turn it into (pid, recommendations): 1 2,3; 2 1,4,5. That is, ignore the pid itself in the second column and join the rest into a comma-separated string. It's tab separated…

scalding

asked Oct 04 '15 at 22:58

tgkprog

4,493
4
41
70
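The transformation described above can be sketched on plain Scala collections. This is not the Scalding groupBy/foldLeft API the question asks about, only an in-memory illustration of the intended result; the object name Recommendations and the value names are made up here.

```scala
object Recommendations {
  // The (pid, recom-pid) rows from the question.
  val rows: List[(Int, Int)] =
    List((1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 4), (2, 5))

  // Drop self-recommendations, group by pid, and join the remaining
  // recommended pids into a comma-separated string.
  val byPid: Map[Int, String] =
    rows
      .filter { case (pid, rec) => pid != rec }
      .groupBy { case (pid, _) => pid }
      .map { case (pid, group) => pid -> group.map(_._2).mkString(",") }
}
```

In a Scalding job the same idea would be expressed with the pipe's grouping operations; the exact calls depend on whether the fields-based or typed API is in use.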

4

votes

0 answers

How can I sort elements of a TypedPipe in Scalding?

I have not been able to find a way to sort elements of a TypedPipe in Scalding (when not performing a group operation). Here are the relevant parts of my program (replacing irrelevant parts with ellipses): case class ReduceOutput(val slug :…

scalding

asked Aug 10 '18 at 00:27

Ellen Spertus

6,576
9
50
101

4

votes

2 answers

How to visualize steps of a scalding job

My scalding job is translated into 9 map reduce jobs (m/r jobs). It's not easy for me to understand which part of code each m/r job represents. Is there anything that could help me understand my job better? //this has been copy&pasted from our…

cascading scalding

asked Jun 06 '17 at 19:48

Oleksii

1,101
7
12

4

votes

0 answers

Scalding NPE only when assigning pipe to val

I'm new to Scala and Scalding, and in working on my first Job I'm encountering a NullPointerException when assigning a pipe to a val. The exact same job that just chains to a .write() without the intermediate variable completes as expected. What…

scala hadoop intellij-idea scalding

asked Jun 01 '17 at 20:25

jpk

281
2
11

4

votes

1 answer

how to perform an operation one time only at the end of a scalding job?

I read in scalding groupAll docs: /** * Group all tuples down to one reducer. * (due to cascading limitation). * This is probably only useful just before setting a tail such as Database * tail, so that only one reducer talks to…

scala hadoop cascading scalding

asked Mar 24 '15 at 10:02

Jas

14,493
27
97
148
