In a shell, I ran gradle cleanJar in the Impatient/part1 directory. The output is below. The error is "class file for org.apache.hadoop.mapred.JobConf not found". Why did it fail to compile?
:clean UP-TO-DATE
:compileJava
Download…
In Scala, how does one uncompress the text contained in file.gz so that it can be processed? I would be happy with either having the contents of the file stored in a variable, or saving it as a local file so that it can be read in by the program…
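A minimal sketch of the "contents stored in a variable" option, assuming the file is gzip-compressed UTF-8 text that fits in memory. The names GunzipSketch, gunzipToString and gunzipFile are illustrative only; the calls used are java.util.zip.GZIPInputStream and scala.io.Source.fromInputStream.

```scala
import java.io.{FileInputStream, InputStream}
import java.util.zip.GZIPInputStream
import scala.io.Source

object GunzipSketch {
  // Decompress a gzip stream and return its full contents as one String.
  // Assumption: the compressed payload is UTF-8 text.
  def gunzipToString(in: InputStream): String = {
    val gz = new GZIPInputStream(in)
    try Source.fromInputStream(gz, "UTF-8").mkString
    finally gz.close() // closing the gzip stream also closes the wrapped stream
  }

  // Convenience wrapper for a file on the local file system.
  // The path is supplied by the caller, e.g. "file.gz".
  def gunzipFile(path: String): String =
    gunzipToString(new FileInputStream(path))
}
```

Calling GunzipSketch.gunzipFile("file.gz") would then give the uncompressed text as a String. For large files, iterating over Source.fromInputStream(gz, "UTF-8").getLines() instead of calling mkString avoids loading everything at once.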
After cloning the code with git clone https://github.com/twitter/scalding.git and running ./sbt update, I get:
::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] …
I'm working on a DSL for relational (SQL-like) operators. I have a Rep[Table] type with an .apply: ((Symbol, ...)) => Obj method that returns an object Obj which defines .flatMap: T1 => T2 and .map: T1 => T3 functions. As the type Rep[Table] does…
If you want to create a pipe with more than 22 fields from a smaller one in Scalding you are limited by Scala tuples, which cannot have more than 22 items.
Is there a way to use collections instead of tuples? I imagine something like in the…
How can you write to multiple outputs, depending on the key, using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter for each possible key, but that is a horrible hack that will fire up many jobs.
I'm using Scala 2.10 and Gradle 1.11.
My problem is that the compiled jar throws an error when I try to run it on the Hadoop cluster.
I want to run on Hadoop because I am using Scalding.
The exception is:
Exception in thread "main"…
So my input data has two fields/columns: id1 & id2, and my code is the following:
TextLine(args("input"))
.read
.mapTo('line->('id1,'id2)) {line: String =>
val fields = line.split("\t")
…
Now that SpyGlass is no longer being maintained, what is the recommended way to access HBase using Scala/Scalding? A similar question was asked in 2013, but most of the suggested links are either dead or to defunct projects. The only link that seems…
I have data like this:
pid recom-pid
1 1
1 2
1 3
2 1
2 2
2 4
2 5
I need to turn it into:
pid, recommendations
1 2,3
2 1,4,5
That is, ignore the pid itself in the 2nd column and join the rest into a comma-separated string. It's tab separated…
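To make the intended transformation concrete, here is a sketch in plain Scala collections (not Scalding). It assumes each input line is "pid<TAB>recom-pid" with the header row already removed; RecommendationsSketch and groupRecommendations are made-up names for illustration.

```scala
object RecommendationsSketch {
  // Input: tab-separated "pid<TAB>recom-pid" lines (assumed to have no header row).
  // Output: one (pid, "rec1,rec2,...") pair per pid, with self-recommendations dropped.
  def groupRecommendations(lines: Seq[String]): Seq[(String, String)] = {
    val pairs = lines
      .map(_.split("\t"))
      .collect { case Array(pid, rec) if pid != rec => (pid, rec) } // skip self and malformed rows

    pairs
      .groupBy(_._1) // pid -> all (pid, rec) pairs, in input order
      .toSeq
      .sortBy(_._1)  // sorted by pid as a String, for a stable result
      .map { case (pid, recs) => (pid, recs.map(_._2).mkString(",")) }
  }
}
```

With the sample rows above, this yields Seq(("1", "2,3"), ("2", "1,4,5")). Note that a pid whose only recommendation is itself disappears from the result in this sketch.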
I have not been able to find a way to sort elements of a TypedPipe in Scalding (when not performing a group operation). Here are the relevant parts of my program (replacing irrelevant parts with ellipses):
case class ReduceOutput(val slug :…
My Scalding job is translated into 9 MapReduce (M/R) jobs. It's not easy for me to understand which part of the code each M/R job represents. Is there anything that could help me understand my job better?
//this has been copy&pasted from our…
I'm new to Scala and Scalding, and in working on my first Job I'm encountering a NullPointerException when assigning a pipe to a val. The exact same job that just chains to a .write() without the intermediate variable completes as expected.
What…
I read in the Scalding groupAll docs:
/**
* Group all tuples down to one reducer.
* (due to cascading limitation).
* This is probably only useful just before setting a tail such as Database
* tail, so that only one reducer talks to…