I've been banging my head against the wall for a while on this one, and it seems so simple. I know I'm missing something key here.
Using Pig 0.12.1.2.1.2.0-402 and Cassandra 2.0.9, I am trying to import a decimal number into Cassandra without losing any of its precision.
The data itself is exported from Oracle using Sqoop, and the numbers in the exported file look fine.
For example, one of the values in question is 38.62782. If I import it as a Pig double or float, precision is lost, which is not acceptable in this case. I've tried multiple combinations, and Pig's bigdecimal type seems to be a perfect fit, but I cannot get it to work; I continually get the following error:
ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR: java.math.BigDecimal cannot be cast to org.apache.pig.data.DataByteArray
I don't understand what I need to do to make this work. I just want the 38.62782 from Oracle (and the Sqoop file) to appear as 38.62782 in Cassandra without making the column a text field.
Sample Pig script:
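To see where the precision goes outside of Pig, here is a minimal Java sketch (the class and method names are hypothetical, just for illustration). Pig's double and float are Java `double` and `float`, so it parses the exported text three ways and expands each result exactly with `new BigDecimal(double)`: only the direct `BigDecimal` parse comes back as exactly 38.62782.

```java
import java.math.BigDecimal;

public class PrecisionCheck {
    // The value as it appears in the Sqoop export file.
    static final String RAW = "38.62782";

    // Parse the text straight into a BigDecimal: no binary rounding happens.
    static BigDecimal fromText() {
        return new BigDecimal(RAW);
    }

    // Go through a double (what s2d / Double.parseDouble produces),
    // then expand the binary value exactly.
    static BigDecimal viaDouble() {
        return new BigDecimal(Double.parseDouble(RAW));
    }

    // Go through a float (Pig's float type), widen it to double,
    // then expand the binary value exactly.
    static BigDecimal viaFloat() {
        return new BigDecimal((double) Float.parseFloat(RAW));
    }

    public static void main(String[] args) {
        System.out.println("text   : " + fromText());
        System.out.println("double : " + viaDouble());
        System.out.println("float  : " + viaFloat());
    }
}
```

Note that `Double.toString` still prints 38.62782; the difference lives in the underlying binary value, which is what ends up in a `double` or `float` column.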
DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
DEFINE CustomToDate2Args com.mine.pig.udf.CustomToDate2Args();
DEFINE ToBoolean com.mine.pig.udf.ToBoolean();
DEFINE CustomCqlStorage com.mine.pig.CustomCqlStorage();
DEFINE s2d InvokeForDouble( 'java.lang.Double.parseDouble', 'String' );
oracle_load = LOAD '$input_file' USING PigStorage(' ') AS (
    NAME:chararray,
    MYDOUBLE:chararray,
    MYFLOAT:float,
    MYDECIMAL:bytearray,
    MYTEXT:chararray);
oracle_data = FOREACH oracle_load generate
    (NAME=='null'?null:NAME) as NAME,
    MYDOUBLE,
    MYFLOAT,
    MYDECIMAL,
    MYTEXT;
R = FOREACH oracle_data GENERATE TOTUPLE(TOTUPLE('name',NAME)), TOTUPLE(
    s2d(MYDOUBLE),
    MYFLOAT,
    MYDECIMAL,
    MYTEXT);
STORE R into 'cql://$cass_user:$cass_pass@$cass_keyspace/mydoubletest?output_query=update+$cass_keyspace.mydoubletest+set+mydouble+%3D+%3F,myfloat+%3D+%3F,mydecimal+%3D+%3F,mytext+%3D+%3F' USING CustomCqlStorage();
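For reference, the `output_query` in the STORE URL is URL-encoded. This small Java sketch decodes it with `java.net.URLDecoder` (with `$cass_keyspace` replaced by a hypothetical keyspace name `ks`) and counts the bind markers. The four `?` markers presumably bind positionally to the four fields of the second tuple in `R`, so `mydecimal` receives whatever Java object `MYDECIMAL` holds at store time.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class OutputQueryDecode {
    // output_query from the STORE statement; "ks" stands in for $cass_keyspace
    // (hypothetical keyspace name, for illustration only).
    static final String ENCODED =
        "update+ks.mydoubletest+set+mydouble+%3D+%3F,myfloat+%3D+%3F,mydecimal+%3D+%3F,mytext+%3D+%3F";

    // '+' decodes to a space, %3D to '=', %3F to '?'.
    static String decoded() {
        try {
            return URLDecoder.decode(ENCODED, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Count the '?' bind markers in the decoded CQL.
    static int bindMarkers(String cql) {
        int count = 0;
        for (char c : cql.toCharArray()) {
            if (c == '?') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String cql = decoded();
        System.out.println(cql);
        System.out.println(bindMarkers(cql) + " bind markers");
    }
}
```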
And here is the table definition I'm testing with, for reference:
CREATE TABLE mydoubletest (
  name text,
  mydecimal decimal,
  mydouble double,
  myfloat float,
  mytext text,
  PRIMARY KEY ((name))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
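The `mydecimal` column uses CQL's `decimal` type, whose serialized form is a 4-byte big-endian scale followed by the unscaled value as big-endian two's-complement bytes. If `CustomCqlStorage` is derived from Cassandra's `CqlStorage`, its object-to-`ByteBuffer` conversion may have no `BigDecimal` case and fall back to casting the value to `DataByteArray`, which would be consistent with the error above. Assuming that, one possible direction is to serialize the `BigDecimal` in that format before it is bound. This is a plain-JDK sketch with hypothetical names, not the storage class's actual API:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class DecimalCodec {
    // Serialize as a CQL decimal: 4-byte big-endian scale,
    // then the unscaled value as big-endian two's-complement bytes.
    static ByteBuffer encode(BigDecimal value) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        ByteBuffer buf = ByteBuffer.allocate(4 + unscaled.length);
        buf.putInt(value.scale());
        buf.put(unscaled);
        buf.flip(); // ready for reading from position 0
        return buf;
    }

    // Inverse of encode; works on a duplicate so the caller's position is untouched.
    static BigDecimal decode(ByteBuffer bytes) {
        ByteBuffer b = bytes.duplicate();
        int scale = b.getInt();
        byte[] unscaled = new byte[b.remaining()];
        b.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("38.62782");
        ByteBuffer wire = encode(original);
        System.out.println(wire.remaining() + " bytes, scale " + wire.getInt(0));
        System.out.println(decode(wire));
    }
}
```

For 38.62782 the scale is 5 and the unscaled value 3862782 fits in three bytes, so the round trip returns the exact original value with no binary rounding.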