
I've been banging my head against the wall for a while on this one, and it seems so simple. I know I'm missing something key here.

Using Pig 0.12.1.2.1.2.0-402 and Cassandra 2.0.9, I am trying to import a decimal number (one that must keep its exact precision) into Cassandra.

The data itself is exported from Oracle using Sqoop and the numbers look fine.

For example, the value in question is 38.62782. If I import it as a Pig double or float, precision is lost, which is not acceptable in this case. I've tried multiple combinations, and Pig's bigdecimal seems like a perfect fit, but I cannot get it to work; I continually get the following:

ERROR org.apache.pig.tools.pigstats.SimplePigStats  - ERROR: java.math.BigDecimal cannot be cast to org.apache.pig.data.DataByteArray

So I don't understand what I need to do to make this work. I just want the 38.62782 from Oracle (and the Sqoop file) to appear as 38.62782 without making the Cassandra column a text field.
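For reference, this is roughly the LOAD variant I actually want to use, and the one that triggers the cast error above (a sketch; it differs from the full script below only in the MYDECIMAL field):

-- Same LOAD as in the full script below, but with MYDECIMAL declared as bigdecimal;
-- this is the variant that produces the BigDecimal -> DataByteArray cast error.
oracle_load = LOAD '$input_file' USING PigStorage('     ') AS (
  NAME:chararray,
  MYDOUBLE:chararray,
  MYFLOAT:float,
  MYDECIMAL:bigdecimal,
  MYTEXT:chararray);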

Sample pig:

DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
DEFINE CustomToDate2Args com.mine.pig.udf.CustomToDate2Args();
DEFINE ToBoolean com.mine.pig.udf.ToBoolean();
DEFINE CustomCqlStorage com.mine.pig.CustomCqlStorage();
DEFINE s2d InvokeForDouble( 'java.lang.Double.parseDouble', 'String' );

oracle_load = LOAD '$input_file' USING PigStorage('     ') AS (
  NAME:chararray,
  MYDOUBLE:chararray,
  MYFLOAT:float,
  MYDECIMAL:bytearray,
  MYTEXT:chararray);

oracle_data = FOREACH oracle_load generate
  (NAME=='null'?null:NAME) as NAME,
  MYDOUBLE,
  MYFLOAT,
  MYDECIMAL,
  MYTEXT;


R = FOREACH oracle_data GENERATE TOTUPLE(TOTUPLE('name',NAME)), TOTUPLE(
  s2d(MYDOUBLE),
  MYFLOAT,
  MYDECIMAL,
  MYTEXT);


STORE R into 'cql://$cass_user:$cass_pass@$cass_keyspace/mydoubletest?output_query=update+$cass_keyspace.mydoubletest+set+mydouble+%3D+%3F,myfloat+%3D+%3F,mydecimal+%3D+%3F,mytext+%3D+%3F' USING CustomCqlStorage();

and the table definition I'm trying out just for reference:

CREATE TABLE mydoubletest (
  name text,
  mydecimal decimal,
  mydouble double,
  myfloat float,
  mytext text,
  PRIMARY KEY ((name))
) WITH
  bloom_filter_fp_chance=0.010000 AND
  caching='KEYS_ONLY' AND
  dclocal_read_repair_chance=0.000000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.100000 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};
  • Here's a doc on Cassandra CQL data types, and what they map to. See if you can find a match in here: http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/cql_data_types_c.html – Aaron Feb 26 '15 at 22:55
  • Yes, it says that decimal in Cassandra maps to java.math.BigDecimal, which makes it all the more frustrating that I can't just use "MYDECIMAL:bigdecimal" above (when I do that I get the exception noted). There's also no InvokeForBigDecimal option, so I'm not sure whether I need to do something with a UDF, but it seems like such a circuitous way to do something so seemingly simple. – user4611739 Feb 26 '15 at 22:58

1 Answer


It turns out my problem was very different from the answer I was looking for.

First off, creating a UDF did work for this. Two things, though: first, we were using a custom CqlStorage routine that did not take BigDecimal into account. Second, after further exploration (prompted by a separate issue), it turned out that the remaining problem was with how cqlsh displayed the values; the fields were not actually losing precision.
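For reference, a rough sketch of how the BigDecimal UDF was wired into the script (the ToBigDecimal name and package here are illustrative, not the exact class we used), assuming the decimal column is loaded as a chararray:

-- Hypothetical UDF that parses the string into a java.math.BigDecimal, preserving precision.
DEFINE ToBigDecimal com.mine.pig.udf.ToBigDecimal();

R = FOREACH oracle_data GENERATE TOTUPLE(TOTUPLE('name',NAME)), TOTUPLE(
  s2d(MYDOUBLE),
  MYFLOAT,
  ToBigDecimal(MYDECIMAL),  -- BigDecimal instead of a raw bytearray
  MYTEXT);

The storage side then has to map Pig's BIGDECIMAL type through to Cassandra's decimal column, which is exactly the part our custom CqlStorage routine was missing.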

However, the answer to my overall problem was exactly what was listed here: Astyanax Cassandra Double type precision
