Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Source: The creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, realize total cost savings of as much as 50%-60% while accomplishing tasks faster.

87 questions

1 answer

Stratified Sampling in Hive

The following returns a 10% sample of the A and X columns stratified by the values of X. select A, X from( select A, count(*) over (partition by X) as cnt, rank() over (partition by X order by rand()) as rnk from my_table)…

sql hive qubole

asked Aug 12 '14 at 21:50

Amelio Vazquez-Reina
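
The query in the excerpt above is cut off, so the filter it applies is not visible. As a rough illustration of the idea it describes (rank rows randomly within each stratum, then keep a fixed fraction of each), here is a minimal Python sketch. The function name `stratified_sample`, its parameters, and the rule "keep the first `int(count * fraction)` rows of each shuffled stratum" are assumptions made for illustration, not the query's actual logic.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=None):
    """Return about `fraction` of the rows from every stratum.

    `key` maps a row to its stratum (the role X plays in the excerpt).
    Hypothetical helper: the names and the rounding rule are assumptions.
    """
    rng = random.Random(seed)

    # Group rows by stratum, like "partition by X".
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for group in strata.values():
        # A random order within the stratum, like "order by rand()".
        rng.shuffle(group)
        # Keep the rows whose random rank falls within the fraction.
        keep = int(len(group) * fraction)
        sample.extend(group[:keep])
    return sample
```

With a fraction of 0.1, strata smaller than ten rows contribute nothing under this rounding rule; rounding up instead would guarantee at least one row per stratum.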

1 answer

How to kill hadoop job gracefully/intercept `hadoop job -kill`

My Java application runs on a mapper and creates child processes using the Qubole API. The application stores the child Qubole queryIDs. I need to intercept the kill signal and shut down the child processes before exiting. hadoop job -kill jobId and yarn application -kill…

java hadoop mapreduce qubole

asked May 30 '17 at 19:16

leftjoin

1 answer

Divide Spark DataFrame data into separate files

I have the following DataFrame input from an S3 file and need to transform the data into the following desired output. I am using Spark version 1.5.1 with Scala, but could change to Spark with Python. Any suggestions are welcome. DataFrame…

scala apache-spark dataframe amazon-s3 qubole

asked Nov 11 '16 at 18:18

satoukum

0 answers

Fetch all Column Statistics using Single Query Hive

I understand that all the column statistics can be computed for a Hive table using the command: ANALYZE TABLE Table1 COMPUTE STATISTICS; Specific column-level stats can then be fetched through the command: DESCRIBE FORMATTED…

hive bigdata qubole hive-query

asked Jul 10 '18 at 11:00

Abhi Nandan

1 answer

Insert into ElasticSearch using Hive/Qubole

I am trying to insert data into Elasticsearch from a Hive table. CREATE EXTERNAL TABLE IF NOT EXISTS es_temp_table ( dt STRING, text STRING ) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' …

elasticsearch hive qubole

asked Feb 18 '15 at 18:34

stogers

1 answer

How do you write a presto query to split a string into its own column

Trying to split a string into multiple columns in Qubole using a Presto query. {"field0":[{"startdate":"2022-07-13","lastnightdate":"2022-07-16","adultguests":5,"childguests":0,"pets":null}]} Would like startdate,lastnightdate,adultguests,childguests…

sql presto qubole

asked Jul 12 '22 at 14:37

Abe

1 answer

need regexp_extract help, beginner

I have a string column "49b8b35e-b62c-4a42-9d73-192d131d127a,03c8a7e0-5153-11ec-873a-0242ac11000a,eec8aee4-0500-4940-b319-15924cc2d248". This string column has 3 values separated by "," (value1,value2,value3). There is no guarantee that value2 and…

sql regex hive hiveql qubole

asked Dec 13 '21 at 15:35

ajk

1 answer

Data comparisons in Qubole

I am very new to Qubole. We recently migrated Oracle Ebiz data to Salesforce. We have both Ebiz and Salesforce data in the Qubole Data Lake. There are some discrepancies between Ebiz and Salesforce. What technology can I use on Qubole to find…

qubole

asked Dec 06 '21 at 23:40

user2280352

1 answer

Pyspark Logging: Printing information at the wrong log level

Thanks for your time! I'd like to create and print legible summaries of my (hefty) data to my output when debugging my code, but stop creating and printing those summaries once finished to speed things up. I was advised to use logging, which I…

apache-spark logging pyspark qubole

asked May 13 '20 at 19:07

Amit

1 answer

How to create external tables from parquet files in s3 using hive 1.2?

I have created an external table in Qubole (Hive) which reads Parquet (compressed: snappy) files from S3, but on performing a SELECT * FROM table_name I am getting null values for all columns except the partitioned column. I tried using different…

hadoop hive hiveql qubole

asked May 15 '19 at 20:21

S.Mehra

1 answer

Debug failed shuffles in hadoop map reduces

As the size of the input file increases, I am seeing the number of failed shuffles increase and the job completion time increase non-linearly, e.g. 75GB took 1h and 86GB took 5h. I also see the average shuffle time increase 10-fold, e.g. 75GB took 4min and 85GB took 41min. Can someone point me…

hadoop mapreduce qubole

asked Sep 21 '18 at 18:03

Jal

2 answers

Fixing java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim

Consider the following error: 2018-07-12 22:46:36,087 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim(Ljava/lang/String;)Ljava/lang/String; at…

java mapreduce aws-java-sdk qubole

asked Jul 12 '18 at 23:04

Jal

1 answer

java.io.FileNotFound exception while writing to apache spark in qubole

I have code in Apache Spark 1.6.3 running on Qubole which writes data to multiple tables (Parquet format) on S3. When writing to the tables I keep getting a java.io.FileNotFound exception. I am even setting:…

apache-spark amazon-s3 eventual-consistency qubole

asked Nov 23 '17 at 04:45

Raghwendra Singh

0 answers

Kafka Connect Hive Integration issue

I am using Kafka Connect for Hive integration to create Hive tables along with partitions on S3. After starting the Connect distributed process and making a POST call to listen to a topic, as soon as there is some data in the topic, I can see in the…

apache-kafka apache-kafka-connect confluent-platform qubole

asked Jul 16 '17 at 19:36

Ashish

1 answer

Median value from table with number:count format

Given a table

+------------+-----------+
| Number     | Count     |
+------------+-----------+
| 0          | 7         |
+------------+-----------+
| 1          | 1         |
+------------+-----------+
| 2          | 3 …

mysql sql hive qubole

asked Oct 06 '15 at 06:27

Lenix
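
To make the "number:count" format in the question above concrete, here is a small Python sketch of one way to find a median without expanding the counts into individual rows. It is not taken from the question or its answer. The function name `median_from_counts` is hypothetical, and the sketch assumes the usual convention that an even-sized data set takes the mean of its two middle values.

```python
def median_from_counts(pairs):
    """Median of a data set given as (number, count) pairs.

    Hypothetical helper: walks the pairs in sorted order and tracks a
    running total instead of materialising every repeated value.
    """
    pairs = sorted(pairs)
    total = sum(count for _, count in pairs)
    if total == 0:
        raise ValueError("no observations")

    # 0-based positions of the middle element(s) in the expanded data;
    # both positions coincide when the total is odd.
    lo_idx, hi_idx = (total - 1) // 2, total // 2

    lo = hi = None
    seen = 0
    for number, count in pairs:
        seen += count
        if lo is None and seen > lo_idx:
            lo = number
        if seen > hi_idx:
            hi = number
            break
    return (lo + hi) / 2


# Only the three rows visible in the truncated excerpt, used as sample input.
visible_rows = [(0, 7), (1, 1), (2, 3)]
result = median_from_counts(visible_rows)
```

The same running-total idea carries over to SQL as a cumulative sum over the counts ordered by number.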
