Questions tagged [azure-hdinsight]

Questions about Azure HDInsight, a managed Apache Hadoop service that lets you run Apache Spark, Apache Hive, Apache Kafka, Apache HBase, and more in the Microsoft Azure cloud.

934 questions
88
votes
6 answers

Differences between Azure Block Blob and Page Blob?

As I recently started experimenting with Windows Azure, I've come to a situation where I have to choose between Block Blob and Page Blob. I'm currently in the process of uploading some text, csv or dat files to blob storage and then do a…
Kulasangar
  • 9,046
  • 5
  • 51
  • 82
19
votes
2 answers

What does %{ $_.Key1 } mean?

While programming for HDInsight I came across lines like $storageAccountKey = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName | %{ $_.Key1 } I understand $_ refers to the result of the…
Frank im Wald
  • 896
  • 1
  • 11
  • 28
14
votes
5 answers

Azure Data lake VS Azure HDInsight

I was going through the Microsoft documents: https://learn.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview I'm new to Azure Data Lake and HDInsight. There is a statement on that page which says that "Azure Data Lake Store can be…
AskMe
  • 2,495
  • 8
  • 49
  • 102
14
votes
1 answer

Create Hive external table from partitioned Parquet files in Azure HDInsight

I have data saved as Parquet files in Azure blob storage. Data is partitioned by year, month, day and hour like: cont/data/year=2017/month=02/day=01/ I want to create an external table in Hive using the following create statement, which I wrote using this…
chhantyal
  • 11,874
  • 7
  • 51
  • 77
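
The partition layout quoted in this question (cont/data/year=2017/month=02/day=01/) follows the Hive key=value directory convention. As a minimal sketch of how such a path maps to partition columns, the hypothetical helper below (parse_partitions is not part of Hive or Spark) splits a path into its key=value segments. It only illustrates the naming convention and does not register partitions with Hive.

```python
def parse_partitions(path):
    """Return the key=value segments of a Hive-style partitioned path as a dict."""
    partitions = {}
    for segment in path.strip("/").split("/"):
        # Only segments of the form key=value describe a partition column.
        if "=" in segment:
            key, _, value = segment.partition("=")
            partitions[key] = value
    return partitions


# Example path taken from the question excerpt above.
print(parse_partitions("cont/data/year=2017/month=02/day=01/"))
```

Values stay as strings here (for example "02"), since the type of each partition column would be decided by the table definition.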
13
votes
3 answers

Spark SQL: How to consume json data from a REST service as DataFrame

I need to read some JSON data from a web service that's providing REST interfaces to query the data from my Spark SQL code for analysis. I am able to read a JSON stored in the blob store and use it. I was wondering what is the best way to read the…
Kiran
  • 2,997
  • 6
  • 31
  • 62
11
votes
3 answers

How to efficiently store and query a billion rows of sensor data

Situation: I've started a new job and been assigned the task of figuring out what to do with their sensor data table. It has 1.3 billion rows of sensor data. The data is pretty simple: basically just a sensor ID, a date and the sensor value at that…
11
votes
4 answers

AzureException: Unable to access container using anonymous credentials, and no credentials found for them in the configuration

I am trying to use Hadoop on Azure HDInsight. I am logging into the cluster via ssh and running the following: hadoop jar jar_name class_name wasb://container@storagename.core.windows.net/inputdir…
Raghava
  • 947
  • 4
  • 15
  • 29
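
The wasb:// URI in this question packs the container name and the storage host into the authority part (container@host). As a rough sketch, Python's standard urllib.parse can pull those pieces apart, which helps when checking which container and storage host a job is actually addressing. The helper name split_wasb_uri and the sample URI (the question's path with the truncated tail dropped) are illustrative assumptions.

```python
from urllib.parse import urlsplit


def split_wasb_uri(uri):
    """Split a wasb:// URI into (container, storage host, path)."""
    parts = urlsplit(uri)
    # The text before '@' is exposed as the username, the text after it as the hostname.
    return parts.username, parts.hostname, parts.path


container, host, path = split_wasb_uri(
    "wasb://container@storagename.core.windows.net/inputdir"
)
print(container, host, path)
```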
11
votes
3 answers

Create external table with select from other table

I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into…
Roger
  • 2,063
  • 4
  • 32
  • 65
10
votes
2 answers

ConcurrentModificationException when using Spark collectionAccumulator

I'm trying to run a Spark-based application on an Azure HDInsight on-demand cluster, and am seeing lots of SparkExceptions (caused by ConcurrentModificationExceptions) being logged. The application runs without these errors when I start a local…
codebox
  • 19,927
  • 9
  • 63
  • 81
9
votes
2 answers

Spark - how to get filename with parent folder from dataframe column

I am using pyspark as my coding language. I added a column to get the filename with its path. from pyspark.sql.functions import input_file_name data = data.withColumn("sourcefile",input_file_name()) I want to retrieve only the filename with its parent folder from…
Hemant Chandurkar
  • 363
  • 1
  • 3
  • 14
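
The string manipulation this question asks for, keeping only the file name and its immediate parent folder, can be sketched in plain Python. The helper parent_and_name and the sample path are hypothetical. On a cluster the same logic would presumably be applied to the sourcefile column through a UDF or built-in string functions, which is not shown here.

```python
import posixpath


def parent_and_name(path):
    """Return 'parent_folder/file_name' for a slash-separated path or URI."""
    parent, name = posixpath.split(path.rstrip("/"))
    # basename of the parent directory is the immediate parent folder.
    return posixpath.join(posixpath.basename(parent), name)


# Hypothetical path, shaped like a value input_file_name() might return.
print(parent_and_name("wasb://container@account.blob.core.windows.net/data/2017/file.csv"))
```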
9
votes
2 answers

How to load CSVs with timestamps in custom format?

I have a timestamp field in a csv file that I load to a dataframe using spark csv library. The same piece of code works on my local machine with Spark 2.0 version but throws an error on Azure Hortonworks HDP 3.5 and 3.6. I have checked and Azure…
9
votes
2 answers

spark-shell error : No FileSystem for scheme: wasb

We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at the time of cluster creation. So I was creating this edge/gateway node by installing echo 'deb…
roy
  • 6,344
  • 24
  • 92
  • 174
9
votes
1 answer

Is there a Spark SQL JDBC driver?

I'm looking for a client JDBC driver that supports Spark SQL. I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC so I can use third-party SQL clients (e.g. SQuirreL,…
aaronsteers
  • 2,277
  • 2
  • 21
  • 38
9
votes
2 answers

In Hive, how can I add a column only if that column does not exist?

I would like to add a new column to a table, but only if that column does not already exist. This works if the column does not exist: ALTER TABLE MyTable ADD COLUMNS (mycolumn string); But when I execute it a second time, I get an error. Column…
MattD
  • 1,324
  • 4
  • 14
  • 28
7
votes
2 answers

Azure Storm vs Azure Stream Analytics

I'm looking to do real-time metric calculations on event streams. What is a good choice in Azure: Stream Analytics or Storm? I am comfortable with either SQL or Java, so I'm wondering what the other differences are.