IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. Apache Hadoop is an open source software framework used to reliably manage large volumes of structured and unstructured data.
Questions tagged [biginsights]
103 questions
25 votes, 4 answers
How to get hadoop put to create directories if they don't exist
I have been using Cloudera's Hadoop (0.20.2).
With this version, if I put a file into the file system but the directory structure did not exist, it automatically created the parent directories:
So for example, if I had no directories in HDFS and…

owly
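Newer Hadoop releases appear not to create missing parent directories on put the way 0.20.2 did, so a common workaround is to create the target directory first with hdfs dfs -mkdir -p (the -p flag creates any missing parents and does not fail if they already exist). A minimal sketch, assuming the hdfs CLI is on the PATH; build_put_commands and put_with_parents are hypothetical helper names, not part of any Hadoop API:

```python
import posixpath
import subprocess
from typing import List


def build_put_commands(local_path: str, hdfs_path: str) -> List[List[str]]:
    """Build the CLI calls: create the parent directories, then copy the file.

    Hypothetical helper for illustration; it only builds argument lists.
    """
    parent = posixpath.dirname(hdfs_path.rstrip("/"))
    commands = []
    if parent:
        # -mkdir -p creates every missing directory along the path
        commands.append(["hdfs", "dfs", "-mkdir", "-p", parent])
    # -put copies the local file into the now-existing target directory
    commands.append(["hdfs", "dfs", "-put", local_path, hdfs_path])
    return commands


def put_with_parents(local_path: str, hdfs_path: str) -> None:
    """Run the commands in order; raises CalledProcessError if a step fails."""
    for cmd in build_put_commands(local_path, hdfs_path):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution keeps the path logic testable without a cluster.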
6 votes, 2 answers
How to write data in the DataFrame into a single .parquet file (both data and metadata in a single file) in HDFS?
df.show() --> 2 rows
+------+--------------+----------------+
|…

Shiva Ram
- 61
- 1
- 4
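One commonly suggested approach is to collapse the DataFrame to a single partition with coalesce(1) before writing, so Spark emits one part-*.parquet file. Spark still writes a directory at the target path (holding that part file plus marker files such as _SUCCESS), and each Parquet file already stores its schema and row-group metadata in its own footer. A sketch under those assumptions; write_single_parquet is a hypothetical helper and df is assumed to be a PySpark DataFrame:

```python
def write_single_parquet(df, path: str) -> None:
    """Write df so the output directory holds a single Parquet part file.

    coalesce(1) merges all partitions into one without a full shuffle, so
    every row passes through a single task: only sensible for small data.
    """
    (df.coalesce(1)           # one partition -> one part-*.parquet file
       .write
       .mode("overwrite")     # replace any previous output at path
       .parquet(path))
```

For large DataFrames a single writer task becomes a bottleneck, which is why Spark writes one file per partition by default.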
4 votes, 2 answers
What is the meaning of "Hadoop distribution"?
I am new to Hadoop. I recently read about the basics of Apache Hadoop, Pig, Hive, and HBase.
Then I came across the term "Hadoop distribution", with examples such as Cloudera, MapR, and Hortonworks.
So what is the relation of Apache Hadoop (and its ecosystem) to "Hadoop…

Kaushik Lele
4 votes, 1 answer
IBM BigInsights (IBM Hadoop) vs IBM Watson
What is the difference between IBM Watson and IBM InfoSphere BigInsights (IBM Hadoop)/Streams? What does Watson bring to the table that BigInsights doesn't?

Amir HZ
3 votes, 2 answers
PYSPARK_PYTHON works with --deploy-mode client but not --deploy-mode cluster
I'm trying to run a Python script using a custom Python with --deploy-mode cluster on an Enterprise 4.2 cluster.
[biadmin@bi4c-xxxxx-mastermanager ~]$ hive
hive> CREATE TABLE pokes (foo INT, bar STRING);
OK
Time taken: 2.147 seconds
hive>…

Chris Snow
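In cluster mode the driver runs inside the YARN application master rather than on the submitting host, so a PYSPARK_PYTHON exported in the local shell may not reach it. Spark documents spark.yarn.appMasterEnv.[EnvironmentVariableName] and spark.executorEnv.[EnvironmentVariableName] for passing environment variables to the driver and executors on YARN. A sketch that builds such a spark-submit command, assuming the custom interpreter exists at the same path on every node; the helper name and paths are hypothetical, and this is not confirmed as the fix for the question above:

```python
from typing import List


def cluster_submit_command(script: str, python_path: str) -> List[str]:
    """Build a spark-submit argument list for --deploy-mode cluster on YARN.

    Hypothetical helper: PYSPARK_PYTHON is set for the application master
    (where the driver runs in cluster mode) and for the executors.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.yarn.appMasterEnv.PYSPARK_PYTHON={python_path}",
        "--conf", f"spark.executorEnv.PYSPARK_PYTHON={python_path}",
        script,
    ]
```

shlex.join (Python 3.8+) can turn the returned list into a shell-ready string for inspection.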
3 votes, 1 answer
Installing BigInsights 4.2
I would like to ask about installing BigInsights 4.2 on CentOS 7. As far as I know, the installation is currently only available via Kitematic or Docker Hub, but Kitematic is only available for Windows or Mac. If I want to install via Docker Hub, I have to…

whizzkid
3 votes, 0 answers
Where does ${spark.yarn.app.container.log.dir} resolve to on BigInsights on Cloud?
I'm trying to configure Spark Streaming logging. The Spark docs say to set the following property:
log4j.appender.file_appender=${spark.yarn.app.container.log.dir}/spark.log
Where does spark.yarn.app.container.log.dir point to on a BigInsights…

Chris Snow
3 votes, 1 answer
Error installing the H2O.ai R package in a BigInsights cluster in Bluemix
I have a 5-node BigInsights Hadoop cluster in Bluemix. I get an error when I try to install the H2O.ai R package in the BigInsights cluster.
install.packages("h2o", type="source",…

Pari Margu
3 votes, 2 answers
Hadoop: cannot set reducers > 1
I am using Hadoop for a university assignment and I have the code working; however, I'm running into a small issue.
I am trying to set the number of reducers to 19 (which is 0.95 * capacity, as the docs suggest). However, when I view my job in the task…

Nick
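The Hadoop MapReduce tutorial suggests 0.95 or 1.75 multiplied by (number of nodes × maximum containers per node) as the number of reduces; with 0.95 all reduces can launch immediately as the maps finish. A small sketch of that arithmetic; suggested_reducers is a hypothetical helper, and the 5 nodes × 4 containers split below is an assumption chosen only so that 0.95 × 20 matches the question's 19. Fraction keeps boundary cases exact instead of relying on floating-point rounding:

```python
import math
from fractions import Fraction


def suggested_reducers(nodes: int, containers_per_node: int,
                       factor: Fraction = Fraction(95, 100)) -> int:
    """Reducer count from the 0.95 / 1.75 x capacity rule of thumb."""
    capacity = nodes * containers_per_node
    # Round down so the request stays within factor x capacity,
    # but always ask for at least one reducer
    return max(1, math.floor(factor * capacity))
```

For example, suggested_reducers(5, 4) gives 19 and suggested_reducers(5, 4, Fraction(175, 100)) gives 35. The value would then be handed to the job, for instance through Job.setNumReduceTasks in the Java API or the mapreduce.job.reduces property.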
2 votes, 0 answers
Spark Streaming not working on IBM BigInsights
I was testing a script that extracted tweets in real time using Spark Streaming. These tweets are supposed to be loaded into the IBM BigInsights HDFS environment. The script is written in Python and I used YARN for cluster management.
It runs fine…

pratikbhd
2 votes, 1 answer
java.lang.ClassNotFoundException: Failed to find data source: com.cloudant.spark. in IBM BigInsights cluster
I have created an IBM BigInsights service instance with a Hadoop cluster of 5 nodes (including Apache Spark). I am trying to use SparkR to connect to a Cloudant database, get some data, and do some processing.
I have launched a SparkR shell (terminal) and…

Pari Margu
2 votes, 1 answer
spark script fails : java.net.ConnectException: Connection refused org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens
I am trying to run a simple spark script on BigInsights on Cloud:
lines = sc.textFile(license_filename, 1)
counts = lines.flatMap(lambda x: x.split(' ')) \
.map(lambda x: (x, 1)) \
.reduceByKey(add) \
…

Chris Snow
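The excerpt uses add without showing its import; in PySpark examples it normally comes from Python's operator module. To make the flatMap, map, reduceByKey(add) pipeline concrete, here is a plain-Python sketch of what it computes (no Spark involved; word_count is a hypothetical name). It does not address the KMS connection error itself:

```python
from operator import add
from typing import Dict, Iterable


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Local equivalent of the RDD word-count pipeline in the question."""
    # flatMap: split each line into words
    words = (w for line in lines for w in line.split(' '))
    # map: pair each word with 1
    pairs = ((w, 1) for w in words)
    # reduceByKey(add): sum the 1s for each distinct word
    counts: Dict[str, int] = {}
    for word, one in pairs:
        counts[word] = add(counts.get(word, 0), one)
    return counts
```

In Spark the same reduction runs per partition first and then merges partial sums across partitions, which is why the combining function must be associative, as add is.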
2 votes, 0 answers
Error from python worker: /usr/bin/python No module named pyspark
I am trying to run PySpark on YARN, but I receive the following error when I type any command in the console.
I am able to run the Scala shell in Spark in both local and YARN mode.
PySpark runs fine in local mode, but does not work in YARN mode.
OS : RHEL…

akp
2 votes, 0 answers
Oozie Workflow using Maven
I am trying to create an Oozie application using IBM BigInsights. I believe that to run the application on IBM BigInsights, the minimum folder structure should be:
BiApp
—> application
—> application.xml
—> workflow
—> lib
—> jar…

KKa
2 votes, 1 answer
How to programmatically read a schema from a header file in JAQL?
I am trying to achieve the following in JAQL and am stuck.
I have two files: File data.tsv, which contains tab separated data, and a file header.tsv, which contains exactly one line with tab separated values, corresponding to the "header" of file…

Blaubaer
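The underlying idea, independent of JAQL, is to read the single line of header.tsv as the list of field names and pair each tab-separated row of data.tsv with it. A Python sketch of that idea (not JAQL; read_with_header is a hypothetical helper, and both files are assumed to be already opened as text):

```python
import csv
from typing import Dict, List, TextIO


def read_with_header(header_file: TextIO, data_file: TextIO) -> List[Dict[str, str]]:
    """Pair each tab-separated data row with field names from a one-line header."""
    # The header file holds exactly one tab-separated line of column names
    fields = next(csv.reader(header_file, delimiter="\t"))
    # With explicit fieldnames, DictReader treats every line of data_file as data
    return list(csv.DictReader(data_file, fieldnames=fields, delimiter="\t"))
```

Keeping the schema in a separate file means the data file never needs a header row to be skipped.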