Questions tagged [mapr]

MapR is a commercial data platform that offers an HDFS-compatible distributed file system, a database that stores data as BigTable-style tables or JSON documents, and a streaming platform for messaging. MapR exposes the APIs of open source tools such as Hadoop, Kafka, and HBase, backed by a proprietary implementation written in C and optimised for performance.

MapR is a complete enterprise-grade distribution for Apache Hadoop. The MapR Converged Data Platform has been engineered to improve Hadoop’s reliability, performance, and ease of use.

The MapR distribution provides a full Hadoop stack that includes the MapR File System (MapR-FS), the MapR-DB NoSQL database management system, MapR Streams, the MapR Control System (MCS) user interface, and a full family of Hadoop ecosystem projects. You can use MapR with Apache Hadoop, HDFS, and MapReduce APIs.

MapR supports the Hadoop 2.x architecture and YARN (Yet Another Resource Negotiator). Hadoop 2.x and YARN make up a resource management and scheduling framework that distributes resource management and job management duties.


There are three MapR editions.

  • MapR Community Edition (formerly M3)
    • Free community edition.
  • MapR Enterprise Edition (formerly M5)
    • Adds high availability and data protection, including multi-node NFS.
  • MapR Enterprise Database Edition (formerly M7)
    • Adds structured table data natively in the storage layer and provides a flexible NoSQL database.

MapR can be installed on many versions of Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, and SUSE. A full matrix of supported Linux operating systems can be found here.

Installing MapR requires the following:

  • A 64-bit CPU.
  • One of the operating systems listed above (Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, or SUSE).
  • A minimum of 8 GB of RAM.
  • At least one unformatted disk.
  • A resolvable hostname.
  • A common user on each server you wish to install MapR on.
  • Java 1.7.0 or higher.
  • Other
    • NTP, Syslog, PAM
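Some of the requirements above can be checked from a shell before starting an install. The following is a minimal sketch, not part of any MapR tooling: the function names, the x86_64 assumption for "64-bit CPU", and the 5% tolerance on the RAM check (the kernel reports slightly less than the installed memory) are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Preflight sketch for the requirements listed above.
# All function names are hypothetical helpers, not MapR commands.

# Succeeds if the architecture string (as printed by `uname -m`) is x86_64.
# Assumption: the target servers are x86_64 machines.
is_x86_64_arch() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if mem_kb (MemTotal from /proc/meminfo, in kB) is at least
# min_gb GiB, with 5% slack for memory reserved by firmware and kernel.
has_min_ram() {
  local mem_kb=$1 min_gb=${2:-8}
  local need_kb=$(( min_gb * 1024 * 1024 * 95 / 100 ))
  [ "$mem_kb" -ge "$need_kb" ]
}

# Prints the major Java version for strings such as "1.7.0_80" or "11.0.2".
java_major() {
  local v=$1
  case "$v" in
    1.*) v=${v#1.} ;;   # legacy "1.x" numbering: drop the leading "1."
  esac
  printf '%s\n' "${v%%[._]*}"
}

# Runs the checks against the local machine and reports every failure.
preflight() {
  local failed=0 jv
  is_x86_64_arch "$(uname -m)" \
    || { echo "FAIL: CPU architecture is not x86_64"; failed=1; }
  has_min_ram "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 8 \
    || { echo "FAIL: less than 8 GB of RAM"; failed=1; }
  getent hosts "$(hostname -f)" >/dev/null \
    || { echo "FAIL: hostname does not resolve"; failed=1; }
  jv=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  { [ -n "$jv" ] && [ "$(java_major "$jv")" -ge 7 ]; } \
    || { echo "FAIL: Java 1.7.0 or higher not found"; failed=1; }
  return "$failed"
}
```

After sourcing the file, run `preflight` on each node. It does not check for an unformatted disk, a common user, or the NTP, Syslog, and PAM services, which still need to be verified by hand.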



Try MapR

Download the MapR Sandbox for VMware or VirtualBox for free.

OR

Install MapR on your own. Check whether the installer supports your OS.

You must meet the prerequisites for the installation to succeed.

Get the mapr-setup script from the MapR repository.

wget http://package.mapr.com/releases/installer/mapr-setup.sh

Run the mapr-setup script to start the installation.

bash ./mapr-setup.sh -y

Open the web UI at the following URL:

https://<Installer node hostname/IPaddress>:9443

Follow the prompts and you will be on your way to installing MapR.

A manual installation is also available. Full instructions can be viewed here.
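The installer URL above follows the pattern `https://<host>:9443`. As a small sketch, the helpers below build that URL for a given node and probe whether the web UI answers before you open a browser. The function names are hypothetical, and the use of `curl -k` assumes the installer serves a self-signed certificate, which the text above does not state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the installer web UI; not MapR commands.

# Prints the installer URL for a hostname or IP address (port 9443, as above).
installer_url() {
  printf 'https://%s:9443\n' "$1"
}

# Succeeds if something answers at the installer URL within 5 seconds.
# -k skips certificate verification (assumption: self-signed certificate).
installer_reachable() {
  curl -k -s -o /dev/null --connect-timeout 5 "$(installer_url "$1")"
}
```

For example, `installer_reachable node1.example.com && echo "installer is up"` (with `node1.example.com` standing in for your installer node) reports whether the UI is ready.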

Extensive documentation can be found on MapR's documentation site. http://maprdocs.mapr.com/home/



The Stack Overflow tag [mapr] can be used for questions about issues you have with the MapR platform.

381 questions
32
votes
5 answers

Find port number where HDFS is listening

I want to access hdfs with fully qualified names such as : hadoop fs -ls hdfs://machine-name:8020/user I could also simply access hdfs with hadoop fs -ls /user However, I am writing test cases that should work on different distributions(HDP,…
ernesto
  • 1,899
  • 4
  • 26
  • 39
11
votes
1 answer

Spark and Hive table schema out of sync after external overwrite

I am having issues with the schema for Hive tables being out of sync between Spark and Hive on a MapR cluster with Spark 2.1.0 and Hive 2.1.1. I need to try to resolve this problem specifically for managed tables, but the issue can be reproduced…
hulin003
  • 2,554
  • 2
  • 13
  • 9
9
votes
2 answers

HBase: Create table with same schema as existing table

I tried searching on the forum, where I can create a new empty hbase table from existing hbase table schema, but not able to find. To be more precise, suppose I have a table with multiple column families and many column qualifier within those…
Gyanendra Dwivedi
  • 5,511
  • 2
  • 27
  • 53
7
votes
1 answer

Using partitions (with partitionBy) when writing a delta lake has no effect

When I initially write a delta lake, using partitions (with partitionBy) or not, does not make any difference. Using a repartition on the same column before writing, only changes the number of parquet-files. Making the column to partition explicitly…
6
votes
1 answer

Why is querying Parquet files is slower than text files in Hive?

I decided to use Parquet as storage format for hive tables and before I actually implement it in my cluster, I decided to run some tests. Surprisingly, Parquet was slower in my tests as against the general notion that it is faster than plain text…
Rahul
  • 2,354
  • 3
  • 21
  • 30
6
votes
6 answers

Connecting to remote Mapr Hive via JDBC

This question is similar, but not the same, as Hive JDBC getConnection does not return . Yet this is about a remote connection. Also the metastore is present in the directory in which the hiveserver2 was started. We have a running mapr cluster on a…
user152468
  • 3,202
  • 6
  • 27
  • 57
6
votes
4 answers

What are disadvantages of the Hadoop distribution MapR compared to Cloudera and Hortonworks?

Cloudera and Hortonworks use HDFS, one of the basic concepts of Apache Hadoop. MapR uses its own concept / implementation. Instead of HDFS, you use the native file system directly. You can find a lot of advantages using this approach on the website…
Kai Wähner
  • 5,248
  • 4
  • 35
  • 33
5
votes
1 answer

Does HBase impose a maximum size per row?

High-Level Question: Does HBase impose a maximum size per row which is common to all distributions (and thus not an artifact of implementation), either in terms of bytes-stored or in terms of number of cells? If so: What is the limit? What is the…
sumitsu
  • 1,481
  • 1
  • 16
  • 33
5
votes
0 answers

mfs service stopped running and cldb not coming up in MapR Cluster

We have a 3 nodes MapR cluster. All 3 nodes have zookeeper running and the 1st node has CLDB, webserver and ResourceManager apart from zookeeper. The cluster was up and running , however the 1st node went down yesterday post which the CLDB service…
Ashwini
  • 51
  • 3
5
votes
2 answers

Difference between MapR-DB and Hbase

I am a bit new to MapR but I am familiar with HBase. I was going through a video where I found that MapR-DB is a NoSQL DB in MapR and that it is similar to HBase. In addition to this, HBase can also be run on MapR. I am confused between MapR-DB and…
Shashi
  • 2,686
  • 7
  • 35
  • 67
5
votes
2 answers

Standard practices for logging in MapReduce jobs

I'm trying to find the best approach for logging in MapReduce jobs. I'm using slf4j with log4j appender as in my other Java applications, but since MapReduce job runs in a distributed manner across the cluster I don't know where should I set the log…
Frank
  • 45
  • 1
  • 1
  • 9
4
votes
0 answers

Unable to to read columns starting with an '_' underscore in spark

I am using Spark 2.1.0 I have a View which is owned by some other group AND we CANNOT change it as we don't own it. create or replace view testUnderscore AS SELECT lookup_id, source_table, `_c3` AS lookup_type, `_c5` AS transaction_type,…
AJm
  • 993
  • 2
  • 20
  • 39
4
votes
1 answer

spark Yarn mode how to get applicationId from spark-submit

When I submit spark job using spark-submit with master yarn and deploy-mode cluster, it doesn't print/return any applicationId and once job is completed I have to manually check MapReduce jobHistory or spark HistoryServer to get the job details. My…
Rahul Sharma
  • 5,614
  • 10
  • 57
  • 91
4
votes
2 answers

Hadoop for Large Image Processing

I have a 50TB set of ~1GB tiff images that I need to run the same algorithm on. Currently, I have the rectification process written in C++ and it works well, however it will take forever to run on all these images consecutively. I understand that an…
HelloWor1d
  • 63
  • 1
  • 2
  • 10
4
votes
2 answers

What is meaning of "Hadoop distribution"

I am new to hadoop. I recently read about basics of Apache Hadoop, Pig, Hive, HBase. Then I came across term "Hadoop distribution" and examples were Cloudera, MAPR, HortonWorks. So what is relation of Apache Hadoop (& its echo-system ) with "Hadoop…
Kaushik Lele
  • 6,439
  • 13
  • 50
  • 76