Questions tagged [mapr]

MapR is a commercial data platform that offers an HDFS-compatible distributed file system, a database that stores data as BigTable-style tables or JSON documents, and a streaming platform for messaging. MapR exposes the APIs of open source tools such as Hadoop, Kafka, and HBase, and backs them with a proprietary implementation written in C and optimised for performance.

MapR is a complete enterprise-grade distribution for Apache Hadoop. The MapR Converged Data Platform has been engineered to improve Hadoop’s reliability, performance, and ease of use.

The MapR distribution provides a full Hadoop stack that includes the MapR File System (MapR-FS), the MapR-DB NoSQL database management system, MapR Streams, the MapR Control System (MCS) user interface, and a full family of Hadoop ecosystem projects. You can use MapR with Apache Hadoop, HDFS, and MapReduce APIs.

MapR supports the Hadoop 2.x architecture and YARN (Yet Another Resource Negotiator). Hadoop 2.x and YARN make up a resource management and scheduling framework that distributes resource management and job management duties.

There are three MapR editions.

MapR Community Edition (formerly M3)
- Free community edition.
MapR Enterprise Edition (formerly M5)
- Adds high availability and data protection, including multi-node NFS.
MapR Enterprise Database Edition (formerly M7)
- Adds structured table data natively in the storage layer and provides a flexible NoSQL database.

MapR can be installed on many versions of Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, and SUSE. A full matrix of supported Linux operating systems can be found here.

Installing MapR requires the following:

A 64-bit CPU.
One of the operating systems mentioned above (Red Hat Enterprise Linux, CentOS, Ubuntu, Oracle Linux, or SUSE).
A minimum of 8GB of RAM.
At least one unformatted disk.
A resolvable hostname.
A common user on each server you wish to install MapR on.
Java 1.7.0 or higher.
Other
- NTP, Syslog, PAM
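
A few of these requirements can be checked from a shell before starting. The following is a minimal sketch, not a MapR tool: the function names (`is_64bit`, `has_min_ram`, `hostname_resolves`, `java_version_ok`, `report`, `preflight`) are hypothetical, and it assumes a Linux host where `/proc/meminfo`, `getent`, and `awk` are available. It does not check disks, the common user, NTP, Syslog, or PAM.

```shell
#!/usr/bin/env bash
# Pre-flight sketch for some of the requirements listed above.
# All function names are illustrative; none of them ship with MapR.

# Succeeds if the machine string (default: `uname -m`) names a 64-bit CPU.
# This only checks "64-bit", not whether MapR supports the architecture.
is_64bit() {
  local arch="${1:-$(uname -m)}"
  case "$arch" in
    x86_64|amd64|aarch64|ppc64le) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds if the RAM size in kB (default: MemTotal from /proc/meminfo)
# is at least 8 GB. MemTotal excludes memory reserved by the kernel, so a
# machine with exactly 8 GB installed may report slightly less.
has_min_ram() {
  local kb="${1:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
  [ -n "$kb" ] || return 1
  [ "$kb" -ge $((8 * 1024 * 1024)) ]
}

# Succeeds if the host name (default: this machine's) resolves.
hostname_resolves() {
  local host="${1:-$(hostname)}"
  getent hosts "$host" > /dev/null
}

# Succeeds if a Java version string such as "1.7.0_80" or "11.0.2"
# is 1.7.0 or higher. Java 9 and later dropped the leading "1.".
java_version_ok() {
  local version="$1" major minor
  [ -n "$version" ] || return 1
  major="${version%%.*}"
  if [ "$major" = "1" ]; then
    minor="${version#1.}"
    minor="${minor%%.*}"
    [ "$minor" -ge 7 ]
  else
    [ "$major" -ge 9 ]
  fi
}

# Runs a check and prints "ok" or "FAIL" next to its label.
report() {
  local label="$1"; shift
  if "$@"; then echo "ok    $label"; else echo "FAIL  $label"; fi
}

# Runs all checks against the local machine.
preflight() {
  local java_version
  java_version="$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')"
  report "64-bit CPU" is_64bit
  report "8 GB RAM" has_min_ram
  report "resolvable hostname" hostname_resolves
  report "Java 1.7.0 or higher" java_version_ok "$java_version"
}
```

Sourcing the file and calling `preflight` on each node prints one line per check. Each check also accepts an explicit value (for example `java_version_ok 1.6.0_45`), so it can be tried without touching the machine.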

Try MapR

Download the MapR Sandbox for VMware or Virtualbox for free.

Install MapR on your own. Check whether the installer is supported for your OS.

You will have to meet the prerequisites for a successful installation.

Get the mapr-setup script from the MapR repository.

wget http://package.mapr.com/releases/installer/mapr-setup.sh

Run the mapr-setup script to start the installation.

bash ./mapr-setup.sh -y

Open the web UI with the following URL:

https://<Installer node hostname/IPaddress>:9443

Follow the prompts and you will be on your way to installing MapR.
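
If the web UI does not load, it can help to confirm from a shell that the installer is listening on port 9443. This is a minimal sketch: `installer_url` and `installer_reachable` are hypothetical helper names, and the use of `curl -k` assumes the installer presents a certificate that your machine does not yet trust.

```shell
#!/usr/bin/env bash
# Helpers for locating the installer web UI described above.
# Function names are illustrative; they are not part of mapr-setup.sh.

# Prints the installer URL for a host (default: this machine's FQDN).
installer_url() {
  local host="${1:-$(hostname -f)}"
  printf 'https://%s:9443\n' "$host"
}

# Succeeds if something answers HTTPS requests at the installer URL.
# -k skips certificate verification (assumption: untrusted certificate),
# -s silences progress output, --max-time bounds the wait to 5 seconds.
installer_reachable() {
  curl -k -s -o /dev/null --max-time 5 "$(installer_url "$1")"
}
```

For example, `installer_reachable node1.example.com && echo "installer is up"` reports whether the UI on that (hypothetical) host is answering.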

A manual installation is also available. Full instructions can be viewed here.

Extensive documentation can be found on MapR's documentation site: http://maprdocs.mapr.com/home/

The Stack Overflow tag [mapr] can be used for questions about issues you have with the MapR platform.

381 questions

5 answers

Find port number where HDFS is listening

I want to access hdfs with fully qualified names such as : hadoop fs -ls hdfs://machine-name:8020/user I could also simply access hdfs with hadoop fs -ls /user However, I am writing test cases that should work on different distributions(HDP,…

asked Oct 06 '14 at 13:05

ernesto

1 answer

Spark and Hive table schema out of sync after external overwrite

I am having issues with the schema for Hive tables being out of sync between Spark and Hive on a MapR cluster with Spark 2.1.0 and Hive 2.1.1. I need to try to resolve this problem specifically for managed tables, but the issue can be reproduced…

apache-spark hive pyspark mapr

asked Mar 09 '18 at 20:10

hulin003

2 answers

HBase: Create table with same schema as existing table

I tried searching on the forum, where I can create a new empty hbase table from existing hbase table schema, but not able to find. To be more precise, suppose I have a table with multiple column families and many column qualifier within those…

hadoop hbase mapr hbase-shell

asked Feb 24 '16 at 08:32

Gyanendra Dwivedi

1 answer

Using partitions (with partitionBy) when writing a delta lake has no effect

When I initially write a delta lake, using partitions (with partitionBy) or not, does not make any difference. Using a repartition on the same column before writing, only changes the number of parquet-files. Making the column to partition explicitly…

apache-spark apache-spark-sql partitioning mapr delta-lake

asked Jan 15 '20 at 08:13

Florian Corzilius

1 answer

Why is querying Parquet files slower than text files in Hive?

I decided to use Parquet as storage format for hive tables and before I actually implement it in my cluster, I decided to run some tests. Surprisingly, Parquet was slower in my tests as against the general notion that it is faster than plain text…

hadoop hive parquet mapr snappy

asked Sep 02 '15 at 10:25

Rahul

6 answers

Connecting to remote Mapr Hive via JDBC

This question is similar, but not the same, as Hive JDBC getConnection does not return . Yet this is about a remote connection. Also the metastore is present in the directory in which the hiveserver2 was started. We have a running mapr cluster on a…

java jdbc hadoop hive mapr

asked Dec 05 '13 at 09:19

user152468

4 answers

What are disadvantages of the Hadoop distribution MapR compared to Cloudera and Hortonworks?

Cloudera and Hortonworks use HDFS, one of the basic concepts of Apache Hadoop. MapR uses its own concept / implementation. Instead of HDFS, you use the native file system directly. You can find a lot of advantages using this approach on the website…

hadoop hdfs cloudera mapr

asked Feb 26 '13 at 01:21

Kai Wähner

1 answer

Does HBase impose a maximum size per row?

High-Level Question: Does HBase impose a maximum size per row which is common to all distributions (and thus not an artifact of implementation), either in terms of bytes-stored or in terms of number of cells? If so: What is the limit? What is the…

hbase mapr

asked Jun 15 '16 at 19:26

sumitsu

0 answers

mfs service stopped running and cldb not coming up in MapR Cluster

We have a 3 nodes MapR cluster. All 3 nodes have zookeeper running and the 1st node has CLDB, webserver and ResourceManager apart from zookeeper. The cluster was up and running , however the 1st node went down yesterday post which the CLDB service…

mapr

asked Feb 24 '16 at 03:15

Ashwini

2 answers

Difference between MapR-DB and Hbase

I am a bit new to MapR but I am familiar with HBase. I was going through a video where I found that MapR-DB is a NoSQL DB in MapR and is similar to HBase. In addition to this, HBase can also be run on MapR. I am confused between MapR-DB and…

hadoop hbase mapr

asked May 15 '15 at 07:40

Shashi

2 answers

Standard practices for logging in MapReduce jobs

I'm trying to find the best approach for logging in MapReduce jobs. I'm using slf4j with log4j appender as in my other Java applications, but since MapReduce job runs in a distributed manner across the cluster I don't know where should I set the log…

java hadoop mapreduce hadoop2 mapr

asked Jan 23 '15 at 21:59

Frank

0 answers

Unable to read columns starting with an '_' underscore in spark

I am using Spark 2.1.0 I have a View which is owned by some other group AND we CANNOT change it as we don't own it. create or replace view testUnderscore AS SELECT lookup_id, source_table, `_c3` AS lookup_type, `_c5` AS transaction_type,…

apache-spark apache-spark-sql cloudera mapr

asked Dec 01 '17 at 23:25

AJm

1 answer

spark Yarn mode how to get applicationId from spark-submit

When I submit spark job using spark-submit with master yarn and deploy-mode cluster, it doesn't print/return any applicationId and once job is completed I have to manually check MapReduce jobHistory or spark HistoryServer to get the job details. My…

hadoop apache-spark mapr spark-submit

asked May 26 '17 at 20:10

Rahul Sharma

2 answers

Hadoop for Large Image Processing

I have a 50TB set of ~1GB tiff images that I need to run the same algorithm on. Currently, I have the rectification process written in C++ and it works well, however it will take forever to run on all these images consecutively. I understand that an…

hadoop apache-spark mapr bigdata

asked Jun 23 '16 at 14:00

HelloWor1d

2 answers

What is meaning of "Hadoop distribution"

I am new to Hadoop. I recently read about the basics of Apache Hadoop, Pig, Hive, HBase. Then I came across the term "Hadoop distribution" and the examples were Cloudera, MapR, Hortonworks. So what is the relation of Apache Hadoop (& its ecosystem) with "Hadoop…

hadoop cloudera software-distribution mapr biginsights

asked Feb 20 '16 at 10:06

Kaushik Lele
