Install Oracle NoSQL Enterprise

The easiest way to learn Oracle NoSQL is to download the Oracle Big Data Lite VirtualBox image. That image contains most of the components in the Oracle Big Data technology stack, including Cloudera Manager. There are quite a few steps, and…

Install MySQL community edition

Linux version: Oracle Linux 6.8; MySQL version: 5.7 community edition. This practice installs MySQL for use in some other Spark practices. I have a separate database server that runs Oracle Linux 6.8, and on that server I have Oracle…

Install and configure ZooKeeper

ZooKeeper version: 3.4.9 (http://zookeeper.apache.org). Last time, when we installed HBase, we configured the ZooKeeper packaged with it. This practice installs a separate ZooKeeper so that it can work with other distributed applications, such as Kafka. ZooKeeper is a cluster management application.…

AKKA for parallel processing

Software packages: AKKA version 2.14.7; SBT version 0.13.13; Scala version 2.11.8. While Spark is designed for distributed data analysis, AKKA is said to be best suited to distributed transaction processing. AKKA, Play, and Scala are showing great momentum in the Scala ecosystem. This…

Calling Spark in R

Spark version: 2.1.0; R version: 3.2.5; RStudio version: 1.0.136. Spark supports R through the SparkR package. There are two ways to invoke SparkR. One is to run the sparkR shell command. sparkR will start a Spark context and initiate a…

Install and configure R

R version: 3.2.5; RStudio version: 1.0.136; Linux version: Oracle Linux 7.3. R is a statistical package. Its role in the big data technology stack is largely predictive analysis. Open-source R is a single-threaded desktop…

Work with Hive in Spark

Spark version: 2.1.0; Hive version: 2.1.1; Hadoop version: 2.7.3. For data exchange or integration with other applications, Spark can read from and write to Hive tables. Here is the configuration needed to make that work. Hive is a database technology using…