Install and configure R

R version: 3.2.5
RStudio version: 1.0.136
Linux version: Oracle Linux 7.3

R is a statistical package. Within the big data technology stack, its role is largely predictive analysis. Open source R is, at the moment, a single-threaded desktop application, which is a major limitation for production big data environments; its main use is for data scientists doing data analysis and visualization. Oracle has a proprietary version of R called ORE (Oracle R Enterprise), which supports a client-server infrastructure and multi-threading and is therefore considerably more powerful. Here I will show how to install open source R and RStudio, and then briefly how to install Oracle R.

Step 1: Download and install open source R

The R source code can be downloaded from here. If you are using Oracle Linux 7 as I do, it is better to download an older version; I picked 3.2.5. Version 3.3.x has an issue with compression libraries: lzma has been deprecated in the Linux world and replaced by xz, but R 3.3.x still uses the lzma libraries. If you are using an older Linux release such as Red Hat 6 or Oracle Linux 6, you can probably go with R 3.3.x.

In my case, I will install R on nnode1. Once the tarball is downloaded, run the following to unpack it.

sudo mv R-3.2.5.tar.gz /opt/app/
cd /opt/app
sudo tar xvfz R-3.2.5.tar.gz
sudo chown -R hadoop:dev R-3.2.5
sudo rm R-3.2.5.tar.gz
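
Before building, it can help to confirm that the build tools are present. The following is a minimal sketch, assuming that a C compiler (gcc), a Fortran compiler (gfortran) and make are what the build needs; the helper name missing_tools is my own, and the tool list may be incomplete for your system.

```shell
#!/usr/bin/env bash
# Report which of the given commands are not on PATH.
# Prints one missing command per line; returns 1 if any is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed tool list: a C compiler, a Fortran compiler, and make.
if missing=$(missing_tools gcc gfortran make); then
    echo "Build tools found."
else
    # Left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
fi
```

If anything is reported missing, install it with yum before running configure.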

To install R, you need to configure the source tree and then build the binaries. The official instructions are not very precise, so here is a version that works on Oracle Linux 7.3.

cd /opt/app/R-3.2.5
./configure --with-readline=no --with-x=no --with-lapack --with-ICU=no --enable-R-shlib
make
sudo make install

With the default prefix of /usr/local, that will install R under /usr/local/lib64/R, with the R launcher in /usr/local/bin. Run R to verify.
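
The verification can also be scripted. This is a small sketch rather than an official procedure: report_version is a hypothetical helper that prints the first line of a command's --version output, and the Rscript call asks R itself where its home directory is.

```shell
#!/usr/bin/env bash
# Print the first line of "<cmd> --version" if cmd is on PATH; return 1 otherwise.
report_version() {
    local cmd=$1
    if command -v "$cmd" >/dev/null 2>&1; then
        # R prints its version banner on the first line of --version output.
        "$cmd" --version 2>&1 | head -n 1
    else
        echo "$cmd not found on PATH" >&2
        return 1
    fi
}

report_version R || echo "R is not installed or not on PATH"

# Rscript is installed alongside R and can evaluate a one-line expression.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'cat(R.home(), "\n")'
fi
```

If R is found, the first line shows the version that was built and the second shows the R home directory that the install step actually used.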

Step 2: Download and install RStudio

RStudio makes data analysis much more convenient. The desktop version is free and can be downloaded here. You can get the version for Fedora; it will run on Oracle Linux, since Oracle Linux is derived from Red Hat Enterprise Linux. I think it is better to get the tarball instead of the installer version, since I install all applications under /opt/app, not /usr/lib.

sudo mv rstudio-1.0.136-x86_64-fedora.tar.gz /opt/app
cd /opt/app
sudo tar xvfz rstudio-1.0.136-x86_64-fedora.tar.gz
sudo mv rstudio-1.0.136-x86_64-fedora rstudio-1.0.136
sudo chown -R hadoop:dev rstudio-1.0.136

Then add the following to .bashrc or .bash_profile.

export RSTUDIO_HOME=/opt/app/rstudio-1.0.136
export PATH=$PATH:${RSTUDIO_HOME}/bin
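
If you script this step, it is worth making it safe to run more than once. Here is a minimal sketch, assuming the two export lines above are exactly what you want in the profile; append_once and add_rstudio_to_profile are names I made up for illustration.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
append_once() {
    local line=$1 file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the RStudio settings to the given profile file (for example ~/.bashrc).
add_rstudio_to_profile() {
    local profile=$1
    # Single quotes keep $PATH and ${RSTUDIO_HOME} unexpanded in the file.
    append_once 'export RSTUDIO_HOME=/opt/app/rstudio-1.0.136' "$profile"
    append_once 'export PATH=$PATH:${RSTUDIO_HOME}/bin' "$profile"
}
```

Call it as add_rstudio_to_profile ~/.bashrc and then open a new shell, or source the file, to pick up the change. Running it a second time leaves the file unchanged.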

RStudio doesn’t include R, so if the prior step was not successful, you will not be able to run rstudio. Once the above configuration is done, you can run rstudio, and it will pop up an application like this.

The box at lower left lets you enter commands and the box on top left lets you run scripts. The right side will show results or help information.

Step 3: Install Oracle R

This is for future practice. If you would like to explore how Oracle R can interact with other Oracle products like OBIEE or Oracle Advanced Analytics, you will need to install the Oracle distribution of R.

If you don’t have the Oracle public YUM repository file on your server, you will need to download it. You can skip to the next step if the file is already present.

cd /etc/yum.repos.d
sudo wget http://public-yum.oracle.com/public-yum-ol7.repo

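Next, you will need to edit the file and set enabled=1 in the sections for the latest and add-ons repositories. Here xxx stands for the release prefix, for example ol7 on Oracle Linux 7.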

[xxx_latest]
enabled=1
[xxx_addons]
enabled=1
# For Oracle Linux 7 only:
[ol7_optional_latest]
enabled=1
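
The same edit can be made without opening an editor. The sketch below is one possible approach, not the only one: enable_repos is a hypothetical helper that prints a copy of a .repo file with enabled=1 in the sections you name. It only rewrites an existing enabled line and does not add one to a section that lacks it.

```shell
#!/usr/bin/env bash
# Print a copy of a yum .repo file with enabled=1 in the named sections.
# Usage: enable_repos FILE SECTION [SECTION...]
enable_repos() {
    local file=$1
    shift
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, names, " ")
            for (i = 1; i <= n; i++) want[names[i]] = 1
        }
        # A section header such as [ol7_addons] starts a new repository.
        /^\[.*\]/ {
            name = $0
            sub(/^\[/, "", name)
            sub(/\].*$/, "", name)
            active = (name in want)
        }
        # Inside a wanted section, rewrite the enabled line.
        active && /^[ \t]*enabled[ \t]*=/ { print "enabled=1"; next }
        { print }
    ' "$file"
}
```

For example, assuming the sections on your server are named ol7_latest, ol7_addons and ol7_optional_latest, run enable_repos /etc/yum.repos.d/public-yum-ol7.repo ol7_latest ol7_addons ol7_optional_latest > /tmp/public-yum-ol7.repo, review the result, and then copy it over the original with sudo.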

Save the file, and run yum to install the R.x86_64 package. The Oracle distribution might be a slightly older version than the most recent open source R release.

sudo yum install R