This article explains how to create R command scripts for HTCondor. It also shows how to configure an R package repository and install packages into your own local directory.
What is R?
- R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.
1. Installing R packages without root access
First, designate a directory where the downloaded packages will be stored. On my machine this is /home/kross48/packages/. After creating the directory, start R from the shell, install the package into that directory, and load it from the same location:
$ R
> install.packages("ggplot2", lib="/home/kross48/packages/")
> library(ggplot2, lib.loc="/home/kross48/packages/")
2. Installing R packages locally from a tar file
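If you already have a package source archive (.tar.gz), install it from the shell with R CMD INSTALL and name the target library:
$ R CMD INSTALL --library=/home/kross48/packages arules_1.1-9.tar.gz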
Typing the package directory in every call quickly becomes tedious. To avoid this, create a file named .Renviron in your home directory and add a line that points R_LIBS at your package directory, for example R_LIBS=/home/kross48/packages/. From then on, every time R starts it adds that directory to the list of places it searches for packages and uses it as the default install location. The calls above then simplify to:
> install.packages("ggplot2")
> library(ggplot2)
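You can also make the .Renviron change from the shell. The sketch below assumes the package directory is $HOME/packages, which is a hypothetical path, so substitute your own. The helper add_r_libs is also hypothetical and is not part of R. It creates the directory and appends an R_LIBS line only if .Renviron does not already set one, so running it twice does not duplicate the setting.

```shell
#!/bin/bash
# Minimal sketch: register a personal R package library in ~/.Renviron.
# add_r_libs is a hypothetical helper written for this example, not part of R.

# add_r_libs <renviron-file> <package-dir>
# Creates <package-dir>, then appends "R_LIBS=<package-dir>" to
# <renviron-file> unless that file already has an R_LIBS= line.
add_r_libs() {
  local renviron="$1" pkg_dir="$2"
  mkdir -p "$pkg_dir"
  touch "$renviron"
  if grep -q '^R_LIBS=' "$renviron"; then
    echo "$renviron already sets R_LIBS; leaving it unchanged"
    return 0
  fi
  # Make sure the new line is not glued onto an unterminated last line
  if [ -s "$renviron" ] && [ -n "$(tail -c 1 "$renviron")" ]; then
    echo >> "$renviron"
  fi
  echo "R_LIBS=$pkg_dir" >> "$renviron"
}

# Assumed locations; edit to match your account.
add_r_libs "$HOME/.Renviron" "$HOME/packages"
```

If an R_LIBS line already exists and points somewhere else, the helper leaves it alone. In that case, edit the line by hand.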
3. Setting the repository: creating an .Rprofile
Every time you install an R package, R asks which repository (CRAN mirror) to use. To set the repository once and skip this prompt, create a file named .Rprofile in your home directory and add the following code to it:
cat(".Rprofile: Setting Cloud repository\n")
r = getOption("repos") # hard code the cloud repo for CRAN
r["CRAN"] = "https://cloud.r-project.org/"
options(repos = r)
rm(r)
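Alternatively, wrap the same code in local() so that the temporary variable r is not left in your workspace:
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org/"
  options(repos = r)
})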
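The .Rprofile can also be created from the shell. The sketch below writes the local() form of the repository setting. The helper write_rprofile is hypothetical. It refuses to overwrite an existing .Rprofile, so any settings you already have are preserved.

```shell
#!/bin/bash
# Minimal sketch: create an .Rprofile that pins the CRAN mirror.
# write_rprofile is a hypothetical helper written for this example.

# write_rprofile <target-file>
# Writes the repository settings to <target-file>, but only if the file does
# not exist yet, so an existing .Rprofile is never overwritten.
write_rprofile() {
  local target="$1"
  if [ -e "$target" ]; then
    echo "$target already exists; add the repository code to it by hand"
    return 0
  fi
  # The quoted 'EOF' keeps the shell from interpreting anything in the R code
  cat > "$target" <<'EOF'
cat(".Rprofile: Setting Cloud repository\n")
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org/"
  options(repos = r)
})
EOF
}

write_rprofile "$HOME/.Rprofile"
```

The quoted heredoc delimiter matters here. Without it, the shell would interpret the R code instead of writing it verbatim.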
4. Setting up HTCondor Jobs
Sample R script:
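# Load packages from the personal library created in step 1
library("mvtnorm", lib.loc = "/home/kross48/packages/")
library("rngWELL", lib.loc = "/home/kross48/packages/")
library("randtoolbox", lib.loc = "/home/kross48/packages/")
# Redirect cat() output to test2.txt, then restore normal output
sink('test2.txt')
cat('This is my first R program\n')
sink()
# Printed to standard output, which HTCondor writes to the job's .out file
print("success")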
Sample command file:
universe = vanilla
getenv = true
executable = /usr/bin/Rscript
arguments = test2.R
log = $(Cluster).log
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).error
queue
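The command file above can be generated and submitted from a single shell script. Treat the sketch below as an assumption-laden example rather than a required setup. It assumes the following:

- Rscript lives at /usr/bin/Rscript on every execute node.
- The execute nodes can read /home/kross48/packages, because the R script loads its packages from there.

The file name test2.sub and the file-transfer lines are additions for pools where the submit directory is not shared. transfer_executable = false stops HTCondor from copying the Rscript binary itself.

```shell
#!/bin/bash
# Minimal sketch: write a submit description for the Rscript job and queue it.
# test2.sub and the transfer settings are assumptions added for this example.
# The quoted 'EOF' keeps $(Cluster) and $(Process) literal for HTCondor.
cat > test2.sub <<'EOF'
universe                = vanilla
getenv                  = true
executable              = /usr/bin/Rscript
# Rscript is assumed to be installed on the execute node, so do not copy it
transfer_executable     = false
arguments               = test2.R
# Transfer files only when the execute node does not share this filesystem
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
transfer_input_files    = test2.R
log                     = $(Cluster).log
output                  = $(Cluster).$(Process).out
error                   = $(Cluster).$(Process).error
queue
EOF

# Submit only where the HTCondor tools are installed, then list your jobs
if command -v condor_submit >/dev/null 2>&1; then
  condor_submit test2.sub
  condor_q
fi
```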
Or, to run a shell script inside the job instead:
universe = vanilla
getenv = true
executable = test.sh
log = $(Cluster).log
output = $(Cluster).$(Process).out
error = $(Cluster).$(Process).error
queue
Sample Bash script (test.sh):
#!/bin/bash
# make the personal package library visible to R on the execute node
export R_LIBS=/home/kross48/packages
# run your script
/usr/bin/Rscript test.R
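HTCondor starts test.sh directly, so the script needs its #! line. It is also safest to give it the execute bit before submitting. The sketch below creates the wrapper, marks it executable, and syntax-checks it without running R. The set -e and exec lines are additions, not part of the original script. They make the job's exit status reflect whether Rscript succeeded. The package path is the one used throughout this article, so change it for your account.

```shell
#!/bin/bash
# Minimal sketch: create the job wrapper and make it executable.
cat > test.sh <<'EOF'
#!/bin/bash
# Stop at the first failing command
set -e
# Make the personal package library visible to R on the execute node
export R_LIBS=/home/kross48/packages
# Replace the shell with Rscript so its exit code becomes the job's exit code
exec /usr/bin/Rscript test.R
EOF
chmod +x test.sh

# Syntax-check the wrapper without running it
bash -n test.sh && echo "test.sh is ready to submit"
```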