The Archives Unleashed Toolkit requires Java 8.
For macOS: You can find information on Java here. We recommend OpenJDK. The easiest way to install it is with Homebrew:
brew cask install adoptopenjdk/openjdk/adoptopenjdk8
If you run into difficulties with Homebrew, installation instructions can be found here.
On Debian-based systems, you can install Java using apt:
apt install openjdk-8-jdk
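After installing, it can help to confirm that the Java found on your PATH is actually version 8, since another version may already be present. The sketch below is one way to check; `is_java8` is a hypothetical helper, not part of the toolkit, and it assumes the `1.8.x` version string that Java 8 builds normally report.

```shell
# Hypothetical helper (not part of the toolkit): succeeds if the given
# `java -version` output reports a 1.8.x (Java 8) runtime.
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (java -version prints to stderr, hence 2>&1):
#   is_java8 "$(java -version 2>&1)" && echo "Java 8 detected"
```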
Before spark-shell can launch, JAVA_HOME must be set. If you receive an
error that JAVA_HOME is not set, you need to point it to where Java is
installed. On Linux, this might be
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
or on macOS it might be
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home
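The exact paths above vary between machines, so a quick check that JAVA_HOME really points at a Java installation can save a failed launch. The following is a minimal sketch; `check_java_home` is a hypothetical helper name, and the only assumption is that a usable installation has an executable at `bin/java` under JAVA_HOME.

```shell
# Hypothetical helper (not part of the toolkit): report whether JAVA_HOME
# is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No java executable under $JAVA_HOME/bin"
    return 1
  fi
  echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Usage:
#   check_java_home
```

On macOS, the system utility `/usr/libexec/java_home -v 1.8` prints the path of an installed Java 8, if there is one, which may be easier than typing the path by hand.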
If you would like to use the Archives Unleashed Toolkit with PySpark and
Jupyter Notebooks, you'll need to have a modern version of Python installed.
We recommend using the Anaconda Distribution. This should install Jupyter
Notebook, as well as the PySpark bindings. If it doesn't, you can install
either with conda install or pip install.
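To see whether anything is actually missing before installing, you can ask Python directly. This is a rough sketch: `has_python_module` is a hypothetical helper, and it assumes that `python3` is the interpreter you plan to use and that the relevant modules are named `notebook` and `pyspark`.

```shell
# Hypothetical helper (not part of the toolkit): succeeds if python3 can
# locate the named top-level module.
has_python_module() {
  python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Report the status of the two modules this guide relies on.
for mod in notebook pyspark; do
  if has_python_module "$mod"; then
    echo "$mod: installed"
  else
    echo "$mod: missing"
  fi
done
```

If either is reported missing, something like `conda install notebook pyspark` or `pip install notebook pyspark` should add it, assuming those package names are available in your configured channels or index.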
Apache Spark
Download and unzip Apache Spark to a location of your choice.
curl -L "" > spark-2.4.5-bin-hadoop2.7.tgz
tar -xvf spark-2.4.5-bin-hadoop2.7.tgz
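Once the archive is extracted, you may want to confirm where the spark-shell launcher ended up. The sketch below assumes the archive was unpacked into the current directory under its default name; `find_spark_shell` is a hypothetical helper, not something shipped with Spark or the toolkit.

```shell
# Hypothetical helper: print the path to spark-shell inside an extracted
# Spark directory, or fail with a message on stderr.
find_spark_shell() {
  spark_dir="${1:-./spark-2.4.5-bin-hadoop2.7}"
  if [ -x "$spark_dir/bin/spark-shell" ]; then
    echo "$spark_dir/bin/spark-shell"
  else
    echo "spark-shell not found under $spark_dir" >&2
    return 1
  fi
}

# Usage (requires JAVA_HOME to be set):
#   "$(find_spark_shell)" --version
```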