The Toolkit requires Java 11.
For macOS: We recommend OpenJDK; more information on Java builds is available from the AdoptOpenJDK project. The easiest way to install it is with Homebrew:

brew cask install adoptopenjdk/openjdk/adoptopenjdk11

If you run into difficulties with Homebrew, manual installation instructions are available from AdoptOpenJDK.
On Debian-based systems, you can install Java using:

apt install openjdk-11-jdk
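Once installed, you can confirm which Java version is active before moving on (the exact vendor string in the output will vary by distribution):

```shell
# Print the active Java version; the Toolkit needs this to report 11.x.
# `java -version` writes to stderr, so redirect it to capture the line.
java -version 2>&1 | head -n 1
```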
Before spark-shell can launch, JAVA_HOME must be set. If you receive an error that JAVA_HOME is not set, you need to point it to where Java is installed. On Linux, this might be export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64; on macOS, it might be the path reported by the /usr/libexec/java_home utility.
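Putting this together, a sketch of setting and verifying JAVA_HOME in your shell profile; the Linux path below is a typical Debian/Ubuntu package location and may differ on your machine:

```shell
# Linux (Debian/Ubuntu OpenJDK package layout assumed):
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

# macOS alternative: /usr/libexec/java_home locates an installed JDK.
# export JAVA_HOME="$(/usr/libexec/java_home -v 11)"

# Confirm the variable points at a working JDK.
"$JAVA_HOME/bin/java" -version
```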
The Toolkit requires Python 3.7.3+
If you would like to use the Archives Unleashed Toolkit with PySpark and
Jupyter Notebooks, you'll need to have a modern version of Python installed.
We recommend using a Python distribution such as Anaconda.
This should install Jupyter Notebook, as well as the PySpark bindings. If
it doesn't, you can install either with conda install or pip install.
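If you need to install the pieces manually, a minimal sketch using pip (package names as published on PyPI):

```shell
# Install Jupyter Notebook and the PySpark bindings
# into the current Python environment.
pip install jupyter pyspark
```

With conda, the equivalent would be `conda install jupyter pyspark`, assuming the packages are available in your configured channels.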
The Toolkit requires Apache Spark 3.0.0+
Download and unzip Apache Spark to a location of your choice.
curl -L "https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop2.7.tgz" > spark-3.0.0-bin-hadoop2.7.tgz
tar -xvf spark-3.0.0-bin-hadoop2.7.tgz
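After unpacking, you can sanity-check the download by asking Spark for its version from the unpacked directory (the directory name below assumes the 3.0.0/Hadoop 2.7 archive above):

```shell
# Run from wherever you unpacked the archive;
# this prints the Spark version banner and exits.
cd spark-3.0.0-bin-hadoop2.7
./bin/spark-shell --version
```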