Eskapade contains many tools, and to find and use them efficiently you need to understand how the repository is built up. This section discusses the structure of the code and how the framework handles subpackages.
The architecture of Eskapade can be summarized in this picture:
The example we just discussed shows in general terms how the framework works. The steps it takes are the following:
- run_eskapade.py runs the macro file,
- Macros (Python files) contain Chains,
- Chains (Python objects) contain Links,
- Links (Python classes) contain analysis code.
The chains are run in the order in which they are registered in the
ProcessManager. The ProcessManager is the object that ultimately executes all the code in your macro.
It also keeps track of the configuration of Eskapade, and of the objects in the
data store that can be passed between links.
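The relationship between macros, chains, links, and the process manager can be illustrated with a small, self-contained sketch. The classes below (ToyLink, ToyChain, ToyProcessManager) are hypothetical stand-ins written for this illustration. They are not Eskapade's actual classes and only mimic the behaviour described above: chains run in registration order, and links exchange objects through a shared data store.

```python
"""Toy model of the macro -> chain -> link structure (not Eskapade's real API)."""


class ToyLink:
    """Hypothetical link: wraps one piece of analysis code."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def execute(self, data_store):
        # A link reads from and writes to the shared data store.
        self.func(data_store)


class ToyChain:
    """Hypothetical chain: an ordered collection of links."""

    def __init__(self, name):
        self.name = name
        self.links = []

    def add_link(self, link):
        self.links.append(link)

    def execute(self, data_store):
        for link in self.links:
            link.execute(data_store)


class ToyProcessManager:
    """Hypothetical process manager: owns the chains and the data store."""

    def __init__(self):
        self.chains = []
        self.data_store = {}

    def add_chain(self, name):
        chain = ToyChain(name)
        self.chains.append(chain)  # registration order is execution order
        return chain

    def run(self):
        for chain in self.chains:
            chain.execute(self.data_store)
        return self.data_store


def load_numbers(data_store):
    data_store["numbers"] = [1, 2, 3]


def sum_numbers(data_store):
    data_store["total"] = sum(data_store["numbers"])


def build_and_run():
    """Plays the role of a macro: define chains, add links, then run."""
    manager = ToyProcessManager()
    data_chain = manager.add_chain("Data")
    data_chain.add_link(ToyLink("load", load_numbers))
    summary_chain = manager.add_chain("Summary")
    summary_chain.add_link(ToyLink("sum", sum_numbers))
    return manager.run()


if __name__ == "__main__":
    print(build_and_run())
```

The second chain only works because the first chain has already put "numbers" in the data store, which is why the order of registration matters.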
The components of the architecture of Eskapade are explained in further detail in the Tutorials section.
When using Eskapade it is important to understand where all components are located. These components can be, for example, links or utilities that you want to use.
The Eskapade framework is contained in the Python package eskapade,
which lives in the
python directory. Every specific subject has its own subpackage in
eskapade, containing the utilities it needs, the links
that are defined for the subject, and the corresponding tests.
The core of the framework is implemented in the core subpackage.
This subpackage contains the low-level machinery for running analysis
algorithms in chains of links. The
core_ops subpackage contains
basic links for operating this framework.
An example of a typical subpackage is eskapade.analysis, which
contains basic analysis tools. Its structure is common to all Eskapade
subpackages:
|-eskapade
   |-analysis
      |-links
      |-tests
         |-integration
The subpackage contains several modules, which contain classes and
functions to be applied in links. One of these
modules, for example, contains code to generate an overview of the
statistical properties of variables in given input data.
Eskapade links are located in the
links directory. There is a
separate module for each link, defining the link class. By
convention, the names of the module and class are both the link name,
the former in snake case and the latter in camel case. For example, the
read_to_df module defines the link class ReadToDf.
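The naming convention can be expressed as a small helper. This is only a sketch of the snake-case to camel-case rule described above; the function link_class_name is hypothetical and not part of Eskapade. It assumes every word is simply capitalized, so it would not reproduce class names that keep an acronym fully in upper case.

```python
def link_class_name(module_name):
    """Convert a snake-case module name to the camel-case link class name."""
    # Capitalize each underscore-separated word and join them together.
    return "".join(part.capitalize() for part in module_name.split("_") if part)


if __name__ == "__main__":
    print(link_class_name("read_to_df"))  # ReadToDf
```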
Unit tests are defined in modules in the
tests directory. Ideally,
there is a test module for each (link) module in the Eskapade
subpackage. Optionally, integration tests are implemented in
tests/integration. For the
eskapade.analysis package, there is the module
test_tutorial_macros with integration tests that run the
tutorial macros corresponding to this package.
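To see which link modules still lack a test module, a quick check along the following lines can help. This is a sketch under an assumption: it supposes that the test for links/<module>.py is named tests/test_<module>.py, which is a guess based on the test_tutorial_macros name and may not match the actual layout. The function names and the module name my_new_link are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_test_modules(subpackage_dir):
    """Return link modules that have no tests/test_<module>.py counterpart.

    The test_<module>.py naming is an assumption made for this sketch.
    """
    subpackage = Path(subpackage_dir)
    link_modules = sorted(
        path.stem
        for path in (subpackage / "links").glob("*.py")
        if path.stem != "__init__"
    )
    tests_dir = subpackage / "tests"
    return [
        module
        for module in link_modules
        if not (tests_dir / f"test_{module}.py").exists()
    ]


def demo():
    """Build a throwaway subpackage layout and report untested link modules."""
    with tempfile.TemporaryDirectory() as tmp:
        subpackage = Path(tmp) / "analysis"
        (subpackage / "links").mkdir(parents=True)
        (subpackage / "tests").mkdir()
        for name in ("__init__", "read_to_df", "my_new_link"):
            (subpackage / "links" / f"{name}.py").touch()
        (subpackage / "tests" / "test_read_to_df.py").touch()
        return missing_test_modules(subpackage)


if __name__ == "__main__":
    print(demo())
```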
Eskapade contains the following subpackages:
- core is the package that contains the core framework of Eskapade.
- core_ops contains links pertaining to the core functionality of Eskapade.
- analysis contains pandas links and code.
- visualization contains visualization code and plotter links.
- root_analysis contains ROOT links and code for data generation, fitting, and plotting.
- data_quality contains links and code for fixing messy data.
- spark_analysis contains Spark-related analysis links and code.
The main elements of the Eskapade framework are imported directly from the
eskapade package. For example, the run-configuration object and the
run-process manager are part of the core subpackage, but are imported by:
from eskapade import ConfigObject, ProcessManager
Links are imported directly from their subpackage:
from eskapade.analysis import ReadToDf
In a macro, you can now instantiate and configure the ReadToDf link
and add it to a chain in the process manager.
Results of a macro are written out by default in the results
directory. The analysis run is persisted in the results directory under the
analysis_name given in the macro. This directory has the following
structure:
- config: the configuration macro
- proc_service_data: persisted states of run-process services
- data: analysis results, such as graphs or a trained model
The data for each of these elements are stored by analysis version
(v2, etc.). For example, the report produced by
esk301_dfsummary_plotter is saved in the data directory for the corresponding analysis name and version.
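The layout described above can be sketched with pathlib. This is an illustration only: the function results_path, the default directory name "results", the analysis name "my_analysis", and the exact nesting order (analysis name, then element, then version) are assumptions based on the description in this section, not a statement of what Eskapade writes to disk.

```python
from pathlib import Path

# Elements of a persisted analysis run, as listed in this section.
ELEMENTS = ("config", "proc_service_data", "data")


def results_path(analysis_name, element, version, results_dir="results"):
    """Build <results_dir>/<analysis_name>/<element>/v<version> (assumed layout)."""
    if element not in ELEMENTS:
        raise ValueError(f"unknown element {element!r}; expected one of {ELEMENTS}")
    return Path(results_dir) / analysis_name / element / f"v{version}"


if __name__ == "__main__":
    # "my_analysis" is a placeholder analysis name.
    print(results_path("my_analysis", "data", 2).as_posix())
```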
When building new Links or other functionality you will want to debug at some point. There are multiple ways to do this, because there are multiple ways of running the framework. A few ways are:
- Running in the terminal. In this scenario you have to work in a virtual environment (or adjust your own until it has all dependencies) and debug using the terminal output.
- Running in a notebook. This way the code is run in a notebook and you can gather the output from the browser.
- Running in a Docker container. The repository is mounted into the container, the code is run inside it, and the container's terminal returns the output.
- Running in a VM. In this case you run the code in the VM and mount the code into the VM. The output can be gathered in the VM and processed in the VM.
In the first three options you will want to use an IDE or text editor in a 'normal' environment to debug your code; in the last option you can use an editor either inside or outside the VM.
One of the easiest mistakes to make when running the framework is not sourcing the right files, or opening a new terminal without setting up the right environment. To avoid this, every time you want to run Eskapade you have to:
- Source the right virtual environment
- Source the eskapade repository
- Start your notebook / start your IDE / run the code
The least error-prone ways are Docker and VMs, because they automatically have the right environment variables set.
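When running in a terminal or notebook, a quick sanity check from Python can catch a forgotten setup step before you start debugging. The sketch below uses only the standard library. The function check_environment is hypothetical, and treating "the eskapade package is importable" as a proxy for "the repository has been sourced" is an assumption. Note also that the sys.prefix comparison detects venv-style virtual environments but not, for example, conda environments.

```python
import importlib.util
import sys


def check_environment(package="eskapade"):
    """Report whether a virtual environment is active and a package is importable."""
    # In a venv-style virtual environment, sys.prefix differs from sys.base_prefix.
    in_virtualenv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    # find_spec returns None when a top-level package cannot be found.
    package_found = importlib.util.find_spec(package) is not None
    return {"virtualenv_active": in_virtualenv, "package_importable": package_found}


if __name__ == "__main__":
    for check, passed in check_environment().items():
        print(f"{check}: {'ok' if passed else 'NOT OK'}")
```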