
eskapade.core_ops.links package

Submodules

eskapade.core_ops.links.apply module

Project: Eskapade - A python-based package for data analysis.

Class: DsApply

Created: 2018-06-30

Description:

Simple link that executes functions, passing the datastore to each of them.

Helps in the development of links.

Authors:
KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands

Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.

class eskapade.core_ops.links.apply.DsApply(**kwargs)

Bases: eskapade.core.element.Link

Simple link to execute functions to which the datastore is passed.

__init__(**kwargs)

Initialize an instance.

Parameters:
  • name (str) – name of link
  • apply (list) – list of functions to call at execute(); the datastore is passed to each of them

execute()

Execute the link.

Returns: status code of execution
Return type: StatusCode
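The following is a minimal plain-Python sketch of the behaviour described above. It is not the library's implementation. It assumes the datastore behaves like a dict and that each function in apply takes the datastore as its only argument. The helper ds_apply and the functions add_total and drop_raw are hypothetical illustrations, not part of Eskapade.

```python
from typing import Any, Callable, Dict, List

Datastore = Dict[str, Any]  # assumption: a plain dict stands in for eskapade's DataStore


def ds_apply(ds: Datastore, apply: List[Callable[[Datastore], Any]]) -> Datastore:
    """Call each function in `apply` in order, passing the datastore to it."""
    for func in apply:
        func(ds)  # each function may modify the datastore in place
    return ds


# Hypothetical user functions that each receive the datastore.
def add_total(ds: Datastore) -> None:
    ds["total"] = sum(ds["values"])


def drop_raw(ds: Datastore) -> None:
    ds.pop("values", None)


ds = ds_apply({"values": [1, 2, 3]}, apply=[add_total, drop_raw])
print(ds)  # {'total': 6}
```

In a macro, the link itself would be configured with the documented keyword arguments instead, e.g. DsApply(name='apply', apply=[add_total, drop_raw]), imported from eskapade.core_ops.links.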

eskapade.core_ops.links.assert_in_ds module

Class: AssertInDs

Created: 2016/11/08

Description:
Algorithm that asserts that items exist in the datastore.

class eskapade.core_ops.links.assert_in_ds.AssertInDs(**kwargs)

Bases: eskapade.core.element.Link

Asserts that the specified item(s) exist in the datastore.

__init__(**kwargs)

Initialize link instance.

Store the configuration of link AssertInDs.

Parameters:
  • name (str) – name of link
  • keySet (list) – list of keys to check

execute()

Execute the link.
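The check performed by AssertInDs can be sketched with a plain dict standing in for the datastore. The helper assert_in_ds is hypothetical. This sketch raises an AssertionError that lists all missing keys, but that is an assumption: the documentation above does not state how the real link reports a failure.

```python
from typing import Any, Iterable, Mapping


def assert_in_ds(ds: Mapping[str, Any], key_set: Iterable[str]) -> None:
    """Raise AssertionError naming every key in key_set that is missing from ds."""
    missing = [key for key in key_set if key not in ds]
    if missing:
        raise AssertionError(f"keys not found in datastore: {missing}")


ds = {"input_data": [1, 2, 3], "config": {}}
assert_in_ds(ds, ["input_data", "config"])  # all keys present: no error
try:
    assert_in_ds(ds, ["input_data", "histograms"])
except AssertionError as err:
    print(err)  # keys not found in datastore: ['histograms']
```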

eskapade.core_ops.links.break_link module

Class: Break

Created: 2017/02/26

Description:
Algorithm to send a break signal to the process manager and halt execution.

class eskapade.core_ops.links.break_link.Break(**kwargs)

Bases: eskapade.core.element.Link

Halt execution.

The link sends a failure signal and halts execution of the process manager. To break the execution of the processManager at a specific location, simply add this link at that location in a chain.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link

execute()

Execute the link.
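The following toy process manager illustrates what halting means. It runs links in order and stops at the first failure signal. StatusCode, run_links and the link functions are hypothetical stand-ins. Eskapade's process manager works on chains of Link objects, and its exact handling of the failure signal may differ.

```python
from enum import Enum
from typing import Callable, List


class StatusCode(Enum):
    """Hypothetical stand-in for eskapade's StatusCode enum."""
    Success = 1
    Failure = 2


LinkFn = Callable[[], StatusCode]  # a "link" here is just a function returning a status


def run_links(links: List[LinkFn]) -> StatusCode:
    """Run links in order; stop at the first one that returns Failure."""
    for link in links:
        status = link()
        if status is StatusCode.Failure:
            return status  # remaining links are never executed
    return StatusCode.Success


executed: List[str] = []


def make_link(name: str) -> LinkFn:
    def link() -> StatusCode:
        executed.append(name)
        return StatusCode.Success
    return link


def break_link() -> StatusCode:
    # Plays the role of Break: it only sends the failure signal.
    return StatusCode.Failure


status = run_links([make_link("read"), break_link, make_link("write")])
print(executed)  # ['read']
```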

eskapade.core_ops.links.ds_object_deleter module

Class: DsObjectDeleter

Created: 2016/11/08

Description:
Algorithm to delete objects from the datastore.

class eskapade.core_ops.links.ds_object_deleter.DsObjectDeleter(**kwargs)

Bases: eskapade.core.element.Link

Delete objects from the datastore.

Delete objects from the DataStore by the keys they are stored under, or keep only the data stored under the specified keys.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link
  • deletion_keys (list) – keys to clear. Setting this overrides clear_all to false.
  • deletion_classes (list) – delete object(s) by class type.
  • keep_only (list) – keys to keep. Setting this overrides clear_all to false.
  • clear_all (bool) – clear all key-value pairs in the datastore. Default is true.

execute()

Execute the link.

initialize()

Initialize the link.
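The following sketch shows how the deletion options interact, assuming a dict-like datastore. The function delete_objects is hypothetical. The documentation states that deletion_keys and keep_only override clear_all to false. This sketch additionally assumes the same for deletion_classes, since otherwise clear_all (default true) would wipe the whole datastore.

```python
from typing import Any, Dict, Iterable


def delete_objects(ds: Dict[str, Any],
                   deletion_keys: Iterable[str] = (),
                   deletion_classes: Iterable[type] = (),
                   keep_only: Iterable[str] = (),
                   clear_all: bool = True) -> Dict[str, Any]:
    """Delete datastore items by key, by class, or keep only selected keys."""
    deletion_keys = list(deletion_keys)
    classes = tuple(deletion_classes)
    keep_only = list(keep_only)
    # Any selective option switches off clear_all (assumed also for deletion_classes).
    if deletion_keys or classes or keep_only:
        clear_all = False
    if clear_all:
        ds.clear()
        return ds
    for key in deletion_keys:
        ds.pop(key, None)  # keys that are not present are ignored in this sketch
    if classes:
        for key in [k for k, v in ds.items() if isinstance(v, classes)]:
            del ds[key]
    if keep_only:
        for key in [k for k in ds if k not in keep_only]:
            del ds[key]
    return ds


store = {"df": [1, 2], "tmp": "scratch", "n": 3, "cfg": {"a": 1}}
print(delete_objects(dict(store), deletion_keys=["tmp"]))        # {'df': [1, 2], 'n': 3, 'cfg': {'a': 1}}
print(delete_objects(dict(store), deletion_classes=(str, int)))  # {'df': [1, 2], 'cfg': {'a': 1}}
print(delete_objects(dict(store), keep_only=["df"]))             # {'df': [1, 2]}
print(delete_objects(dict(store)))                               # {}
```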

eskapade.core_ops.links.ds_to_ds module

Class: DsToDs

Created: 2016/11/08

Description:
Algorithm to move, copy, or remove an object in the datastore.

class eskapade.core_ops.links.ds_to_ds.DsToDs(**kwargs)

Bases: eskapade.core.element.Link

Link to move, copy, or remove an object in the datastore.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of data to read from data store
  • store_key (str) – key of data to store in data store
  • move (bool) – move read_key item to store_key. Default is true.
  • copy (bool) – if true, the read_key key-value pair is not deleted. Default is false.
  • remove (bool) – if true, the item stored under read_key is deleted. Default is false.
  • columnsToAdd (dict) – if the object is a pandas.DataFrame, columns to add to it (key = column name, value = column).

execute()

Execute the link.

initialize()

Initialize the link.
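The move, copy and remove options can be sketched on a plain dict. The function ds_to_ds is hypothetical. This sketch assumes two things: copy and remove take precedence over move (which defaults to true), and copying is deep. The columnsToAdd option for pandas.DataFrame objects is not covered.

```python
from copy import deepcopy
from typing import Any, Dict


def ds_to_ds(ds: Dict[str, Any], read_key: str, store_key: str = "",
             move: bool = True, copy: bool = False,
             remove: bool = False) -> Dict[str, Any]:
    """Move, copy or remove the item stored under read_key in ds."""
    if read_key not in ds:
        raise KeyError(f"key '{read_key}' not found in datastore")
    # Assumption: copy and remove take precedence over move, which defaults to True.
    if copy:
        ds[store_key] = deepcopy(ds[read_key])  # the read_key entry is kept
    elif remove:
        del ds[read_key]
    elif move:
        ds[store_key] = ds.pop(read_key)
    return ds


print(ds_to_ds({"raw": [1, 2]}, "raw", "clean"))              # {'clean': [1, 2]}
print(ds_to_ds({"raw": [1, 2]}, "raw", "backup", copy=True))  # {'raw': [1, 2], 'backup': [1, 2]}
print(ds_to_ds({"raw": [1, 2]}, "raw", remove=True))          # {}
```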

eskapade.core_ops.links.event_looper module

Class: EventLooper

Created: 2016/11/08

Description:
EventLooper algorithm processes input lines and reprints them, e.g. for use with map/reduce.

class eskapade.core_ops.links.event_looper.EventLooper(**kwargs)

Bases: eskapade.core.element.Link

Event looper algorithm processes input lines and reprints or stores them.

Input lines are taken from sys.stdin, processed, and printed on screen.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link
  • filename (str) – name of the file containing the input lines (txt or similar). Default is None. (optional)
  • store_key (str) – datastore key under which to collect lines. If set, lines are collected. (optional)
  • line_processor_set (list) – list of functions to apply to input lines. (optional)
  • sort (bool) – if true, sort lines before storage. (optional)
  • unique (bool) – if true, keep only unique lines before storage. (optional)
  • skip_line_beginning_with (list) – list of strings; skip a line if it starts with any of them. Default is ['#']. (optional)

execute()

Process all incoming lines.

No output is printed except for lines that are passed on, such that the output lines can be picked up again by another parser.

finalize()

Close the open file, if present.

initialize()

Perform basic checks of configured attributes.
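The following sketch shows the line processing, using io.StringIO in place of sys.stdin or the input file. The function loop_lines is hypothetical. It returns the collected lines (as when store_key is set) instead of printing them. The order of operations is an assumption: skip, then process, then deduplicate, then sort.

```python
import io
from typing import Callable, Iterable, List, TextIO


def loop_lines(stream: TextIO,
               line_processor_set: Iterable[Callable[[str], str]] = (),
               sort: bool = False,
               unique: bool = False,
               skip_line_beginning_with: Iterable[str] = ("#",)) -> List[str]:
    """Read, filter and process lines from stream; return the collected lines."""
    prefixes = tuple(skip_line_beginning_with)
    processors = list(line_processor_set)
    collected: List[str] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line.startswith(prefixes):  # e.g. skip comment lines starting with '#'
            continue
        for process in processors:
            line = process(line)
        collected.append(line)
    if unique:
        collected = list(dict.fromkeys(collected))  # keeps first occurrences, in order
    if sort:
        collected.sort()
    return collected


stream = io.StringIO("# header\nbanana\napple\nbanana\n")
result = loop_lines(stream, line_processor_set=[str.upper], sort=True, unique=True)
print(result)  # ['APPLE', 'BANANA']
```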

eskapade.core_ops.links.hello_world module

Class: HelloWorld

Created: 2017/01/31

Description:
Algorithm to print Hello {}!

class eskapade.core_ops.links.hello_world.HelloWorld(**kwargs)

Bases: eskapade.core.element.Link

Defines the content of link HelloWorld.

__init__(**kwargs)

Store the configuration of link HelloWorld.

Parameters:
  • name (str) – name assigned to the link
  • hello (str) – name to print in 'Hello {}!'. Default is 'World'.
  • repeat (int) – number of times to repeat the print statement. Default is 1.

execute()

Execute the link.
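The following hypothetical standalone function shows how the hello and repeat parameters shape the output. It is an illustration only, not the link's code.

```python
from typing import List


def hello_world(hello: str = "World", repeat: int = 1) -> List[str]:
    """Print 'Hello {hello}!' `repeat` times and return the printed lines."""
    lines = [f"Hello {hello}!" for _ in range(repeat)]
    for line in lines:
        print(line)
    return lines


hello_world()                      # Hello World!
hello_world("Eskapade", repeat=2)  # Hello Eskapade! (printed twice)
```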

eskapade.core_ops.links.import_data_store module

Class: ImportDataStore

Created: 2018-03-17

Description:
Algorithm to import the datastore from an external pickle file.

class eskapade.core_ops.links.import_data_store.ImportDataStore(**kwargs)

Bases: eskapade.core.element.Link

Link to import the datastore from an external pickle file.

Import can happen at initialize() or execute(). Default is initialize().

__init__(**kwargs)

Initialize an instance of the datastore importer link.

Parameters:
  • name (str) – name of link
  • path (str) – path of the datastore pickle file to import
  • import_at_initialize (bool) – if false, perform the datastore import at execute(). Default is true (import at initialize()).

execute()

Execute the link.

Returns: status code of execution
Return type: StatusCode

import_and_update_datastore()

Import and update the datastore.

initialize()

Initialize the link.

Returns: status code of execution
Return type: StatusCode
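The following sketch imports a pickled datastore and merges it into the current one, using only the standard library. The standalone function is named after the documented import_and_update_datastore() method but is hypothetical. Two things are assumptions: that the pickle holds a dict, and that imported keys overwrite existing ones. Only unpickle files from trusted sources, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def import_and_update_datastore(ds: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Load a pickled dict from `path` and merge it into `ds`."""
    with open(path, "rb") as f:
        imported = pickle.load(f)  # only unpickle files from trusted sources
    ds.update(imported)  # assumption: imported items overwrite keys with the same name
    return ds


# Write a small pickled "datastore" to a temporary file for the example.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as tmp:
    pickle.dump({"n_rows": 42, "columns": ["a", "b"]}, tmp)
    path = tmp.name

try:
    ds = import_and_update_datastore({"run_id": 7}, path)
    print(ds)  # {'run_id': 7, 'n_rows': 42, 'columns': ['a', 'b']}
finally:
    os.remove(path)
```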

eskapade.core_ops.links.ipython_embed module

Class: IPythonEmbed

Created: 2017/02/26

Description:
Link that starts up a Python console during execution, for debugging.

class eskapade.core_ops.links.ipython_embed.IPythonEmbed(**kwargs)

Bases: eskapade.core.element.Link

Link to start up a Python console.

Start up a Python console by simply adding this link at any location in a chain. Note: this is a regular Python console, not an IPython console.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link

execute()

Execute the link.

eskapade.core_ops.links.line_printer module

Class: LinePrinter

Created: 2017/02/21

Description:
Simple algorithm to pick up lines and reprint them.

class eskapade.core_ops.links.line_printer.LinePrinter(**kwargs)

Bases: eskapade.core.element.Link

LinePrinter picks up lines from the datastore and prints them.

__init__(**kwargs)

Set up the configuration of link LinePrinter.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input data to read from data store

execute()

Execute the link.

No output is printed except for lines that are passed on, such that the output lines can be picked up again by another parser.

initialize()

Initialize the link.

eskapade.core_ops.links.print_ds module

Class: PrintDs

Created: 2016/11/08

Description:
Algorithm to print the content of the datastore.

class eskapade.core_ops.links.print_ds.PrintDs(**kwargs)

Bases: eskapade.core.element.Link

Print the content of the datastore.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link
  • keys (list) – keys of items to print explicitly.

execute()

Execute the link.

Print an overview of the datastore in its current state.

eskapade.core_ops.links.repeat_chain module

Class: RepeatChain

Created: 2016/11/08

Description:
Algorithm that sends a “repeat this chain” signal to the processManager until ready.

class eskapade.core_ops.links.repeat_chain.RepeatChain(**kwargs)

Bases: eskapade.core.element.Link

Algorithm that sends a signal to the processManager to repeat the current chain.

__init__(**kwargs)

Link that sends a signal to the processManager to repeat the current chain.

Sends a RepeatChain enums.StatusCode signal.

Parameters:
  • name (str) – name of link
  • listen_to (list) – repeat this chain if a given key is present in the ConfigObject and set to true. For example, such a key is set by the readtods link when looping over files.
  • maxcount (int) – repeat this chain until the maximum count has been reached. Default is -1 (off).

execute()

Execute the link.

initialize()

Initialize the link.
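The following sketch shows the repeat decision and a driver loop that re-runs a chain while it receives the repeat signal. StatusCode, RepeatChainSketch and run_chain are hypothetical stand-ins. The exact counting rule for maxcount is an assumption: here it allows at most maxcount repeats after the first run. The way listen_to and maxcount combine is also an assumption.

```python
from enum import Enum
from typing import Any, Iterable, Mapping


class StatusCode(Enum):
    """Hypothetical stand-in for eskapade's enums.StatusCode."""
    Success = 1
    RepeatChain = 2


class RepeatChainSketch:
    """Decides whether the current chain should be repeated."""

    def __init__(self, listen_to: Iterable[str] = (), maxcount: int = -1) -> None:
        self.listen_to = list(listen_to)
        self.maxcount = maxcount
        self._repeats = 0  # repeat signals sent so far (used for maxcount)

    def execute(self, config: Mapping[str, Any]) -> StatusCode:
        # Repeat while any listened-to key is present in the config and true.
        if any(config.get(key) for key in self.listen_to):
            return StatusCode.RepeatChain
        # Otherwise repeat until maxcount repeats were sent; values <= 0 disable this.
        if self.maxcount > 0 and self._repeats < self.maxcount:
            self._repeats += 1
            return StatusCode.RepeatChain
        return StatusCode.Success


def run_chain(link: RepeatChainSketch, config: Mapping[str, Any]) -> int:
    """Run the chain until it no longer asks to be repeated; return the run count."""
    runs = 0
    while True:
        runs += 1  # the other links in the chain would execute here
        if link.execute(config) is not StatusCode.RepeatChain:
            return runs


print(run_chain(RepeatChainSketch(maxcount=3), {}))  # 4: first run plus 3 repeats
print(run_chain(RepeatChainSketch(listen_to=["more_files"]), {"more_files": False}))  # 1
```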

eskapade.core_ops.links.skip_chain_if_empty module

Class: SkipChainIfEmpty

Created: 2016/11/08

Description:
Algorithm to skip to the next Chain if an input dataset is empty.

class eskapade.core_ops.links.skip_chain_if_empty.SkipChainIfEmpty(**kwargs)

Bases: eskapade.core.element.Link

Sends a SkipChain enums.StatusCode signal when a specified dataset is empty.

This signal causes the process manager to step immediately to the next Chain. Input collections can be either mongo collections or dataframes in the datastore.

__init__(**kwargs)

Initialize link instance.

Parameters:
  • name (str) – name of link
  • collection_set (list) – datastore keys holding the datasets to be checked. If any of these is empty, the chain is skipped.
  • skip_chain_when_key_not_in_ds (bool) – skip the chain as well if the dataframe is not present in the datastore. When true and the type is 'pandas.DataFrame', a SkipChain signal is sent if the key is not in the DataStore.
  • check_at_initialize (bool) – check whether the datasets are empty at initialize(). Default is true.
  • check_at_execute (bool) – check whether the datasets are empty at execute(). Default is false.

check_collection_set()

Check that the collections exist, in either mongo or the datastore, and that they are not empty.

Collections need to be both present and not empty.

  • For mongo collections, a dedicated filter can be applied before doing the count.
  • For pandas dataframes, the additional option 'skip_chain_when_key_not_in_ds' exists: skip the chain as well if the dataframe is not present in the datastore.

execute()

Execute the link.

Skip to the next Chain if any of the input datasets is empty.

initialize()

Initialize the link.
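The following sketch shows the emptiness check for datastore collections, with a dict standing in for the datastore. The function should_skip_chain is hypothetical. Objects with an empty attribute (such as pandas.DataFrame) are tested through that attribute; anything else is tested through len(). Raising KeyError for a missing key when skip_chain_when_key_not_in_ds is false is an assumption. Mongo collections are not covered.

```python
from typing import Any, Iterable, Mapping


def is_empty(obj: Any) -> bool:
    """Use .empty when available (e.g. pandas.DataFrame), else len()."""
    if hasattr(obj, "empty"):
        return bool(obj.empty)
    return len(obj) == 0


def should_skip_chain(ds: Mapping[str, Any], collection_set: Iterable[str],
                      skip_chain_when_key_not_in_ds: bool = False) -> bool:
    """Return True if any listed dataset is empty (or missing, if so configured)."""
    for key in collection_set:
        if key not in ds:
            if skip_chain_when_key_not_in_ds:
                return True
            raise KeyError(f"key '{key}' not found in datastore")
        if is_empty(ds[key]):
            return True
    return False


ds = {"transactions": [10, 20], "returns": []}
print(should_skip_chain(ds, ["transactions"]))             # False
print(should_skip_chain(ds, ["transactions", "returns"]))  # True
print(should_skip_chain(ds, ["refunds"], skip_chain_when_key_not_in_ds=True))  # True
```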

eskapade.core_ops.links.to_ds_dict module

Class: ToDsDict

Created: 2016/11/08

Description:
Algorithm to store one object in the DataStore dict during run time.

class eskapade.core_ops.links.to_ds_dict.ToDsDict(**kwargs)

Bases: eskapade.core.element.Link

Stores one object in the DataStore dict during run time.

__init__(**kwargs)

Link to store one external object in the DataStore dict during run time.

Parameters:
  • name (str) – name of link
  • store_key (str) – key of object to store in data store
  • obj – object to store
  • force (bool) – overwrite if already present in datastore. Default is false. (optional)
  • at_initialize (bool) – store at initialize of link. Default is false.
  • at_execute (bool) – store at execute of link. Default is true.
  • copydict (bool) – if true and obj is a dict, copy all key-value pairs into the datastore. Default is false.

do_storage(ds)

Perform storage in datastore.

The function makes a distinction between dicts and any other object.

execute()

Execute the link.

initialize()

Initialize the link.
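The following sketch shows the distinction do_storage() draws between dicts and other objects, with a dict standing in for the DataStore. The standalone function is hypothetical. Two behaviours are assumptions: store_key is ignored when copydict applies, and existing keys are silently kept when force is false.

```python
from typing import Any, Dict


def do_storage(ds: Dict[str, Any], store_key: str, obj: Any,
               force: bool = False, copydict: bool = False) -> Dict[str, Any]:
    """Store obj under store_key or, with copydict and a dict obj, each of its items."""
    if copydict and isinstance(obj, dict):
        items = list(obj.items())  # assumption: store_key is not used in this case
    else:
        items = [(store_key, obj)]
    for key, value in items:
        if key in ds and not force:
            continue  # assumption: existing entries are kept unless force is true
        ds[key] = value
    return ds


ds = {"a": 1}
do_storage(ds, "b", [1, 2])
do_storage(ds, "a", 99)              # not stored: "a" exists and force is false
do_storage(ds, "a", 99, force=True)  # overwritten
print(ds)  # {'a': 99, 'b': [1, 2]}
copied = do_storage({}, "unused", {"x": 1, "y": 2}, copydict=True)
print(copied)  # {'x': 1, 'y': 2}
```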

Module contents

All link classes documented above are also exposed at package level, so they can be imported directly from eskapade.core_ops.links (for example, eskapade.core_ops.links.AssertInDs). The package-level classes are AssertInDs, Break, DsObjectDeleter, DsToDs, EventLooper, HelloWorld, IPythonEmbed, LinePrinter, PrintDs, RepeatChain, SkipChainIfEmpty, ToDsDict, DsApply and ImportDataStore. Their documentation is identical to that of the corresponding submodule classes.
© Copyright 2018, KPMG Advisory N.V. Revision fcadefad.