
eskapade.visualization.links package¶

Submodules¶

eskapade.visualization.links.correlation_summary module¶

Project: Eskapade - A python-based package for data analysis.

Class : CorrelationSummary

Created: 2017/03/13

Description:
Algorithm to create correlation heatmaps.
Authors:
KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands

Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.

class eskapade.visualization.links.correlation_summary.CorrelationSummary(**kwargs)¶

Bases: eskapade.core.element.Link

Create a heatmap of correlations between dataframe variables.

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input dataframe to read from data store
  • store_key (str) – key of correlations dataframe in data store
  • results_path (str) – path to save correlation summary pdf
  • methods (list) – method(s) of computing correlations
  • pages_key (str) – data store key of existing report pages
execute()¶

Execute the link.

finalize()¶

Finalize the link.

initialize()¶

Initialize the link.
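
The `methods` parameter implies that one correlation matrix is computed per method. The sketch below illustrates that idea in plain Python. It is not the link's implementation: the method names `pearson` and `spearman`, the helper functions, and the sample data are all assumptions made for illustration, since this page does not list the supported methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            result[order[position]] = average_rank
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical method registry; the real link may support other names.
METHODS = {"pearson": pearson, "spearman": spearman}


def correlation_matrices(columns, methods):
    """Return {method: {col_a: {col_b: coefficient}}} for a dict of columns."""
    return {
        method: {
            a: {b: METHODS[method](columns[a], columns[b]) for b in columns}
            for a in columns
        }
        for method in methods
    }


data = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 6.0, 8.0, 10.0],     # linear in x
    "z": [1.0, 8.0, 27.0, 64.0, 125.0],  # monotonic but non-linear in x
}
matrices = correlation_matrices(data, ["pearson", "spearman"])
```

Each matrix in `matrices` is what a heatmap would visualize: one cell per pair of variables, coloured by the coefficient.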

eskapade.visualization.links.df_boxplot module¶

Project: Eskapade - A python-based package for data analysis.

Class : DfBoxplot

Created: 2017/02/17

Description:
Link to create a boxplot of data frame columns.
Authors:
KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands

Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.

class eskapade.visualization.links.df_boxplot.DfBoxplot(**kwargs)¶

Bases: eskapade.core.element.Link

Create a boxplot of one column of a DataFrame that is grouped by values from a second column.

Creates a report page for each variable in DataFrame, containing:

  • a profile of the column dataset
  • a nicely scaled plot of the boxplots per group of the column

Example is available in: tutorials/esk304_df_boxplot.py

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input data to read from data store
  • results_path (str) – output path of summary result files
  • column (str) – column to pick up from input data and use as boxplot input
  • cause_columns (list) – list of columns (str) to group by; a boxplot is plotted per unique value
  • statistics (list) – list of names (str) of the statistics to generate for the boxplot. The full list is taken from statistics.ArrayStats.get_latex_table. Defaults to ['count', 'mean', 'min', 'max'].
  • pages_key (str) – data store key of existing report pages
execute()¶

Execute the link.

Creates a report page for each column that we group-by in the data frame.

  • create statistics object for group
  • create overview table of column variable
  • plot boxplot of column variable per group
  • store plot
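
The first two steps above (grouping the column by a cause column and building a statistics overview per group) can be sketched in plain Python. This is a minimal illustration, not the link's code: the row layout, the helper names, and the sample data are assumptions, and only the four default statistics named in the `statistics` parameter are covered.

```python
from collections import defaultdict


def group_values(rows, column, cause_column):
    """Collect the values of `column` per unique value of `cause_column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cause_column]].append(row[column])
    return dict(groups)


# Hypothetical mapping from statistic name to a function computing it.
STATISTICS = {
    "count": len,
    "mean": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def summarize(groups, statistics=("count", "mean", "min", "max")):
    """Return {group: {statistic: value}} for the requested statistics."""
    return {
        group: {name: STATISTICS[name](values) for name in statistics}
        for group, values in groups.items()
    }


# Illustrative input: one dict per record.
rows = [
    {"city": "A", "age": 30},
    {"city": "A", "age": 40},
    {"city": "B", "age": 25},
    {"city": "B", "age": 35},
    {"city": "B", "age": 45},
]
groups = group_values(rows, column="age", cause_column="city")
summary = summarize(groups)
```

Here `groups` holds the values that would feed one box per group, and `summary` corresponds to the overview table shown alongside the plot.
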
finalize()¶

Finalize the link.

initialize()¶

Initialize the link.

eskapade.visualization.links.df_summary module¶

Project: Eskapade - A python-based package for data analysis.

Class : DfSummary

Created: 2017/02/17

Description:
Link to create a statistics summary of data frame columns or of a set of histograms.
Authors:
KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands

Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.

class eskapade.visualization.links.df_summary.DfSummary(**kwargs)¶

Bases: eskapade.core.element.Link

Create a summary of a dataframe.

Creates a report page for each variable in data frame, containing:

  • a profile of the column dataset
  • a nicely scaled plot of the column dataset

Example 1 is available in: tutorials/esk301_dfsummary_plotter.py

Example 2 is available in: tutorials/esk303_histogram_filling_plotting.py

Empty histograms are automatically skipped during processing.

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input dataframe (or histogram-dict) to read from data store
  • results_path (str) – output path of summary result files
  • columns (list) – columns (or histogram keys) to pick up from input data, for which summaries are made and plotted
  • hist_keys (list) – alternative to columns (optional)
  • var_labels (dict) – dict of column names with a label per column
  • var_units (dict) – dict of column names with a unit per column
  • var_bins (dict) – dict of column names with the number of bins per column. Default per column is 30.
  • hist_y_label (str) – y-axis label to plot for all columns. Default is ‘Bin Counts’.
  • pages_key (str) – data store key of existing report pages
assert_data_type(data)¶

Check type of input data.

Parameters: data – input data sample (pandas dataframe or dict)
execute()¶

Execute the link.

Creates a report page for each variable in data frame.

  • create statistics object for column
  • create overview table of column variable
  • plot histogram of column variable
  • store plot
Returns: execution status code
Return type: StatusCode
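
The per-column steps above (a statistics profile plus a histogram) can be sketched in plain Python. This is an illustration under stated assumptions rather than the link's implementation: the function names and the sample are hypothetical, the histogram is returned numpy-style as a `(bin_entries, bin_edges)` pair, and the default of 30 bins follows the `var_bins` description.

```python
import math


def histogram(values, n_bins=30):
    """Return (bin_entries, bin_edges) using equal-width bins; NaNs are ignored."""
    clean = [v for v in values if not math.isnan(v)]
    low, high = min(clean), max(clean)
    if low == high:
        # Widen a degenerate range so the bin width is non-zero.
        low, high = low - 0.5, high + 0.5
    width = (high - low) / n_bins
    bin_edges = [low + i * width for i in range(n_bins + 1)]
    bin_entries = [0] * n_bins
    for v in clean:
        # The maximum value falls on the last edge and goes into the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        bin_entries[index] += 1
    return bin_entries, bin_edges


def profile(values):
    """Basic statistics of a column, with NaN entries counted separately."""
    clean = [v for v in values if not math.isnan(v)]
    return {
        "count": len(clean),
        "nan": len(values) - len(clean),
        "mean": sum(clean) / len(clean),
        "min": min(clean),
        "max": max(clean),
    }


# Illustrative column: 60 numbers and one missing value.
sample = [float(i) for i in range(60)] + [float("nan")]
bin_entries, bin_edges = histogram(sample)
stats = profile(sample)
```

Note that a histogram with `n_bins` bins has `n_bins + 1` edges, which is the layout numpy-style histograms use.
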
finalize()¶

Finalize the link.

get_all_columns(data)¶

Retrieve all columns / keys from input data.

Parameters: data – input data sample (pandas dataframe or dict)
Returns: list of columns
Return type: list
get_length(data)¶

Get length of data set.

Parameters: data – input data (pandas dataframe or dict)
Returns: length of data set
get_sample(data, key)¶

Retrieve specific column or item from input data.

Parameters:
  • data – input data (pandas dataframe or dict)
  • key (str) – column key
Returns: data series or item

initialize()¶

Initialize the link.

process_1d_histogram(name, hist)¶

Create statistics of and plot input 1d histogram.

Parameters:
  • name (str) – name of the histogram
  • hist – input histogram object
process_2d_histogram(name, hist)¶

Create statistics of and plot input 2d histogram.

Parameters:
  • name (str) – name of the histogram
  • hist – input histogram object
process_nan_histogram(nphist, n_data)¶

Process the NaN histogram.

Add the NaN histogram to the PDF list.

Parameters:
  • nphist – numpy-style input histogram, consisting of comma-separated bin_entries, bin_edges
  • n_data (int) – number of entries in the processed data set
process_sample(name, sample)¶

Process various possible data samples.

Parameters:
  • name (str) – name of sample
  • sample – input pandas series object or histogram
process_series(col, sample)¶

Create statistics of and plot input pandas series.

Parameters:
  • col (str) – name of the series
  • sample – input pandas series object

Module contents¶

class eskapade.visualization.links.CorrelationSummary(**kwargs)¶

Bases: eskapade.core.element.Link

Create a heatmap of correlations between dataframe variables.

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input dataframe to read from data store
  • store_key (str) – key of correlations dataframe in data store
  • results_path (str) – path to save correlation summary pdf
  • methods (list) – method(s) of computing correlations
  • pages_key (str) – data store key of existing report pages
execute()¶

Execute the link.

finalize()¶

Finalize the link.

initialize()¶

Initialize the link.

class eskapade.visualization.links.DfBoxplot(**kwargs)¶

Bases: eskapade.core.element.Link

Create a boxplot of one column of a DataFrame that is grouped by values from a second column.

Creates a report page for each variable in DataFrame, containing:

  • a profile of the column dataset
  • a nicely scaled plot of the boxplots per group of the column

Example is available in: tutorials/esk304_df_boxplot.py

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input data to read from data store
  • results_path (str) – output path of summary result files
  • column (str) – column to pick up from input data and use as boxplot input
  • cause_columns (list) – list of columns (str) to group by; a boxplot is plotted per unique value
  • statistics (list) – list of names (str) of the statistics to generate for the boxplot. The full list is taken from statistics.ArrayStats.get_latex_table. Defaults to ['count', 'mean', 'min', 'max'].
  • pages_key (str) – data store key of existing report pages
execute()¶

Execute the link.

Creates a report page for each column that we group-by in the data frame.

  • create statistics object for group
  • create overview table of column variable
  • plot boxplot of column variable per group
  • store plot
finalize()¶

Finalize the link.

initialize()¶

Initialize the link.

class eskapade.visualization.links.DfSummary(**kwargs)¶

Bases: eskapade.core.element.Link

Create a summary of a dataframe.

Creates a report page for each variable in data frame, containing:

  • a profile of the column dataset
  • a nicely scaled plot of the column dataset

Example 1 is available in: tutorials/esk301_dfsummary_plotter.py

Example 2 is available in: tutorials/esk303_histogram_filling_plotting.py

Empty histograms are automatically skipped during processing.

__init__(**kwargs)¶

Initialize link instance.

Parameters:
  • name (str) – name of link
  • read_key (str) – key of input dataframe (or histogram-dict) to read from data store
  • results_path (str) – output path of summary result files
  • columns (list) – columns (or histogram keys) to pick up from input data, for which summaries are made and plotted
  • hist_keys (list) – alternative to columns (optional)
  • var_labels (dict) – dict of column names with a label per column
  • var_units (dict) – dict of column names with a unit per column
  • var_bins (dict) – dict of column names with the number of bins per column. Default per column is 30.
  • hist_y_label (str) – y-axis label to plot for all columns. Default is ‘Bin Counts’.
  • pages_key (str) – data store key of existing report pages
assert_data_type(data)¶

Check type of input data.

Parameters: data – input data sample (pandas dataframe or dict)
execute()¶

Execute the link.

Creates a report page for each variable in data frame.

  • create statistics object for column
  • create overview table of column variable
  • plot histogram of column variable
  • store plot
Returns: execution status code
Return type: StatusCode
finalize()¶

Finalize the link.

get_all_columns(data)¶

Retrieve all columns / keys from input data.

Parameters: data – input data sample (pandas dataframe or dict)
Returns: list of columns
Return type: list
get_length(data)¶

Get length of data set.

Parameters: data – input data (pandas dataframe or dict)
Returns: length of data set
get_sample(data, key)¶

Retrieve specific column or item from input data.

Parameters:
  • data – input data (pandas dataframe or dict)
  • key (str) – column key
Returns: data series or item

initialize()¶

Initialize the link.

process_1d_histogram(name, hist)¶

Create statistics of and plot input 1d histogram.

Parameters:
  • name (str) – name of the histogram
  • hist – input histogram object
process_2d_histogram(name, hist)¶

Create statistics of and plot input 2d histogram.

Parameters:
  • name (str) – name of the histogram
  • hist – input histogram object
process_nan_histogram(nphist, n_data)¶

Process the NaN histogram.

Add the NaN histogram to the PDF list.

Parameters:
  • nphist – numpy-style input histogram, consisting of comma-separated bin_entries, bin_edges
  • n_data (int) – number of entries in the processed data set
process_sample(name, sample)¶

Process various possible data samples.

Parameters:
  • name (str) – name of sample
  • sample – input pandas series object or histogram
process_series(col, sample)¶

Create statistics of and plot input pandas series.

Parameters:
  • col (str) – name of the series
  • sample – input pandas series object
© Copyright 2018, KPMG Advisory N.V. Revision fcadefad.
