eskapade.visualization.links package
Submodules
eskapade.visualization.links.correlation_summary module
Project: Eskapade - A python-based package for data analysis.
Class : CorrelationSummary
Created: 2017/03/13
- Description:
- Algorithm to create correlation heatmaps.
- Authors:
- KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands
Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.
-
class eskapade.visualization.links.correlation_summary.CorrelationSummary(**kwargs)
Bases: eskapade.core.element.Link
Create a heatmap of correlations between dataframe variables.
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input dataframe to read from data store
- store_key (str) – key of correlations dataframe in data store
- results_path (str) – path to save correlation summary pdf
- methods (list) – method(s) of computing correlations
- pages_key (str) – data store key of existing report pages
-
execute
()¶ Execute the link.
-
finalize
()¶ Finalize the link.
-
initialize
()¶ Initialize the link.
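As a rough illustration of the quantity such a heatmap displays, the sketch below builds a Pearson correlation matrix in plain Python. It does not use Eskapade. The helper names pearson and correlation_matrix are hypothetical, and Pearson is only assumed to be one possible entry in methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name to values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Toy data standing in for the dataframe read via read_key.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
corr = correlation_matrix(data)
```

The nested dict corr plays the role of the correlations dataframe that the link stores under store_key. Each cell would become one coloured tile in the heatmap.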
-
eskapade.visualization.links.df_boxplot module
Project: Eskapade - A python-based package for data analysis.
Class : DfBoxplot
Created: 2017/02/17
- Description:
- Link to create a boxplot of data frame columns.
- Authors:
- KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands
Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.
-
class eskapade.visualization.links.df_boxplot.DfBoxplot(**kwargs)
Bases: eskapade.core.element.Link
Create a boxplot of one column of a DataFrame that is grouped by values from a second column.
Creates a report page for each variable in DataFrame, containing:
- a profile of the column dataset
- a nicely scaled plot of the boxplots per group of the column
Example is available in: tutorials/esk304_df_boxplot.py
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input data to read from data store
- results_path (str) – output path of summary result files
- column (str) – column to pick up from input data and use as boxplot input
- cause_columns (list) – list of columns (str) to group by; a boxplot is plotted per unique value
- statistics (list) – list of strings naming the statistics to generate for the boxplot. The full list is taken from statistics.ArrayStats.get_latex_table. Defaults to: [‘count’, ‘mean’, ‘min’, ‘max’]
- pages_key (str) – data store key of existing report pages
-
execute
()¶ Execute the link.
Creates a report page for each column that we group-by in the data frame.
- create statistics object for group
- create overview table of column variable
- plot boxplot of column variable per group
- store plot
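The grouping and statistics steps above can be sketched in plain Python, without Eskapade or any plotting. The function group_statistics and the toy rows are hypothetical. The four statistics shown are the documented defaults, and how Eskapade computes them internally is not taken from the source.

```python
from collections import defaultdict


def group_statistics(rows, column, cause_column):
    """Group the values of `column` by `cause_column` and summarise each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cause_column]].append(row[column])
    return {
        key: {
            "count": len(values),
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for key, values in groups.items()
    }


# Toy records standing in for the input data read via read_key.
rows = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 10.0},
]
stats = group_statistics(rows, column="value", cause_column="group")
```

Each key of stats corresponds to one box in the boxplot, and the inner dict corresponds to one row of the overview table.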
-
finalize
()¶ Finalize the link.
-
initialize
()¶ Initialize the link.
eskapade.visualization.links.df_summary module
Project: Eskapade - A python-based package for data analysis.
Class : DfSummary
Created: 2017/02/17
- Description:
- Link to create a statistics summary of data frame columns or of a set of histograms.
- Authors:
- KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands
Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.
-
class eskapade.visualization.links.df_summary.DfSummary(**kwargs)
Bases: eskapade.core.element.Link
Create a summary of a dataframe.
Creates a report page for each variable in data frame, containing:
- a profile of the column dataset
- a nicely scaled plot of the column dataset
Example 1 is available in: tutorials/esk301_dfsummary_plotter.py
Example 2 is available in: tutorials/esk303_histogram_filling_plotting.py
Empty histograms are automatically skipped from processing.
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input dataframe (or histogram-dict) to read from data store
- results_path (str) – output path of summary result files
- columns (list) – columns (or histogram keys) to pick up from input data, for which summaries are made and plotted
- hist_keys (list) – alternative to columns (optional)
- var_labels (dict) – dict of column names with a label per column
- var_units (dict) – dict of column names with a unit per column
- var_bins (dict) – dict of column names with the number of bins per column. Default per column is 30.
- hist_y_label (str) – y-axis label to plot for all columns. Default is ‘Bin Counts’.
- pages_key (str) – data store key of existing report pages
-
assert_data_type
(data)¶ Check type of input data.
Parameters: data – input data sample (pandas dataframe or dict)
-
execute
()¶ Execute the link.
Creates a report page for each variable in data frame.
- create statistics object for column
- create overview table of column variable
- plot histogram of column variable
- store plot
Returns: execution status code
Return type: StatusCode
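The histogram step can be illustrated with a minimal fixed-width binning sketch in plain Python. The function histogram is hypothetical and not part of Eskapade. Only the default of 30 bins comes from the var_bins description; the equal-width binning scheme itself is an assumption.

```python
def histogram(values, n_bins=30):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        hi = lo + 1.0  # avoid zero-width bins for constant columns
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge; clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return counts, edges


counts, edges = histogram([0.0, 0.5, 1.0, 1.5, 2.0, 3.0], n_bins=3)
default_counts, default_edges = histogram([1.0, 2.0, 3.0])
```

The pair (counts, edges) mirrors the numpy-style histogram layout of bin entries followed by bin edges, with one more edge than there are bins.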
-
finalize
()¶ Finalize the link.
-
get_all_columns
(data)¶ Retrieve all columns / keys from input data.
Parameters: data – input data sample (pandas dataframe or dict)
Returns: list of columns
Return type: list
-
get_length
(data)¶ Get length of data set.
Parameters: data – input data (pandas dataframe or dict)
Returns: length of data set
-
get_sample
(data, key)¶ Retrieve specific column or item from input data.
Parameters: - data – input data (pandas dataframe or dict)
- key (str) – column key
Returns: data series or item
-
initialize
()¶ Initialize the link.
-
process_1d_histogram
(name, hist)¶ Create statistics of and plot input 1d histogram.
Parameters: - name (str) – name of the histogram
- hist – input histogram object
-
process_2d_histogram
(name, hist)¶ Create statistics of and plot input 2d histogram.
Parameters: - name (str) – name of the histogram
- hist – input histogram object
-
process_nan_histogram
(nphist, n_data)¶ Process nans histogram.
Add nans histogram to pdf list
Parameters: - nphist – numpy-style input histogram, consisting of comma-separated bin_entries, bin_edges
- n_data (int) – number of entries in the processed data set
-
process_sample
(name, sample)¶ Process various possible data samples.
Parameters: - name (str) – name of sample
- sample – input pandas series object or histogram
-
process_series
(col, sample)¶ Create statistics of and plot input pandas series.
Parameters: - col (str) – name of the series
- sample – input pandas series object
Module contents
-
class eskapade.visualization.links.CorrelationSummary(**kwargs)
Bases: eskapade.core.element.Link
Create a heatmap of correlations between dataframe variables.
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input dataframe to read from data store
- store_key (str) – key of correlations dataframe in data store
- results_path (str) – path to save correlation summary pdf
- methods (list) – method(s) of computing correlations
- pages_key (str) – data store key of existing report pages
-
execute
()¶ Execute the link.
-
finalize
()¶ Finalize the link.
-
initialize
()¶ Initialize the link.
-
-
class eskapade.visualization.links.DfBoxplot(**kwargs)
Bases: eskapade.core.element.Link
Create a boxplot of one column of a DataFrame that is grouped by values from a second column.
Creates a report page for each variable in DataFrame, containing:
- a profile of the column dataset
- a nicely scaled plot of the boxplots per group of the column
Example is available in: tutorials/esk304_df_boxplot.py
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input data to read from data store
- results_path (str) – output path of summary result files
- column (str) – column to pick up from input data and use as boxplot input
- cause_columns (list) – list of columns (str) to group by; a boxplot is plotted per unique value
- statistics (list) – list of strings naming the statistics to generate for the boxplot. The full list is taken from statistics.ArrayStats.get_latex_table. Defaults to: [‘count’, ‘mean’, ‘min’, ‘max’]
- pages_key (str) – data store key of existing report pages
-
execute
()¶ Execute the link.
Creates a report page for each column that we group-by in the data frame.
- create statistics object for group
- create overview table of column variable
- plot boxplot of column variable per group
- store plot
-
finalize
()¶ Finalize the link.
-
initialize
()¶ Initialize the link.
-
class eskapade.visualization.links.DfSummary(**kwargs)
Bases: eskapade.core.element.Link
Create a summary of a dataframe.
Creates a report page for each variable in data frame, containing:
- a profile of the column dataset
- a nicely scaled plot of the column dataset
Example 1 is available in: tutorials/esk301_dfsummary_plotter.py
Example 2 is available in: tutorials/esk303_histogram_filling_plotting.py
Empty histograms are automatically skipped from processing.
-
__init__
(**kwargs)¶ Initialize link instance.
Parameters: - name (str) – name of link
- read_key (str) – key of input dataframe (or histogram-dict) to read from data store
- results_path (str) – output path of summary result files
- columns (list) – columns (or histogram keys) to pick up from input data, for which summaries are made and plotted
- hist_keys (list) – alternative to columns (optional)
- var_labels (dict) – dict of column names with a label per column
- var_units (dict) – dict of column names with a unit per column
- var_bins (dict) – dict of column names with the number of bins per column. Default per column is 30.
- hist_y_label (str) – y-axis label to plot for all columns. Default is ‘Bin Counts’.
- pages_key (str) – data store key of existing report pages
-
assert_data_type
(data)¶ Check type of input data.
Parameters: data – input data sample (pandas dataframe or dict)
-
execute
()¶ Execute the link.
Creates a report page for each variable in data frame.
- create statistics object for column
- create overview table of column variable
- plot histogram of column variable
- store plot
Returns: execution status code
Return type: StatusCode
-
finalize
()¶ Finalize the link.
-
get_all_columns
(data)¶ Retrieve all columns / keys from input data.
Parameters: data – input data sample (pandas dataframe or dict)
Returns: list of columns
Return type: list
-
get_length
(data)¶ Get length of data set.
Parameters: data – input data (pandas dataframe or dict)
Returns: length of data set
-
get_sample
(data, key)¶ Retrieve specific column or item from input data.
Parameters: - data – input data (pandas dataframe or dict)
- key (str) – column key
Returns: data series or item
-
initialize
()¶ Initialize the link.
-
process_1d_histogram
(name, hist)¶ Create statistics of and plot input 1d histogram.
Parameters: - name (str) – name of the histogram
- hist – input histogram object
-
process_2d_histogram
(name, hist)¶ Create statistics of and plot input 2d histogram.
Parameters: - name (str) – name of the histogram
- hist – input histogram object
-
process_nan_histogram
(nphist, n_data)¶ Process nans histogram.
Add nans histogram to pdf list
Parameters: - nphist – numpy-style input histogram, consisting of comma-separated bin_entries, bin_edges
- n_data (int) – number of entries in the processed data set
-
process_sample
(name, sample)¶ Process various possible data samples.
Parameters: - name (str) – name of sample
- sample – input pandas series object or histogram
-
process_series
(col, sample)¶ Create statistics of and plot input pandas series.
Parameters: - col (str) – name of the series
- sample – input pandas series object