eskapade.visualization package
Submodules
eskapade.visualization.vis_utils module¶
Project: Eskapade - A python-based package for data analysis.
Created: 2017/02/28
- Description:
- Utility functions to collect Eskapade Python modules, e.g., functions to get correct Eskapade file paths and environment variables
- Authors:
- KPMG Advanced Analytics & Big Data team, Amstelveen, The Netherlands
Redistribution and use in source and binary forms, with or without modification, are permitted according to the terms listed in the file LICENSE.
eskapade.visualization.vis_utils.box_plot(df, cause_col, result_col='cost', pdf_file_name='', ylim_quant=0.95, ylim_high=None, ylim_low=0, rot=90, statlim=400, label_dict=None, title_add='', top=20)
Make box plot.
Plot the distribution of the column df[result_col] as box plots, one per group of cause_col. The DataFrame is grouped by the cause column, and the distribution within each group is drawn as a box plot using the standard pandas functionality. Groups with fewer than statlim (default 400) entries are removed automatically.
Parameters: - df – pandas DataFrame
- cause_col (str) – name of the column to group on. This can technically be a number, but that is uncommon.
- result_col (str) – name of the column whose distribution is shown in the box plots
- pdf_file_name (str) – if set, will store the plot in a pdf file
- ylim_quant (float) – quantile of result_col used as the upper y-axis limit when ylim_high is not set
- ylim_high (float) – explicit upper y-axis limit; if None (default), the limit is determined by ylim_quant
- ylim_low (float) – lower y-axis limit, passed to matplotlib set_ylim
- rot (int) – rotation angle of the x-axis labels in degrees (the pandas/matplotlib rot argument)
- statlim (int) – minimum number of entries a group must have in order to be plotted
- label_dict (dict) – dictionary with labels for the columns, usage example: label_dict={'col_x': 'Time'}
- title_add (str) – string that is added to the automatic title (the y column name)
- top (int) – only print the first top characters of the x- and y-labels (default is 20)
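The grouping, filtering and y-limit logic described above can be sketched with plain pandas. This is a minimal sketch under stated assumptions, not the Eskapade implementation: the helper names prepare_box_plot_data and draw_box_plot are hypothetical, and the sketch assumes the ylim_quant quantile is computed after the small groups have been dropped.

```python
import pandas as pd


def prepare_box_plot_data(df, cause_col, result_col="cost", statlim=400,
                          ylim_quant=0.95, ylim_high=None):
    """Hypothetical helper (not part of Eskapade) for the box_plot inputs.

    Keeps only the groups of ``cause_col`` with at least ``statlim`` rows and,
    unless ``ylim_high`` is given, takes the ``ylim_quant`` quantile of
    ``result_col`` as the upper y-axis limit.
    """
    # Drop every group that has fewer than statlim entries
    reduced = df.groupby(cause_col).filter(lambda g: len(g) >= statlim)
    # An explicit ylim_high takes precedence; otherwise use the quantile,
    # computed on the filtered data (an assumption of this sketch)
    if ylim_high is None:
        ylim_high = reduced[result_col].quantile(ylim_quant)
    return reduced, ylim_high


def draw_box_plot(df, cause_col, result_col="cost", ylim_low=0,
                  ylim_high=None, rot=90, pdf_file_name=""):
    """Hypothetical helper: one box per group, drawn with pandas/matplotlib."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, ax = plt.subplots()
    df.boxplot(column=result_col, by=cause_col, rot=rot, ax=ax)
    ax.set_ylim(ylim_low, ylim_high)
    if pdf_file_name:  # mirror the documented "if set, store in a pdf" behaviour
        fig.savefig(pdf_file_name)
    return fig
```

Usage: `reduced, hi = prepare_box_plot_data(df, "cause", statlim=400)` followed by `draw_box_plot(reduced, "cause", ylim_high=hi, pdf_file_name="box.pdf")`. Keeping the data preparation separate from drawing makes the filtering testable without a plotting backend.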
eskapade.visualization.vis_utils.delete_smallstat(df, group_col, statlim=400)
Remove low-statistics groups from a DataFrame.
Make a new DataFrame from which all groups of group_col with fewer than statlim entries are removed.
Parameters: - df – pandas DataFrame
- group_col (str) – name of the column to group on
- statlim (int) – minimum number of entries a group must have to be kept (i.e., to be considered statistically significant)
Returns: smaller DataFrame and the number of removed categories
Return type: tuple
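A minimal pandas sketch of the documented behaviour, assuming that "categories" means the distinct values of group_col. The function name delete_smallstat_sketch is hypothetical; this is not the Eskapade source.

```python
import pandas as pd


def delete_smallstat_sketch(df, group_col, statlim=400):
    """Hypothetical stand-in for delete_smallstat (not the Eskapade code).

    Returns a new DataFrame without the groups of ``group_col`` that have
    fewer than ``statlim`` entries, plus the number of groups removed.
    """
    counts = df[group_col].value_counts()          # entries per group
    keep = counts[counts >= statlim].index         # groups that are large enough
    reduced = df[df[group_col].isin(keep)].copy()  # new DataFrame, input untouched
    n_removed = int((counts < statlim).sum())      # number of dropped categories
    return reduced, n_removed
```

Returning a copy keeps the caller's DataFrame unchanged, which matches the documented "make a new DataFrame" wording.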
eskapade.visualization.vis_utils.plot_2d_histogram(hist, x_lim, y_lim, title, x_label, y_label, pdf_file_name)
Plot 2D histogram with matplotlib.
Parameters: - hist – input numpy histogram = x_bin_edges, y_bin_edges, bin_entries_2dgrid
- x_lim (tuple) – range tuple of x-axis (min,max)
- y_lim (tuple) – range tuple of y-axis (min,max)
- title (str) – title of plot
- x_label (str) – Label for histogram x-axis
- y_label (str) – Label for histogram y-axis
- pdf_file_name (str) – if set, will store the plot in a pdf file
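Note that numpy.histogram2d returns the counts first, whereas the documented hist argument lists the bin edges first. The sketch below builds the inputs from toy data, assuming hist is accepted as a 3-tuple in the documented order; the final call is left commented out because it needs Eskapade installed.

```python
import numpy as np

# Toy data, for illustration only
rng = np.random.default_rng(seed=0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# numpy.histogram2d returns (counts, x_edges, y_edges): counts come FIRST
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(10, 20))

# The documented hist argument lists the edges first and the 2D grid last
hist = (x_edges, y_edges, counts)
x_lim = (x_edges[0], x_edges[-1])
y_lim = (y_edges[0], y_edges[-1])

# With Eskapade installed, the call would then look like:
# plot_2d_histogram(hist, x_lim, y_lim, "x vs y", "x", "y", "hist2d.pdf")
```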
eskapade.visualization.vis_utils.plot_correlation_matrix(matrix_colors, x_labels, y_labels, pdf_file_name='', title='correlation', vmin=-1, vmax=1, color_map='RdYlGn', x_label='', y_label='', top=20, matrix_numbers=None, print_both_numbers=True)
Create and plot correlation matrix.
Parameters: - matrix_colors – input correlation matrix
- x_labels (list) – Labels for histogram x-axis bins
- y_labels (list) – Labels for histogram y-axis bins
- pdf_file_name (str) – if set, will store the plot in a pdf file
- title (str) – if set, title of the plot
- vmin (float) – minimum value of color legend (default is -1)
- vmax (float) – maximum value of color legend (default is +1)
- x_label (str) – Label for histogram x-axis
- y_label (str) – Label for histogram y-axis
- color_map (str) – color map passed to matplotlib pcolormesh (default is 'RdYlGn')
- top (int) – only print the first top characters of the x- and y-labels (default is 20)
- matrix_numbers – input matrix used for plotting numbers (default is matrix_colors)
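One way to prepare the inputs is sketched below, assuming Pearson correlations from pandas.DataFrame.corr; the documented matrix_colors could be any square matrix in [vmin, vmax]. The label truncation only mimics the documented top option; how Eskapade truncates internally is an assumption, and the toy column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data, for illustration only: b follows a closely, c is independent noise
rng = np.random.default_rng(seed=1)
a = rng.normal(size=500)
df = pd.DataFrame({
    "a": a,
    "b_strongly_correlated_with_a": a + 0.1 * rng.normal(size=500),
    "c_independent_noise": rng.normal(size=500),
})

corr = df.corr()                 # Pearson correlation, values in [-1, 1]
matrix_colors = corr.to_numpy()  # square array used for the cell colors

# Mimic the documented "top" option: keep only the first top characters
top = 20
labels = [str(name)[:top] for name in corr.columns]

# With Eskapade installed, the call would then look like:
# plot_correlation_matrix(matrix_colors, labels, labels, pdf_file_name="corr.pdf")
```

Because correlations lie in [-1, 1], the documented defaults vmin=-1 and vmax=1 map the full range onto the diverging 'RdYlGn' color map.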
eskapade.visualization.vis_utils.plot_histogram(hist, x_label, y_label=None, is_num=True, is_ts=False, pdf_file_name='', top=20)
Create and plot histogram of column values.
Parameters: - hist – input numpy histogram = values, bin_edges
- x_label (str) – Label for histogram x-axis
- y_label (str) – Label for histogram y-axis
- is_num (bool) – True if observable to plot is numeric
- is_ts (bool) – True if observable to plot is a timestamp
- pdf_file_name (str) – if set, will store the plot in a pdf file
- top (int) – only print the first top characters of the x- and y-labels (default is 20)
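For a numeric observable (is_num=True), numpy.histogram already returns the values and bin edges in the documented order. A sketch with toy data, assuming hist may be passed as the tuple returned by numpy; the expected input format for non-numeric or timestamp observables is not specified here and is not covered.

```python
import numpy as np

# Toy numeric data, for illustration only
rng = np.random.default_rng(seed=2)
ages = rng.integers(18, 80, size=1000)

# numpy.histogram returns (values, bin_edges), the documented order
hist = np.histogram(ages, bins=10)
values, bin_edges = hist

# With Eskapade installed, the call would then look like:
# plot_histogram(hist, x_label="age", y_label="count", pdf_file_name="age.pdf")
```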