Today I collect and organize useful data visualization (Data Viz) tools that aid data exploration.
I illustrate the use of the tools via the classic Abalone database, hosted on the University of California, Irvine (UCI) Machine Learning repository website.
I recommend you bookmark this and return to it when you need to find the syntax and semantics of popular data viz constructs.
Get the Data
PhD student David Aha created the University of California, Irvine (UCI) Machine Learning repository in 1987 in the form of a File Transfer Protocol (FTP) site. The Repo collects databases, domain theories, and data generators. Today I use the Abalone database.
The Abalone database provides a table of 4,177 observations, each of which contains one categorical feature, seven continuous features, and one target:
- Features, Categorical
  - Sex: Male, Female, and Infant
- Features, Continuous
  - Length: Longest shell measurement (mm)
  - Diameter: Perpendicular to length (mm)
  - Height: With meat in the shell (mm)
  - Whole_weight: Whole abalone (grams)
  - Shucked_weight: Weight of meat (grams)
  - Viscera_weight: Gut weight after bleeding (grams)
  - Shell_weight: After being dried (grams)
- Target, Integer
  - Rings: +1.5 gives the age in years
 
I use the Python requests library to pull the data straight from the UCI repo and stuff it into a Pandas DataFrame.
I import the required libraries.
import pandas as pd
import numpy as np
import io
import requests
import seaborn as sns
I set the url (String) and column_names (List) variables to match the Abalone database schema.
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
column_names = ['Sex',
                'Length',
                'Diameter',
                'Height',
                'Whole_weight',
                'Shucked_weight',
                'Viscera_weight',
                'Shell_weight',
                'Rings']
Requests downloads the raw bytes over HTTP, decode('utf-8') turns them into text, StringIO wraps that text in a file-like object, and Pandas loads it into a DataFrame.
r = requests.get(url).content
abalone_df = pd.read_csv(io.StringIO(r.decode('utf-8')),
                      names = column_names)
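To see what the names argument does without hitting the network, here is a minimal sketch that feeds read_csv three hand-typed rows in the same nine-column layout. The sample values and the sample_csv, sample_columns and sample_df names are illustrative assumptions of mine, not part of the UCI documentation.

```python
import io
import pandas as pd

# Three illustrative rows in the headerless, comma-separated layout of abalone.data
# (hypothetical sample values, typed by hand for this sketch).
sample_csv = (
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9\n"
    "I,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7\n"
)
sample_columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole_weight',
                  'Shucked_weight', 'Viscera_weight', 'Shell_weight', 'Rings']

# The text has no header row, so names= supplies the column labels.
sample_df = pd.read_csv(io.StringIO(sample_csv), names=sample_columns)

print(sample_df.shape)
print(sample_df.dtypes)
```

A quick look at shape and dtypes right after loading confirms that Sex came in as text and the other eight columns as numbers.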
One-Dimensional Statistical Summaries
We first explore the data in one dimension.
Histograms
Histograms provide a visual shorthand for the distribution of numerical data. Think of a Connect Four board, where you stack chips in different columns (or buckets). Each chip represents one observation that falls into that bucket.
Pandas provides a built-in hist() method.
abalone_df['Rings'].hist()
We use Pandas to draw a Histogram of our target variable, Rings.

Most Abalone include between 7.5 and 12.5 Rings.
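The chip-stacking idea can be sketched with NumPy alone. The ring counts below are made-up values for illustration, and np.histogram stands in for whatever the plotting library does internally, so treat this as a sketch of the bucketing step rather than a description of the Pandas implementation.

```python
import numpy as np

# Hypothetical ring counts for ten abalone (illustrative values only).
rings = np.array([7, 8, 9, 9, 10, 10, 10, 11, 12, 15])

# Four equal-width buckets between the minimum (7) and the maximum (15).
counts, edges = np.histogram(rings, bins=4)

# Print one row of "chips" per bucket; the last bucket also includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {'#' * int(count)}")
```

Each bucket's count is the height of the matching bar in the plot.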
Pandas also accommodates our Categorical feature.
abalone_df['Sex'].hist()

The corpus of data includes roughly equal observations for Male, Female and Infant.
Pandas allows us to run histograms on all features. The method ignores the Categorical feature.
abalone_df.plot.hist(subplots=True,layout=(4,2))

The results illustrate the need to Normalize the data: the Continuous features mostly clock in under a value of one (1), while the target feature ranges up to thirty (30).
Hist with tags
I borrow loosely from InfluxDB's nomenclature and call Categorical variables Tags and Continuous variables Measurements.
Tags provide a new dimension of visual data, slicing and dicing the data into different categories.
Seaborn provides the option to color by Tag with its hue parameter.
sns.histplot(data=abalone_df, x='Rings',hue='Sex')

Hue does not make sense with Measurements:
sns.histplot(data=abalone_df, x='Rings',hue='Rings')

Kernel Density Estimation (KDE)
Kernel Density Estimation (KDE) smooths the Histograms. Instead of discrete buckets, we see continuous lines that represent the distribution.
I used the analogy above of a Histogram stacking chips on a Connect Four board. KDE instead pours a small pile of sand at each point, shaped like a Normal (Gaussian) curve. KDE in a sense stacks one such curve at each point and adds them up, which leads to the smoothness of the plot.
If you reduce the bucket size to a very small number, you can see the idea in action.
abalone_df['Whole_weight'].hist()

abalone_df['Whole_weight'].hist(bins=25)

abalone_df['Whole_weight'].hist(bins=50)

abalone_df['Whole_weight'].plot.kde()
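Here is a minimal NumPy sketch of the sand-pile idea: one Gaussian bump per observation, averaged into a single curve. The function name gaussian_kde_1d, the sample weights, and the fixed bandwidth of 0.25 are assumptions for illustration; Pandas and Seaborn pick their own bandwidth.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at every grid point."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Distance from every grid point to every sample, in units of the bandwidth.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

# Hypothetical whole weights (illustrative values only).
weights = np.array([0.2, 0.5, 0.55, 0.9, 1.1, 1.15, 1.8])
grid = np.linspace(-2.0, 5.0, 701)
density = gaussian_kde_1d(weights, grid, bandwidth=0.25)

# A density should integrate to roughly one; check with a simple Riemann sum.
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve and a larger one gives a smoother curve, much like the choice of bucket size in a histogram.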

SNS will plot the KDE over the histogram if you instruct it to do so:
sns.histplot(data=abalone_df, x="Whole_weight", kde=True)

Pandas plots all features' distribution with KDE.
abalone_df.plot.kde(subplots=True,layout=(4,2))

Boxplots
A glance at a Boxplot tells you the median, 25th percentile, 75th percentile, and outliers.
The box spans the First and Third quartiles, and the whiskers extend to the most extreme data points that lie within 1.5 times the Interquartile range (IQR) of the box (for both top and bottom). Points beyond the whiskers plot as outliers.
sns.boxplot(data=abalone_df, x='Whole_weight')
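The numbers behind a Boxplot can be sketched with NumPy. The weights below are made-up values, and the 1.5 multiplier is the usual default, so read this as an illustration of the rule rather than a copy of the Seaborn internals.

```python
import numpy as np

# Hypothetical whole weights with one heavy specimen (illustrative values only).
weights = np.array([0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 2.8])

# The box: first quartile, median, third quartile.
q1, median, q3 = np.percentile(weights, [25, 50, 75])
iqr = q3 - q1

# The fences sit 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme points still inside the fences.
inside = weights[(weights >= lower_fence) & (weights <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an outlier.
outliers = weights[(weights < lower_fence) | (weights > upper_fence)]

print(q1, median, q3)
print(whisker_low, whisker_high, outliers)
```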

SNS allows you to separate the chart by Tag. If you set y equal to Sex, for example, you see the distributions split by Male, Female, and Infant.
sns.boxplot(data=abalone_df, x='Whole_weight',y='Sex')

In the Boxplot above, we see that Female Abalone weigh slightly more than Male Abalone.
Special Note: Enrich Data.
Remember that we have a target variable named Rings, which encompasses a range of numbers between one (1) and thirty (30). I recommend you enrich the Rings data with a new Tag.
The following code uses the Rings value to set a new Tag, which I named Age. The code splits the data into three quantile-based ranges (so each range holds roughly a third of the observations) and applies to a given observation the tag Young, Middle_Age or Old based on the value of Rings.
abalone_df['Age'] = pd.qcut(abalone_df['Rings'],q=3,labels=['Young','Middle_Age','Old'])
abalone_df.head()
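A small sketch shows why I reach for qcut rather than cut here. The toy ring counts are made up for illustration: qcut balances the number of observations per bucket, while cut splits the value range into equal widths and can leave a bucket nearly empty.

```python
import pandas as pd

# Hypothetical ring counts with one long-lived specimen (illustrative values only).
toy_rings = pd.Series([4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 20])
labels = ['Young', 'Middle_Age', 'Old']

# Quantile-based buckets: roughly the same number of rows in each.
by_quantile = pd.qcut(toy_rings, q=3, labels=labels)

# Equal-width buckets: the same span of ring values in each.
by_width = pd.cut(toy_rings, bins=3, labels=labels)

print(by_quantile.value_counts(sort=False))
print(by_width.value_counts(sort=False))
```

With balanced buckets, every hue level in the later plots has enough points to draw a meaningful box or violin.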
This new tag provides a new dimension to slice and dice our Boxplot.
sns.boxplot(data=abalone_df, x='Whole_weight',y='Sex',hue='Age')

We now see the relationship between Whole_weight, Sex and Age at a glance.
Violinplots
A Violinplot mirrors the Distribution, which gives the plot a Violin-like shape.
sns.violinplot(x=abalone_df['Rings'])

Violinplots also accommodate Tags.
sns.violinplot(data=abalone_df,x='Sex',y='Whole_weight',hue='Age')

Two-dimensional Plots
Python provides tools to explore Bivariate data sets.
Seaborn (SNS) provides two-dimensional Histograms and two-dimensional KDE tools.
Two-dimensional Histogram
Note that SNS only shows the top-down view for histograms.
sns.displot(abalone_df, x="Length", y="Height")
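The top-down view is a grid of counts, one cell per pair of buckets. The sketch below uses np.histogram2d on six made-up Length and Height pairs to show the grid that the colors encode; it illustrates the idea and does not claim to mirror the Seaborn internals.

```python
import numpy as np

# Hypothetical measurements for six abalone (illustrative values only).
length = np.array([0.30, 0.35, 0.40, 0.50, 0.55, 0.60])
height = np.array([0.08, 0.09, 0.10, 0.14, 0.15, 0.16])

# Two buckets per axis gives a 2 x 2 grid of counts.
counts, length_edges, height_edges = np.histogram2d(length, height, bins=2)

# Rows follow Length buckets, columns follow Height buckets.
print(counts)
```

Counts that pile up along the diagonal of the grid, as they do here, hint that the two features rise together.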

The SNS Bivariate Histograms accommodate tags.
sns.displot(abalone_df, x="Length", y="Height", hue="Age")

Two-dimensional KDE
SNS also provides two-dimensional KDE plots, with Tags.
sns.displot(abalone_df, x="Length", y="Height", hue="Age", kind="kde")

Look for Correlation
The Data Scientist looks for correlation between features and the target during the Data Exploration phase of the Machine Learning Pipeline.
Data prep
In the Data Prep stage, we encode the Tags (String) into numeric values (float32).
The Pandas method get_dummies one-hot-encodes the Sex variable into Orthogonal dimensions. This increases the dimensionality of our data set.
We also use the factorize method to convert Young, Middle_Age and Old into the integers 0, 1 and 2.
abalone_reg_df = abalone_df.join(pd.get_dummies(abalone_df['Sex']))
abalone_reg_df['Age_Bucket'] = pd.factorize(abalone_df['Age'],sort=True)[0]
abalone_reg_df = abalone_reg_df.drop(['Sex','Age'],axis=1).astype(np.float32)
We pop off the labels for later use.
class_labels stores the target vector for Classification models, and reg_labels stores the target vector for Regression models.
class_labels = abalone_reg_df.pop('Age_Bucket')
reg_labels = abalone_reg_df.pop('Rings')
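The two encodings are easier to follow on a four-row toy frame. The toy_df name and its values are assumptions for illustration; the Age column is built as an ordered Categorical to mimic what qcut returns.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (illustrative values only).
toy_df = pd.DataFrame({
    'Sex': ['M', 'F', 'I', 'F'],
    'Age': pd.Categorical(['Old', 'Young', 'Middle_Age', 'Young'],
                          categories=['Young', 'Middle_Age', 'Old'],
                          ordered=True),
})

# One-hot encoding: one column per Sex value, exactly one 1.0 in every row.
one_hot = pd.get_dummies(toy_df['Sex']).astype(np.float32)

# Integer encoding: sort=True numbers the codes in category order
# (Young=0, Middle_Age=1, Old=2) instead of order of appearance.
age_codes = pd.factorize(toy_df['Age'], sort=True)[0]

print(one_hot)
print(age_codes)
```

One-hot encoding suits Sex because Male, Female and Infant have no natural order, while the integer codes suit Age because Young, Middle_Age and Old do.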
I also create vectors to pull like Features from the DataFrame (Measurements, Tags, Target).
metric_vars = ['Length',
               'Diameter',
               'Height',
               'Whole_weight',
               'Shucked_weight',
               'Viscera_weight',
               'Shell_weight']
encoded_vars = ['F',
                'I',
                'M']
y_vars = ['Rings']
Heatmap correlation
SNS provides a Heatmap matrix for correlation.
import matplotlib.pyplot as plt
corr = abalone_reg_df.corr()
# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr,
                            dtype=bool))
# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))
# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 
                             20,
                             as_cmap=True)
# Draw the heatmap with the mask and 
# correct aspect ratio
sns.heatmap(corr, 
            mask=mask,
            cmap=cmap,
            vmax=1,
            center=0,
            square=True,
            linewidths=.5,
            cbar_kws={"shrink": .5})

We see that Diameter and Length correlate strongly, as do all of the weight features.
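The heatmap colors come from a plain correlation matrix, and the mask simply hides the redundant half. Here is a minimal sketch on a made-up three-column frame (the values and the toy_df and lower_only names are illustrative assumptions).

```python
import numpy as np
import pandas as pd

# Hypothetical measurements (illustrative values only).
toy_df = pd.DataFrame({
    'Length':   [0.30, 0.40, 0.50, 0.60, 0.70],
    'Diameter': [0.22, 0.31, 0.40, 0.47, 0.55],
    'Height':   [0.10, 0.08, 0.15, 0.12, 0.20],
})

# Pearson correlation between every pair of columns.
corr = toy_df.corr()

# True marks the cells to hide: the diagonal and everything above it.
mask = np.triu(np.ones_like(corr, dtype=bool))

# What remains visible once the mask is applied.
lower_only = corr.mask(mask)

print(corr.round(3))
print(lower_only.round(3))
```

The matrix is symmetric and its diagonal is always one, so the lower triangle carries all of the information.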
Pairgrid Correlation
This SNS Pairgrid plot shows the correlation between the features and the target, Rings.
g = sns.PairGrid(abalone_df,
                 x_vars = metric_vars,
                 y_vars = y_vars)
g.map_offdiag(sns.kdeplot)
g.add_legend()

Each feature's density contours tilt upward at roughly 25 degrees, which indicates a positive Correlation with Rings.
Scatterplot with Regression
SNS plots the ML 101 favorite, Linear Regression, right on the screen with the regplot function.
sns.regplot(x = abalone_df['Viscera_weight'],
                y = abalone_df['Rings'])

We see a positive slope with pretty tight error bands, which suggests Viscera_weight helps predict Rings.
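The line in the plot is an ordinary least-squares fit. This sketch fits the same kind of line with np.polyfit on six made-up points (the values are illustrative assumptions, and polyfit is my stand-in for whatever regplot does internally).

```python
import numpy as np

# Hypothetical viscera weights and ring counts (illustrative values only).
viscera = np.array([0.05, 0.10, 0.15, 0.20, 0.25, 0.30])
rings = np.array([6.0, 8.0, 9.0, 10.0, 12.0, 12.0])

# Degree-one polynomial fit: rings is approximated by slope * viscera + intercept.
slope, intercept = np.polyfit(viscera, rings, deg=1)

# Residuals measure how far each point sits from the fitted line.
predicted = slope * viscera + intercept
residuals = rings - predicted

# The correlation coefficient summarizes how tightly the points hug the line.
r = np.corrcoef(viscera, rings)[0, 1]

print(slope, intercept, r)
```

A positive slope with small residuals is the numeric counterpart of the tight band in the plot.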
Fancy Tilted 3d Plots
Remember that SNS only graphs top-down views. I wrote the following matplotlib function to show an isometric view of the data.
import matplotlib.pyplot as plt
def plot_3d(df, target, feature1,
            feature2, feature3):
    # Sort the distinct class values so each class keeps the same color between runs
    target_list = sorted(set(df[target]))
    colors = ['red', 'green', 'blue']
    fig = plt.figure(figsize = (12, 12))
    ax1 = fig.add_subplot(111,
                          projection='3d')
    # Draw one scatter layer per class (up to three classes, one color each)
    for value, color in zip(target_list, colors):
        subset = df.loc[df[target] == value]
        ax1.scatter(subset[feature1],
                    subset[feature2],
                    subset[feature3],
                    label = value,
                    color = color)
    # Label the axes so the reader can tell the features apart
    ax1.set_xlabel(feature1)
    ax1.set_ylabel(feature2)
    ax1.set_zlabel(feature3)
    ax1.legend()
I call the function with the Abalone data.
plot_3d(abalone_df,
        'Age',
        'Height',
        'Viscera_weight',
        'Length')

Dimensionality Reduction
Note my Graph above requires me to choose three (out of the possible eight) features at a time. This fact drives two questions:
- Which features do I use?
- How can I plot all the features at once?
Principal Component Analysis (PCA) collapses the information held in eight features into three, two or even one feature.
I write about PCA in my blog post on Regression with Keras and TensorFlow.
If you stick a magnet at each point in the data space, and then stick a telescoping iron bar at the origin, the magnets will pull the bar into position and stretch the bar. The bar will wiggle a bit at first and then eventually settle into a static position. The final direction and length of the bar represent a principal component. We can map the higher dimensionality space to the principal component by connecting a string directly from each magnet to the bar. Where the string hits (taut) we make a mark. The marks represent the mapped vector space.
George Dallas also writes an excellent blog post that explains PCA.
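For readers who prefer code to magnets, here is a minimal NumPy sketch of the same idea. It builds made-up points scattered around the line y = x, finds the principal components with a singular value decomposition, and projects every point onto the first one. The data and variable names are assumptions for illustration; the sections below use the SciKitLearn implementation instead.

```python
import numpy as np

# Hypothetical 2-D data: points scattered tightly around the line y = x.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
noise = rng.normal(scale=0.1, size=200)
points = np.column_stack([t + noise, t - noise])

# PCA works on centered data.
centered = points - points.mean(axis=0)

# The rows of `components` are the principal directions (the iron bars).
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)

# Share of the total variance captured by each component.
explained_variance_ratio = singular_values ** 2 / np.sum(singular_values ** 2)

# The marks on the bar: each point's coordinate along the first component.
projected = centered @ components[0]

print(components[0], explained_variance_ratio)
```

Because the points hug the line y = x, the first component points along that diagonal and captures almost all of the variance.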
Normalize
First Normalize the Data. TensorFlow provides a normalizer.
from tensorflow.keras.layers import Normalization
normalizer = Normalization()
Fit the normalizer to our measurements (exclude the encoded tags).
normalizer.adapt(np.array(abalone_reg_df[metric_vars]))
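Conceptually, the normalizer learns each feature's mean and variance and then rescales every column to zero mean and unit variance. The NumPy sketch below shows that arithmetic on made-up values; it leaves out details of the TensorFlow layer (such as its guard against dividing by zero), so treat it as an approximation of the behavior.

```python
import numpy as np

# Hypothetical Length and Whole_weight columns (illustrative values only).
features = np.array([[0.455, 0.5140],
                     [0.350, 0.2255],
                     [0.530, 0.6770],
                     [0.440, 0.5160]])

# Per-column statistics, learned once from the data.
mean = features.mean(axis=0)
std = features.std(axis=0)  # population standard deviation

# Standardize: subtract the mean, divide by the standard deviation.
normalized = (features - mean) / std

print(normalized.mean(axis=0), normalized.std(axis=0))
```

After this step no single feature dominates the Principal Components just because of the scale it was recorded in.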
One Principal Component
SciKitLearn provides PCA.
from sklearn.decomposition import PCA
The following code collapses all seven features into one Principal Component.
pca = PCA(n_components=1)
pca.fit(normalizer(abalone_reg_df[metric_vars]))
pca_abalone_df = pd.DataFrame(pca.transform(normalizer(abalone_reg_df[metric_vars])),
                                     columns = ['princomp1'],
                                     index=abalone_reg_df.index)
SNS shows the utility of this Principal Component on the separability of the Classes.
sns.histplot( x = pca_abalone_df['princomp1'],
              hue = class_labels)

Two Principal Components
Now derive two principal components.
pca = PCA(n_components=2)
pca.fit(normalizer(abalone_reg_df[metric_vars]))
pca_train_features_df = pd.DataFrame(pca.transform(normalizer(abalone_reg_df[metric_vars])),
                                     columns = ['princomp1',
                                                'princomp2'],
                                     index=abalone_reg_df.index)
A KDE plot shows the three classes in relation to the two Principal Components.
sns.kdeplot( data = pca_train_features_df,
             x = pca_train_features_df['princomp1'],
             y = pca_train_features_df['princomp2'],
             hue = class_labels,
             fill = False)

Three Principal Components
Astute readers anticipate the slight code modifications required to derive three Principal Components.
pca = PCA(n_components=3)
pca.fit(normalizer(abalone_reg_df[metric_vars]))
pca_train_features_df = pd.DataFrame(pca.transform(normalizer(abalone_reg_df[metric_vars])),
                                     columns = ['princomp1',
                                                'princomp2',
                                                'princomp3'],
                                     index=abalone_reg_df.index)
We use the 3d plot to see the separation of classes in relation to three Principal Components.
data_df = pca_train_features_df.assign(outcome=class_labels)
plot_3d(data_df,
        'outcome',
        'princomp1',
        'princomp2',
        'princomp3')

If you include one-hot encoded variables in your PCA, you may see weird results.
For example, we encoded the Categorical Sex feature into three Orthogonal numeric vectors, one for M, F and I. If you keep these vectors in the PCA you will see the following:

Conclusion
Bookmark this page for future reference. It provides a handy Cheat Sheet for useful Python Data Exploration and Data Viz tools.