%load_ext pretty_jupyter

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()

blue_color = sns.color_palette()[0]

Motivation

The goal of this file is to demonstrate the capabilities of the Pretty Jupyter package.

Input Data

In this section, we inspect the input data.

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"]
})
data.head()
   money  weight  gender
0  30000      80    Male
1  40000      50  Female
2  70000      80    Male
3  65000      70    Male
4  25000      54  Female

The input dataset has 5 rows and 3 columns.
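The dimensions can also be read programmatically instead of being counted by hand. The sketch below recreates the toy dataset so that it runs on its own, and uses `DataFrame.shape`, which is a `(rows, columns)` tuple. The names `n_rows` and `n_cols` are illustrative only.

```python
import pandas as pd

# Same toy dataset as above, recreated so this snippet is self-contained
data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# DataFrame.shape is a (number of rows, number of columns) tuple
n_rows, n_cols = data.shape
print(f"The input dataset has {n_rows} rows and {n_cols} columns.")
```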

The columns and their dtypes are the following:

data.dtypes.reset_index().rename(columns={"index": "col_name", 0: "dtype"})
  col_name   dtype
0    money   int64
1   weight   int64
2   gender  object
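The `rename(columns={"index": ..., 0: ...})` call above relies on the default labels that `reset_index()` assigns to an unnamed Series. A minimal alternative, assuming the same dataset, is to name the index and the values explicitly, which produces the same two-column table without depending on those defaults. `dtype_table` is a hypothetical name used only here.

```python
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Name the index ("col_name") and the values ("dtype") explicitly
# instead of renaming the default "index" / 0 columns afterwards
dtype_table = (
    data.dtypes
    .rename_axis("col_name")
    .reset_index(name="dtype")
)
print(dtype_table)
```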

Money

fig, ax = plt.subplots()
# Bin the money values; pd.cut uses right-closed intervals, e.g. (20000, 30000]
money_bins = pd.cut(
    data["money"],
    [0, 20000, 30000, 50000, 100000, 1000000000000],
    labels=["0-20000", "20000-30000", "30000-50000", "50000-100000", ">100000"],
)
# sort=False keeps the bars in bin order instead of sorting them by count
bins = money_bins.value_counts(sort=False)
sns.barplot(x=bins.index, y=bins.values, color=blue_color, ax=ax)
ax.set(title="Histogram of money", xlabel="Money category", ylabel="Count")
ax.tick_params(axis="x", labelrotation=45)
plt.show()
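The numbers behind the money histogram can be inspected as a table before plotting. This sketch reuses the bin edges and labels from the cell above. It relies on `value_counts(sort=False)` on a categorical Series returning the counts in category order, empty bins included, which is what a histogram needs. The names `edges`, `labels` and `bin_counts` are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Bin edges and labels taken from the histogram cell above
edges = [0, 20000, 30000, 50000, 100000, 1000000000000]
labels = ["0-20000", "20000-30000", "30000-50000", "50000-100000", ">100000"]

# Right-closed intervals by default: 30000 lands in "20000-30000"
money_bins = pd.cut(data["money"], edges, labels=labels)

# Category order is preserved and bins with zero observations are kept
bin_counts = money_bins.value_counts(sort=False)
print(bin_counts)
```

With the default `sort=True`, the bins would instead be ordered by count, which scrambles the x axis of a histogram.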

Gender

fig, ax = plt.subplots()
# countplot counts the rows per category itself, so no value_counts()/reset_index() is needed
sns.countplot(data=data, x="gender", color=blue_color, ax=ax)
ax.set(title="Gender countplot", xlabel="Gender", ylabel="Count")
plt.show()
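If the gender counts are needed as a table, note that the columns produced by `value_counts().reset_index()` changed in pandas 2.0: earlier versions return the columns `index` and `gender`, while 2.0 and later return `gender` and `count`. A minimal version-independent sketch names both columns explicitly. `gender_counts` is an illustrative name.

```python
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Naming the index and the values explicitly gives the same
# ("gender", "count") columns on pandas 1.x and 2.x
gender_counts = (
    data["gender"]
    .value_counts()
    .rename_axis("gender")
    .reset_index(name="count")
)
print(gender_counts)
```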

Weight

Weight is a continuous variable, so its distribution is shown as a kernel density estimate (KDE) rather than as a bar chart.

ax = sns.kdeplot(data["weight"])
ax.set(title="Weight KDE", xlabel="Weight")
plt.show()
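To make the KDE less of a black box, here is a hand-rolled Gaussian KDE in NumPy. It is a sketch, not seaborn's implementation. It assumes Scott's rule for the bandwidth, h = s · n^(-1/5), with s being the sample standard deviation (ddof=1). As far as we know, this matches the default of `scipy.stats.gaussian_kde`, which `sns.kdeplot` uses unless `bw_method` or `bw_adjust` are changed. The helper `gaussian_kde_scott` is hypothetical.

```python
import numpy as np

weights = np.array([80, 50, 80, 70, 54], dtype=float)


def gaussian_kde_scott(sample, grid):
    """Evaluate a Gaussian kernel density estimate of `sample` on `grid`.

    Bandwidth (assumed Scott's rule): h = s * n ** (-1/5), s = sample std (ddof=1).
    """
    n = sample.size
    h = sample.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance of every grid point to every observation
    z = (grid[:, None] - sample[None, :]) / h
    # Standard normal kernel K(z)
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # f(x) = 1 / (n * h) * sum_i K((x - x_i) / h)
    return kernel.sum(axis=1) / (n * h)


grid = np.linspace(0, 130, 2001)
density = gaussian_kde_scott(weights, grid)

# A density should integrate to about 1 (simple Riemann sum)
step = grid[1] - grid[0]
area = density.sum() * step
print(f"Approximate area under the curve: {area:.3f}")
```

With only five observations, the shape of the curve depends heavily on the bandwidth, so the plot above should be read as a rough sketch of the distribution.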

Correlations

The correlation between money and weight is only moderate (Pearson r ≈ 0.46), and with just five observations it is not a reliable estimate. Gender is excluded from the matrix because it is a categorical variable.
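The value quoted above can be checked by computing Pearson's r by hand and comparing it with pandas. The sketch restricts the frame to numeric columns first. In pandas 2.0 and later, `DataFrame.corr()` raises on the string `gender` column unless non-numeric columns are excluded. The names `numeric`, `corr_matrix` and `r` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Keep only numeric columns so corr() works on all pandas versions
numeric = data.select_dtypes("number")
corr_matrix = numeric.corr()  # Pearson by default

# Pearson's r by hand:
# r = sum(dx * dy) / sqrt(sum(dx**2) * sum(dy**2)), with dx, dy the deviations from the means
x = numeric["money"].to_numpy(dtype=float)
y = numeric["weight"].to_numpy(dtype=float)
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

print(corr_matrix.round(3))
print(f"Pearson r (money, weight) = {r:.3f}")
```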

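# Exclude the categorical "gender" column; DataFrame.corr() raises on strings in pandas >= 2.0
sns.heatmap(data.select_dtypes("number").corr().abs(), cmap="Blues", vmin=0, vmax=1, annot=True)
plt.show()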

Conclusion

Pretty Jupyter is awesome and I'm definitely installing it ;).