%load_ext pretty_jupyter
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme()
blue_color = sns.color_palette()[0]
The goal of this file is to demonstrate the capabilities of the Pretty Jupyter package.
In this section, we inspect the input data.
data = pd.DataFrame({
"money": [30000, 40000, 70000, 65000, 25000],
"weight": [80, 50, 80, 70, 54],
"gender": ["Male", "Female", "Male", "Male", "Female"]
})
data.head()
|   | money | weight | gender |
|---|---|---|---|
| 0 | 30000 | 80 | Male |
| 1 | 40000 | 50 | Female |
| 2 | 70000 | 80 | Male |
| 3 | 65000 | 70 | Male |
| 4 | 25000 | 54 | Female |
The input dataset has 5 rows and 3 columns.
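The row and column counts can be read from `DataFrame.shape`, which is a `(rows, columns)` tuple. A minimal sketch that rebuilds the same five-row frame and formats the sentence from it:

```python
import pandas as pd

# Same toy data as in the notebook
data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = data.shape
print(f"The input dataset has {n_rows} rows and {n_cols} columns.")
```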
The columns and their dtypes are the following:
data.dtypes.reset_index().rename(columns={"index": "col_name", 0: "dtype"})
|   | col_name | dtype |
|---|---|---|
| 0 | money | int64 |
| 1 | weight | int64 |
| 2 | gender | object |
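The table above relies on `reset_index()` naming the values column `0`, which is why `rename(columns={..., 0: "dtype"})` is needed. A sketch of an alternative that names the Series and its index up front, so the columns come out as `col_name` and `dtype` directly:

```python
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Name the values ("dtype") and the index ("col_name") before resetting,
# so no positional column name has to be renamed afterwards
dtype_table = (
    data.dtypes
    .rename("dtype")
    .rename_axis("col_name")
    .reset_index()
)
print(dtype_table)
```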
fig, ax = plt.subplots()
# Bin money into ranges; sort=False keeps the bins in their natural order
bins = pd.cut(data["money"],
              [0, 20000, 30000, 50000, 100000, float("inf")],
              labels=["0-20000", "20000-30000", "30000-50000", "50000-100000", ">100000"]).value_counts(sort=False)
sns.barplot(x=bins.index, y=bins.values, color=blue_color, ax=ax)
ax.set(title="Histogram of money", xlabel="Money category", ylabel="Count")
ax.tick_params(axis="x", rotation=45)
plt.show()
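The binning relies on `pd.cut`, which builds right-closed intervals by default, so a value of exactly 30000 falls into `(20000, 30000]`, i.e. the "20000-30000" bin. This sketch isolates that step; `value_counts(sort=False)` on the resulting categorical keeps the bins in category order and still lists empty bins with a count of 0:

```python
import pandas as pd

money = pd.Series([30000, 40000, 70000, 65000, 25000], name="money")

# Right-closed intervals: (0, 20000], (20000, 30000], (30000, 50000], ...
edges = [0, 20000, 30000, 50000, 100000, float("inf")]
labels = ["0-20000", "20000-30000", "30000-50000", "50000-100000", ">100000"]
binned = pd.cut(money, bins=edges, labels=labels)

# sort=False keeps category order; unused categories appear with count 0
counts = binned.value_counts(sort=False)
print(counts)
```

Using `float("inf")` as the last edge avoids picking an arbitrary large upper bound for the ">100000" bin.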
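fig, ax = plt.subplots()
# Name the columns explicitly; reset_index() names differ between pandas 1.x and 2.x
gender_counts = data["gender"].value_counts().rename_axis("gender").reset_index(name="count")
sns.barplot(data=gender_counts, x="gender", y="count", color=blue_color, ax=ax)
ax.set(title="Gender countplot", xlabel="Gender", ylabel="Count")
plt.show()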
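In pandas 2.0, `Series.value_counts()` started naming its result `count` and keeping the original column name on the index, so `value_counts().reset_index()` yields columns `gender`/`count` instead of the `index`/`gender` pair that pandas 1.x produced. Naming both explicitly, as in this sketch, gives the same frame on either version:

```python
import pandas as pd

gender = pd.Series(["Male", "Female", "Male", "Male", "Female"], name="gender")

# rename_axis names the index; reset_index(name=...) names the counts column
gender_counts = (
    gender.value_counts()
    .rename_axis("gender")
    .reset_index(name="count")
)
print(gender_counts)
```

`sns.countplot(data=data, x="gender")` is another option that skips the manual counting altogether.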
Weight is a continuous variable, so its distribution is shown as a kernel density estimate (KDE) rather than as bar counts.
ax = sns.kdeplot(data["weight"])
ax.set(title="Weight KDE", xlabel="Weight")
plt.show()
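A KDE places a small kernel on every observation and averages them into a smooth curve. The sketch below uses a Gaussian kernel with Scott's rule bandwidth, h = s · n^(-1/5) for one dimension (s is the sample standard deviation), which corresponds to seaborn's default `bw_method="scott"`. The helper `gaussian_kde_1d` is hypothetical and only illustrates the idea; it is not seaborn's implementation:

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian KDE of `samples` at each point of `grid`."""
    x = np.asarray(samples, dtype=float)
    g = np.asarray(grid, dtype=float)
    n = x.size
    # Scott's rule in one dimension: h = sample std * n ** (-1/5)
    h = x.std(ddof=1) * n ** (-1 / 5)
    # Standardised distance from every grid point to every sample
    z = (g[:, None] - x[None, :]) / h
    # Average of n normal densities, each centred on a sample with std h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))


weights = [80, 50, 80, 70, 54]
grid = np.linspace(-50, 200, 5001)
density = gaussian_kde_1d(weights, grid)

# A density should integrate to roughly 1 over a wide enough grid
area = density.sum() * (grid[1] - grid[0])
print(round(area, 4))
```

A larger bandwidth gives a smoother curve; in seaborn the bandwidth can be scaled with the `bw_adjust` parameter of `kdeplot`.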
The correlation between money and weight appears weak to moderate. Gender is left out because it is a categorical variable, and a correlation coefficient is only defined for numeric columns.
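# Correlate only the numeric columns; in pandas >= 2.0, corr() raises on the string column "gender"
sns.heatmap(data.select_dtypes("number").corr().abs(), cmap="Blues", vmin=0, vmax=1, annot=True)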
plt.show()
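The heatmap shows the absolute Pearson correlation, r = cov(x, y) / (σx · σy). This sketch computes it by hand from the centred money and weight values and checks it against `DataFrame.corr()` on the numeric columns. For these five rows r comes out at roughly 0.46, and with so few observations it is a very noisy estimate:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "money": [30000, 40000, 70000, 65000, 25000],
    "weight": [80, 50, 80, 70, 54],
    "gender": ["Male", "Female", "Male", "Male", "Female"],
})

# Drop the categorical column before correlating
numeric = data.select_dtypes("number")

# Pearson r from centred values: sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
x = numeric["money"].to_numpy(dtype=float)
y = numeric["weight"].to_numpy(dtype=float)
xc, yc = x - x.mean(), y - y.mean()
r = (xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum())

print(round(r, 3))                                       # 0.462
print(round(numeric.corr().loc["money", "weight"], 3))   # 0.462
```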
Pretty Jupyter is awesome and I'm definitely installing it ;).