matplotlib series (5): high-level wrappers

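seaborn is a higher-level API built on top of matplotlib. It lets you produce common statistical plots with far less code than calling matplotlib directly, which makes plotting easier.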

I. Distribution plots

1. Kernel density estimate plots

Univariate kernel density estimate plot

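import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Draw 50 samples from a correlated 2-D normal distribution; only x is plotted here.
mean, cov = [0, 2], [(1, .5), (.5, 1)]
x, y = np.random.multivariate_normal(mean, cov, size=50).T

# fill=True shades the area under the density curve.
sns.kdeplot(x=x,
            fill=True)

plt.show()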
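What a kernel density estimate computes can be sketched by hand with NumPy alone. The version below is an illustration, not seaborn's implementation: it assumes a Gaussian kernel and Scott's rule-of-thumb bandwidth, and the helper name `gaussian_kde_1d` and the sample data are made up for the example.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Pairwise standardised distances: shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=50)  # made-up data

# Scott's rule of thumb in one dimension: n**(-1/5) times the sample standard deviation.
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(samples.min() - 5 * bandwidth, samples.max() + 5 * bandwidth, 1001)
density = gaussian_kde_1d(samples, grid, bandwidth)

# A density should integrate to about 1 over the grid (rectangle rule).
area = float(density.sum() * (grid[1] - grid[0]))
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve; in `kdeplot` the `bw_adjust` parameter scales the bandwidth in the same way.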

Bivariate kernel density estimate plot

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

# One filled density per species; thresh=0.05 leaves the lowest-density region unshaded.
sns.kdeplot(x=d.sepal_length[d.species == "setosa"],
            y=d.sepal_width[d.species == "setosa"],
            cmap="Reds",
            fill=True,
            thresh=0.05)
sns.kdeplot(x=d.sepal_length[d.species == "versicolor"],
            y=d.sepal_width[d.species == "versicolor"],
            cmap="Blues",
            fill=True,
            thresh=0.05)
plt.show()

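2. Joint distribution plot

A joint probability distribution (joint distribution for short) is the probability distribution of a random vector made up of two or more random variables. `jointplot` draws the joint distribution of two variables in the centre panel and each variable's marginal distribution along the edges.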

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

sns.jointplot(data=d,
              x="sepal_length",
              y="sepal_width",
              kind="kde",  # scatter | kde | hist | hex | reg | resid
              dropna=True)
plt.show()
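The relationship between a joint distribution and its marginals can be checked numerically. The sketch below uses made-up correlated normal data and a binned (histogram) estimate rather than a KDE, purely as an illustration: summing the joint table over one variable recovers the marginal of the other.

```python
import numpy as np

rng = np.random.default_rng(0)
mean = [0.0, 2.0]
cov = [[1.0, 0.5], [0.5, 1.0]]
x, y = rng.multivariate_normal(mean, cov, size=5000).T  # made-up correlated data

# Empirical joint distribution: fraction of points falling in each 2-D bin.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=20)
joint = counts / counts.sum()

# Marginals: sum the joint table over the other variable.
marginal_x = joint.sum(axis=1)
marginal_y = joint.sum(axis=0)

# The same marginal is obtained from a 1-D histogram with the same bin edges.
direct_x = np.histogram(x, bins=x_edges)[0] / len(x)
```

Here `marginal_x` and `direct_x` agree up to floating-point rounding.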

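3. Pairwise relationship plots

A pair plot draws every pairwise relationship between the numeric columns of a data set in a single grid, which makes it a quick way to see how several variables relate to each other.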

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

# One panel per pair of numeric columns, coloured by species.
sns.pairplot(d, hue="species")
plt.show()

[Figure: visualizing distribution data with seaborn]

II. Regression plots

`lmplot` fits and draws a simple linear regression model in a single call.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

# Scatter plot plus a fitted regression line, for the setosa samples only.
sns.lmplot(x="sepal_length",
           y="sepal_width",
           data=d[d.species == "setosa"])
plt.show()
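`lmplot` draws the fitted line but does not hand back its coefficients. When the numbers are needed, one option is to fit the line separately. The sketch below does that with `np.polyfit` on made-up noisy data; it illustrates an ordinary least-squares fit and is not a description of seaborn's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)  # made-up noisy line

# Degree-1 least-squares fit; coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

# Coefficient of determination of the fitted line.
residuals = y - (slope * x + intercept)
r_squared = 1.0 - residuals.var() / y.var()
```

With this data the recovered slope and intercept land close to the true values 2 and 1.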

A polynomial regression model can capture simple non-linear trends in a data set.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

sns.lmplot(x="sepal_length",
           y="sepal_width",
           hue="species",            # one fit per species
           data=d,
           order=2,                  # second-order polynomial
           ci=None,                  # do not draw the confidence band
           scatter_kws={"s": 80})    # marker size
plt.show()
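Whether a higher `order` is worth it can be judged by comparing fit errors. The sketch below is an illustration on made-up curved data (the helper `sum_squared_error` is hypothetical): it fits degree-1 and degree-2 polynomials by least squares and compares their squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 60)
y = 0.5 * x**2 - x + 2.0 + rng.normal(scale=0.3, size=x.size)  # made-up curved data

def sum_squared_error(degree):
    """Least-squares polynomial fit of the given degree and its squared error."""
    coefficients = np.polyfit(x, y, deg=degree)
    fitted = np.polyval(coefficients, x)
    return float(np.sum((y - fitted) ** 2))

sse_linear = sum_squared_error(1)
sse_quadratic = sum_squared_error(2)
```

On data with a genuine curve the quadratic error is much smaller than the linear one. On data that is roughly linear the two errors are close, and the extra term mostly fits noise.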

III. Matrix plots

1. Heatmap

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])
# Map the numeric class codes 0, 1, 2 to the species names.
d["species"] = pd.Series(iris.target).map({0: "setosa", 1: "versicolor", 2: "virginica"})

# Correlate only the numeric columns; the species column holds strings.
corr = d.drop(columns="species").corr()

sns.heatmap(corr,
            xticklabels=corr.columns,
            yticklabels=corr.columns,
            cmap="RdYlGn",
            center=0,      # put the midpoint of the colour map at zero correlation
            annot=True)    # write the value in each cell

plt.show()
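The matrix being coloured is a Pearson correlation matrix. The sketch below, on a made-up data frame with hypothetical columns `u`, `v`, `w`, reproduces `DataFrame.corr()` by hand and builds a boolean mask for the redundant upper triangle, which could be passed to `heatmap` through its `mask` parameter.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=100)
frame = pd.DataFrame({
    "u": base,
    "v": base + rng.normal(scale=0.5, size=100),   # positively related to u
    "w": -base + rng.normal(scale=0.5, size=100),  # negatively related to u
})

corr = frame.corr()  # Pearson correlation by default

# The same matrix by hand: covariance of the standardised columns.
standardised = (frame - frame.mean()) / frame.std(ddof=1)
manual = standardised.T.dot(standardised) / (len(frame) - 1)

# Boolean mask that is True strictly above the diagonal.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
```

Since a correlation matrix is symmetric, hiding the upper triangle removes duplicated cells without losing information.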

2. Cluster map

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
d = pd.DataFrame(iris.data, columns=["sepal_length", "sepal_width", "petal_length", "petal_width"])

# Heatmap of the measurements with rows and columns reordered by hierarchical clustering.
sns.clustermap(d)

plt.show()
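The reordering behind a cluster map comes from hierarchical clustering. The sketch below shows the idea with SciPy directly on made-up data; the choice of average linkage with Euclidean distance is an assumption for the example. Rows from two well-separated groups are shuffled, clustered, and then read back in dendrogram leaf order.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage

rng = np.random.default_rng(0)
# Made-up data: two well-separated groups of rows, shuffled together.
group_a = rng.normal(loc=0.0, scale=0.3, size=(5, 4))
group_b = rng.normal(loc=5.0, scale=0.3, size=(5, 4))
labels = np.array([0] * 5 + [1] * 5)
order = rng.permutation(10)
data, labels = np.vstack([group_a, group_b])[order], labels[order]

# Agglomerative clustering of the rows (average linkage, Euclidean distance).
row_linkage = linkage(data, method="average", metric="euclidean")

# Leaf order of the dendrogram: the order in which the rows would be drawn.
row_order = leaves_list(row_linkage)
reordered_labels = labels[row_order]
```

After reordering, rows from the same group sit next to each other, which is why similar rows form visible blocks in a cluster map.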