the decision blog

mushroom edibility prediction with a support vector classifier

Written on Dec 18, 2022

Suppose we'd like to build a support vector classifier to help us decide whether a mushroom might be edible. Disclaimer: I personally don't eat mushrooms that I find in the forest and, if I did, I definitely wouldn't rely on a machine learning algorithm alone to decide which ones to eat!! Now that that's out of the way, let's see how well we can do.

After extracting the data, we have a look at it:

import pandas as pd
mushrooms = pd.read_csv('mushrooms.csv')
mushrooms.head()

	class	cap-shape	cap-surface	cap-color	bruises	odor	gill-attachment	gill-spacing	gill-size	gill-color	...	stalk-surface-below-ring	stalk-color-above-ring	stalk-color-below-ring	veil-type	veil-color	ring-number	ring-type	spore-print-color	population	habitat
0	p	x	s	n	t	p	f	c	n	k	...	s	w	w	p	w	o	p	k	s	u
1	e	x	s	y	t	a	f	c	b	k	...	s	w	w	p	w	o	p	n	n	g
2	e	b	s	w	t	l	f	c	b	n	...	s	w	w	p	w	o	p	n	n	m
3	p	x	y	w	t	p	f	c	n	n	...	s	w	w	p	w	o	p	k	s	u
4	e	x	s	g	f	n	f	w	b	k	...	s	w	w	p	w	o	e	n	a	g

5 rows × 23 columns

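The "class" field marks poisonous mushrooms with a "p" and edible ones with an "e"; the other fields encode the remaining mushroom properties as single letters in the same way. We'll need to convert these letters into numerical data; for that we'll use Python's built-in function "ord", subtracting 97 (the code point of "a") so that "a" maps to 0, "b" to 1, and so on: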

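# Replace every single-letter code with an integer: 'a' -> 0, 'b' -> 1, ...
# Assigning whole columns (rather than via iloc) gives them an integer dtype.
for col in mushrooms.columns:
    mushrooms[col] = [ord(x) - ord('a') for x in mushrooms[col]]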

mushrooms.head()

	class	cap-shape	cap-surface	cap-color	bruises	odor	gill-attachment	gill-spacing	gill-size	gill-color	...	stalk-surface-below-ring	stalk-color-above-ring	stalk-color-below-ring	veil-type	veil-color	ring-number	ring-type	spore-print-color	population	habitat
0	15	23	18	13	19	15	5	2	13	10	...	18	22	22	15	22	14	15	10	18	20
1	4	23	18	24	19	0	5	2	1	10	...	18	22	22	15	22	14	15	13	13	6
2	4	1	18	22	19	11	5	2	1	13	...	18	22	22	15	22	14	15	13	13	12
3	15	23	24	22	19	15	5	2	13	13	...	18	22	22	15	22	14	15	10	18	20
4	4	23	18	6	5	13	5	22	1	10	...	18	22	22	15	22	14	4	13	0	6

5 rows × 23 columns
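To make the mapping concrete, here is a minimal sketch of the same conversion applied to a few of the letters from the table. The helper names letter_to_int and int_to_letter are made up for this illustration. Note that the resulting integers are just labels: the fact that "p" (15) is larger than "e" (4) carries no meaning of its own.

```python
# Hypothetical helpers, for illustration only.
def letter_to_int(letter: str) -> int:
    """Map a lowercase letter to its position in the alphabet, with 'a' -> 0."""
    return ord(letter) - ord('a')   # ord('a') is 97

def int_to_letter(code: int) -> str:
    """Inverse mapping, handy for decoding a column back to letters."""
    return chr(code + ord('a'))

# A few of the codes that appear in the first rows of the table
codes = {letter: letter_to_int(letter) for letter in ['e', 'p', 'x', 's']}
print(codes)
print(int_to_letter(15))
```

These values agree with the converted table: class "p" becomes 15, cap-shape "x" becomes 23 and cap-surface "s" becomes 18.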

Ideally we wouldn't want to have to input all of these fields to classify a mushroom, so let's see what happens if we try to classify based only on, say, cap-shape and cap-surface.

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

model = SVC(kernel='rbf',C=1000)

y = mushrooms['class']
X = mushrooms[['cap-shape','cap-surface']]
X_train, X_test, y_train, y_test = \
        train_test_split(X,y,train_size=0.5)
model.fit(X_train,y_train)

y_pred = model.predict(X_test)

print('accuracy score: ',accuracy_score(y_test,y_pred))

%matplotlib inline
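mat = confusion_matrix(y_test,y_pred)
# Transpose so that rows show predicted labels and columns show true labels;
# fmt='d' prints the counts as plain integers
sns.heatmap(mat.T, square=True, annot=True, fmt='d',
            xticklabels=['edible','poisonous'],
            yticklabels=['edible','poisonous'], cbar=False)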
plt.xlabel('true label')
plt.ylabel('predicted label')

accuracy score:  0.6302314130969966

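[Figure: confusion matrix heatmap, with the true label on the x-axis and the predicted label on the y-axis]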
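For a mushroom picker, the cell that matters most in this plot is the one for poisonous mushrooms predicted to be edible. As a small sketch of where that number lives in the matrix, here is a run on a handful of made-up labels (not the mushroom data), assuming the 4/15 encoding of "e" and "p" from above; the variable names are only illustrative.

```python
from sklearn.metrics import confusion_matrix

EDIBLE, POISONOUS = 4, 15   # ord('e') - 97 and ord('p') - 97

# Made-up labels, purely for illustration
y_true = [EDIBLE, EDIBLE, POISONOUS, POISONOUS, POISONOUS]
y_pred = [EDIBLE, POISONOUS, EDIBLE, EDIBLE, POISONOUS]

# Rows are true labels and columns are predicted labels, in the order given
mat = confusion_matrix(y_true, y_pred, labels=[EDIBLE, POISONOUS])

poisonous_called_edible = mat[1, 0]   # true poisonous, predicted edible
edible_called_poisonous = mat[0, 1]   # true edible, predicted poisonous
print(poisonous_called_edible, edible_called_poisonous)
```

Because the heatmap plots mat.T, the dangerous count mat[1, 0] appears in the "edible" row and "poisonous" column of the figure.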

We are only able to achieve about 63% accuracy this way, with 730 poisonous mushrooms classified as edible. We can do better by including more mushroom attributes, for example cap-color and gill-color:

X = mushrooms[['cap-shape','cap-surface','cap-color','gill-color']]
X_train, X_test, y_train, y_test = \
        train_test_split(X,y,train_size=0.5)
model.fit(X_train,y_train)
y_pred = model.predict(X_test)
print('accuracy score: ',accuracy_score(y_test,y_pred))

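mat = confusion_matrix(y_test,y_pred)
# Transpose so that rows show predicted labels and columns show true labels;
# fmt='d' prints the counts as plain integers
sns.heatmap(mat.T, square=True, annot=True, fmt='d',
            xticklabels=['edible','poisonous'],
            yticklabels=['edible','poisonous'], cbar=False)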
plt.xlabel('true label')
plt.ylabel('predicted label')

accuracy score:  0.8648449039881831

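[Figure: confusion matrix heatmap, with the true label on the x-axis and the predicted label on the y-axis]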

We've gone up to 86% accuracy and now misclassify 400 poisonous mushrooms as edible. If we keep adding attributes then the accuracy increases, but of course that also means we have more work to do in describing the mushroom we've found.
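One way to explore this trade-off is to refit the classifier on a growing list of attributes and record the test accuracy each time. The sketch below is only illustrative: the helper name accuracy_by_feature_count, the fixed random_state, and the small synthetic table standing in for mushrooms.csv are all assumptions rather than part of the analysis above.

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def accuracy_by_feature_count(df, target, features, seed=0):
    """Fit one SVC per growing prefix of `features`; return {k: test accuracy}."""
    scores = {}
    y = df[target]
    for k in range(1, len(features) + 1):
        cols = features[:k]   # first k attributes only
        X_train, X_test, y_train, y_test = train_test_split(
            df[cols], y, train_size=0.5, random_state=seed)
        model = SVC(kernel='rbf', C=1000)
        model.fit(X_train, y_train)
        scores[k] = accuracy_score(y_test, model.predict(X_test))
    return scores

# Synthetic stand-in data (NOT the mushroom dataset): three integer-coded
# attributes and a class that depends on all of them
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.integers(0, 5, size=(200, 3)),
                   columns=['f1', 'f2', 'f3'])
toy['class'] = (toy['f1'] + toy['f2'] + toy['f3'] > 6).astype(int)

scores = accuracy_by_feature_count(toy, 'class', ['f1', 'f2', 'f3'])
print(scores)
```

With the encoded mushrooms frame, the same helper could be called as accuracy_by_feature_count(mushrooms, 'class', ['cap-shape', 'cap-surface', 'cap-color', 'gill-color']); the exact scores will depend on the train/test split.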
