Use case
Serves as a baseline: a simple model that predicts from rules of thumb and ignores the input features. A real model is only useful if it beats this baseline.
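A minimal sketch of the baseline idea. It assumes scikit-learn's bundled iris data and a DecisionTreeClassifier as the "real" model. Neither appears in these notes; both are stand-ins chosen only so the snippet runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: iris is a stand-in dataset, not the quiz2 data used in these notes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Baseline: always predict the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
# Stand-in "real" model to compare against the baseline.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"tree accuracy:     {model_acc:.2f}")
```

If the real model's score is not clearly above the baseline's, the features or the model probably need another look.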
What is it?
- For classification: predict the mode (most frequent class) of y_train for every test example
- For regression: predict the mean or median of y_train, or a fixed constant, for every test example
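The two rules above can be checked by hand. This is a small sketch on hypothetical toy arrays (not the quiz2 data) that compares the dummy predictions with the mode and mean computed directly with NumPy.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Hypothetical toy data; dummy models ignore the feature values entirely.
X_train = np.zeros((5, 1))
X_test = np.zeros((3, 1))

# Classification: the prediction is the mode of y_train.
y_train_cls = np.array(["not A+", "A+", "not A+", "not A+", "A+"])
clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train_cls)
cls_pred = clf.predict(X_test)

values, counts = np.unique(y_train_cls, return_counts=True)
mode = values[np.argmax(counts)]

# Regression: the prediction is the mean of y_train.
y_train_reg = np.array([80.0, 90.0, 85.0, 95.0, 100.0])
reg = DummyRegressor(strategy="mean").fit(X_train, y_train_reg)
reg_pred = reg.predict(X_test)

print(cls_pred, "mode:", mode)
print(reg_pred, "mean:", y_train_reg.mean())
```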
How?
Classification
read df
import pandas as pd
# Prepare data
df = pd.read_csv("data/quiz2-grade-toy-classification.csv")
df
0 | 1 | 1 | 92 | 93 | 84 | 91 | 92 |
1 | 1 | 0 | 94 | 90 | 80 | 83 | 91 |
2 | 0 | 0 | 78 | 85 | 83 | 80 | 80 |
3 | 0 | 1 | 91 | 94 | 92 | 91 | 89 |
4 | 0 | 1 | 77 | 83 | 90 | 92 | 85 |
from sklearn.dummy import DummyClassifier
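# Separate the features (X) from the target column quiz2 (y)
X = df.drop(columns=["quiz2"])
y = df["quiz2"]
# Always predict the most frequent class seen in y
clf = DummyClassifier(strategy="most_frequent")
clf.fit(X, y)
clf.predict(X)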
array(['not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+',
'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+',
'not A+', 'not A+', 'not A+', 'not A+', 'not A+', 'not A+',
'not A+', 'not A+', 'not A+'], dtype='<U6')
clf.score(X, y) # accuracy
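For the most_frequent strategy, the accuracy returned by score is simply the share of the majority class in the labels being scored. A short sketch on hypothetical labels (not the quiz2 data) shows this:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: 5 of 7 examples are "not A+".
y = np.array(["not A+"] * 5 + ["A+"] * 2)
X = np.zeros((len(y), 1))  # features are ignored by DummyClassifier

clf = DummyClassifier(strategy="most_frequent").fit(X, y)
accuracy = clf.score(X, y)

# Fraction of examples that belong to the majority class.
majority_share = np.mean(y == "not A+")
print(accuracy, majority_share)
```

This is why accuracy alone can look good on imbalanced data: a model has to beat the majority share, not 50%.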
Regression
read df
import pandas as pd
df = pd.read_csv("data/quiz2-grade-toy-regression.csv")
df
0 | 1 | 1 | 92 | 93 | 84 | 91 | 92 |
1 | 1 | 0 | 94 | 90 | 80 | 83 | 91 |
2 | 0 | 0 | 78 | 85 | 83 | 80 | 80 |
3 | 0 | 1 | 91 | 94 | 92 | 91 | 89 |
4 | 0 | 1 | 77 | 83 | 90 | 92 | 85 |
from sklearn.dummy import DummyRegressor
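# Separate the features (X) from the target column quiz2 (y)
X = df.drop(columns=["quiz2"])
y = df["quiz2"]
# Always predict the mean of y
reg = DummyRegressor(strategy="mean")
reg.fit(X, y)
reg.predict(X)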
array([86.28571429, 86.28571429, 86.28571429, 86.28571429, 86.28571429,
86.28571429, 86.28571429])
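For regressors, score returns R^2 rather than accuracy. A mean predictor scores 0 on the data it was fit on and is usually at or below 0 on new data. The sketch below uses hypothetical targets (not the quiz2 data) to illustrate this.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Hypothetical training targets; their mean is 85.
y = np.array([70.0, 80.0, 90.0, 100.0])
X = np.zeros((len(y), 1))  # features are ignored by DummyRegressor

reg = DummyRegressor(strategy="mean").fit(X, y)

# R^2 on the training data: predicting the mean explains no variance.
train_r2 = reg.score(X, y)

# R^2 on hypothetical new targets whose mean differs from the training mean.
y_new = np.array([60.0, 65.0])
new_r2 = reg.score(np.zeros((len(y_new), 1)), y_new)

print("train R^2:", train_r2)
print("new-data R^2:", new_r2)
```

An R^2 of 0 is therefore the reference point that a real regressor has to exceed.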
Hyperparameters
strategy
- (DummyClassifier) {"most_frequent", "prior", "stratified", "uniform", "constant"}. Default: "prior"
- (DummyRegressor) {"mean", "median", "quantile", "constant"}. Default: "mean"
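A hedged sketch of how some of these strategies differ, on hypothetical toy labels. "prior" and "most_frequent" give the same predict output but different predict_proba output, "stratified" and "uniform" draw random labels, and "quantile" needs the extra quantile parameter of DummyRegressor.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Hypothetical labels: 1 "A+" and 3 "not A+".
y = np.array(["A+", "not A+", "not A+", "not A+"])
X = np.zeros((len(y), 1))  # features are ignored

prior = DummyClassifier(strategy="prior").fit(X, y)
most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)

# Both predict the majority class...
prior_pred = prior.predict(X)
mf_pred = most_frequent.predict(X)
# ...but "prior" reports class frequencies, "most_frequent" a one-hot vector.
prior_proba = prior.predict_proba(X[:1])
mf_proba = most_frequent.predict_proba(X[:1])

# Random strategies: labels drawn by class frequency or with equal probability.
stratified_pred = DummyClassifier(strategy="stratified", random_state=0).fit(X, y).predict(X)
uniform_pred = DummyClassifier(strategy="uniform", random_state=0).fit(X, y).predict(X)

# "quantile" with quantile=0.5 predicts the median of the targets.
y_reg = np.array([1.0, 2.0, 3.0, 10.0])
quantile_pred = DummyRegressor(strategy="quantile", quantile=0.5).fit(X, y_reg).predict(X)

print(prior_proba, mf_proba)
print(stratified_pred, uniform_pred, quantile_pred)
```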
constant
- set only when strategy = "constant"
- for DummyClassifier, the constant must be a class label that appears in the y passed to fit
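A minimal sketch of the constant strategy on hypothetical toy data. The label "B" is made up to show what happens when the constant is absent from y. The regressor part assumes, as the example itself demonstrates, that DummyRegressor accepts a constant that does not occur in y.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor

# Hypothetical labels, not the quiz2 data.
y = np.array(["A+", "not A+", "not A+"])
X = np.zeros((len(y), 1))  # features are ignored

# A constant that is one of the training labels is accepted.
clf = DummyClassifier(strategy="constant", constant="A+").fit(X, y)
const_pred = clf.predict(X)

# A constant that never appears in y ("B" is made up) is rejected at fit time.
try:
    DummyClassifier(strategy="constant", constant="B").fit(X, y)
    raised = False
except ValueError as err:
    raised = True
    print("rejected:", err)

# The regressor predicts the given number even though it is not in y.
reg = DummyRegressor(strategy="constant", constant=50.0).fit(X, [80.0, 90.0, 100.0])
reg_pred = reg.predict(X)
print(const_pred, reg_pred)
```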