Machine Learning Part 6 -- Evaluating a Hypothesis
Evaluating a Hypothesis
Once we have done some troubleshooting for errors in our predictions by:
●Getting more training examples
●Trying smaller sets of features
●Trying additional features
●Trying polynomial features
●Increasing or decreasing λ
We can move on to evaluate our new hypothesis.
A hypothesis may have a low error on the training examples but still be inaccurate (because of overfitting). Thus, to evaluate a hypothesis, given a dataset of training examples, we can split the data into two sets: a training set and a test set. Typically, the training set consists of 70% of your data and the test set is the remaining 30%.
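A 70/30 split like the one described can be sketched in plain Python as follows (the function name, fixed shuffle seed, and stand-in data are illustrative, not part of the notes):

```python
import random

def train_test_split(data, test_ratio=0.3, seed=0):
    """Shuffle the examples, then hold out the last test_ratio fraction
    as the test set; the rest is the training set."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = data[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    split = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:split], shuffled[split:]

examples = list(range(10))         # stand-in for (x, y) training examples
train, test = train_test_split(examples)
print(len(train), len(test))       # 7 3
```

Shuffling before splitting matters: if the data is ordered (say, by label), taking the first 70% directly would give a biased training set.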
The new procedure using these two sets is then:

1. Learn Θ and minimize J_train(Θ) using the training set
2. Compute the test set error J_test(Θ)

The test set error

For linear regression: J_test(Θ) = (1/(2·m_test)) · Σ_{i=1}^{m_test} (h_Θ(x_test^(i)) − y_test^(i))²

For classification, the misclassification error (a.k.a. 0/1 misclassification error):

err(h_Θ(x), y) = 1 if h_Θ(x) ≥ 0.5 and y = 0, or h_Θ(x) < 0.5 and y = 1
err(h_Θ(x), y) = 0 otherwise

The average test error for the test set is:

Test Error = (1/m_test) · Σ_{i=1}^{m_test} err(h_Θ(x_test^(i)), y_test^(i))

This gives us the proportion of the test data that was misclassified.
Model Selection and Train/Validation/Test Sets
Just because a learning algorithm fits a training set well, that does not mean it is a good hypothesis. It could overfit, and as a result your predictions on the test set would be poor. The error of your hypothesis as measured on the data set with which you trained the parameters will be lower than the error on any other data set.
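The point that training-set error understates the error on any other data set can be made concrete with a toy illustration (not a method from the notes): a "hypothesis" that simply memorizes its training examples achieves zero training error, yet still errs on held-out data.

```python
# Toy "hypothesis" that memorizes its training examples exactly.
train_data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
memory = dict(train_data)

def h(x):
    """Return the memorized label for x, or a fixed default guess."""
    return memory.get(x, 0)

# Zero error on the data it was "trained" (memorized) on...
train_error = sum(h(x) != y for x, y in train_data) / len(train_data)

# ...but a large error on unseen examples.
test_data = [(0.2, 0), (0.7, 1), (0.8, 1)]
test_err = sum(h(x) != y for x, y in test_data) / len(test_data)
print(train_error, test_err)   # 0.0 on training data, 2/3 on test data
```

This is overfitting in its most extreme form, and it is exactly why a held-out test set is needed to judge a hypothesis.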