In empirical risk minimization, we minimize the average loss on a training set. If our prediction functions are producing probability distributions, what loss functions will give a reasonable performance measure? In this lecture, we discuss "likelihood", one of the most popular performance measures for distributions. We temporarily leave aside the conditional probability modeling problem, and focus on the simpler problem of fitting an unconditional probability model to data. We can use "maximum likelihood" to fit both parametric and nonparametric models. Once we have developed a collection of candidate probability distributions on training data, we select the best one by choosing the model that has highest "hold-out likelihood", i.e. likelihood on validation data.
Access the full course at https://bloom.bg/2ui2T4q