class: center, middle, inverse, title-slide

# MLCA Week 7:
## Hyperparameter Followup
### Mike Mahoney
### 2021-10-13

---
class: middle

# What are we doing here?

---

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[
Some of the topics discussed fit within traditional statistics but others seem to have escaped, heading south, perhaps in the direction of computer science. The escapees were the large-scale prediction algorithms: neural nets, deep learning, boosting, random forests, and support-vector machines. Notably missing from their development were parametric probability models, the building blocks of classical inference. Prediction algorithms are the media stars of the big-data era.
]

---

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[
Statistics is a branch of applied mathematics, and is ultimately judged by how well it serves the world of applications. Mathematical logic, a la Fisher, has been the traditional vehicle for the development and understanding of statistical methods. Computation, slow and difficult before the 1950s, was only a bottleneck, but now has emerged as a competitor to (or perhaps an enabler of) mathematical analysis.
]

---

.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[
A cohesive inferential theory was forged in the first half of the twentieth century, but unity came at the price of an inwardly focused discipline, of reduced practical utility. In the century’s second half, electronic computation unleashed a vast expansion of useful—and much used—statistical methodology. Expansion accelerated at the turn of the millennium, further increasing the reach of statistical thinking, but now at the price of intellectual cohesion.
]

---
class: middle center

![](casi_epi.png)

---
class: middle center

<img src="hyperparameter_followup_files/figure-html/unnamed-chunk-1-1.png" width="100%" />

---
class: middle

<div style="font-size: 200%">
<blockquote>
This course attempts to guide students through several of the most common machine learning approaches at a conceptual level with a focus on applications in R.
</blockquote>
</div>

<div style="font-size: 110%">
(From https://mlca.mm218.dev/#course-description)
</div>

---

# Part 1: Prediction

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 1 </td> <td style="text-align:left;"> Prediction, Estimation, and Attribution </td> </tr>
<tr> <td style="text-align:left;"> 2 </td> <td style="text-align:left;"> Regression </td> </tr>
<tr> <td style="text-align:left;"> 3 </td> <td style="text-align:left;"> Classification </td> </tr>
<tr> <td style="text-align:left;"> 4 </td> <td style="text-align:left;"> Classification with imbalanced classes </td> </tr>
</tbody>
</table>

---

# Part 2: Machine Learning

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 5 </td> <td style="text-align:left;"> Decision Trees </td> </tr>
<tr> <td style="text-align:left;"> 6 </td> <td style="text-align:left;"> Random Forests </td> </tr>
<tr> <td style="text-align:left;"> 7 </td> <td style="text-align:left;"> Hyperparameters and Model Tuning </td> </tr>
<tr> <td style="text-align:left;"> 8 </td> <td style="text-align:left;"> Gradient Boosting Machines </td> </tr>
<tr> <td style="text-align:left;"> 9 </td> <td style="text-align:left;"> Stochastic GBMs and Stacked Ensembles </td> </tr>
<tr> <td style="text-align:left;"> 10 </td> <td style="text-align:left;"> k-Nearest Neighbors </td> </tr>
<tr> <td style="text-align:left;"> 11 </td> <td style="text-align:left;"> Support Vector Machines (as time allows) </td> </tr>
</tbody>
</table>

---

# Part 2: Machine Learning

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;color: grey !important;"> 5 </td> <td style="text-align:left;color: grey !important;"> Decision Trees </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> 6 </td> <td style="text-align:left;font-weight: bold;"> Random Forests </td> </tr>
<tr> <td style="text-align:left;color: grey !important;"> 7 </td> <td style="text-align:left;color: grey !important;"> Hyperparameters and Model Tuning </td> </tr>
<tr> <td style="text-align:left;color: grey !important;"> 8 </td> <td style="text-align:left;color: grey !important;"> Gradient Boosting Machines </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> 9 </td> <td style="text-align:left;font-weight: bold;"> Stochastic GBMs and Stacked Ensembles </td> </tr>
<tr> <td style="text-align:left;color: grey !important;"> 10 </td> <td style="text-align:left;color: grey !important;"> k-Nearest Neighbors </td> </tr>
<tr> <td style="text-align:left;font-weight: bold;"> 11 </td> <td style="text-align:left;font-weight: bold;"> Support Vector Machines (as time allows) </td> </tr>
</tbody>
</table>

---

# Part 3: Doing The Thing

<table class="table table-striped table-hover" style="margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:left;"> Week </th> <th style="text-align:left;"> Topic </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> 12-13 </td> <td style="text-align:left;"> Project Work </td> </tr>
<tr> <td style="text-align:left;"> 14 </td> <td style="text-align:left;"> Presentations </td> </tr>
</tbody>
</table>
style="text-align:left;"> 14 </td> <td style="text-align:left;"> Presentations </td> </tr> </tbody> </table> --- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ A great amount of ingenuity and experimentation has gone into the development of modern prediction algorithms, with statisticians playing an important but not dominant role. There is no shortage of impressive success stories. In the absence of optimality criteria, either frequentist or Bayesian, the prediction community grades algorithmic excellence on performance within a catalog of often-visited examples. ] --- .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ “Optimal” is the key word here. Before Fisher, statisticians didn’t really understand estimation. The same can be said now about prediction. Despite their impressive performance on a raft of test problems, it might still be possible to do much better than neural nets, deep learning, random forests, and boosting — or perhaps they are coming close to some as-yet unknown theoretical minimum. ] --- ## • There is no way to know what model will be best without trying it out* ## • There is no way to know what hyperparameters will be available without choosing a package* ## • There is no way to know what hyperparameter values will be best without fitting models* <div style="font-size: 150%"> *Though with practice you can make better guesses </div>