MLCA Week 10:

Follow-up

Mike Mahoney

2021-11-02

Project FAQ

Cross-entropy (from Week 7, tuning RF):

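library(dplyr)   # mutate()
library(ranger)  # predictions()

calc_cross_entropy <- function(rf_model, data) {
  # rf_model must be grown with probability = TRUE so that
  # predictions() returns one probability column per class (No, Yes)
  data <- predict(rf_model, data) |>
    predictions() |>
    cbind(data) |>
    mutate(prediction = ifelse(Attrition == "Yes", Yes, No),
           # Force prediction to not be exactly 0 or 1;
           # pmax()/pmin() clamp row by row, unlike max()/min()
           prediction = pmax(1e-15, pmin(1 - 1e-15, prediction)),
           loss = -log(prediction))
  sum(data$loss)
}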

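Cross-entropy: the negative log of the predicted probability of the correct class, summed over all observations

ranger (grown with probability = TRUE): provides predicted probabilities for both classes

lightgbm (binary objective): provides only the predicted probability of the positive class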
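A small base-R sketch of the calculation, using made-up labels and probabilities (truth and p_yes are hypothetical, not project data). It shows that picking the correct column from a two-column, ranger-style result gives the same probabilities as taking a lightgbm-style positive-class probability and inverting it whenever the truth is No:

```r
# Hypothetical toy data: true labels and predicted P(Attrition == "Yes")
truth <- c("Yes", "No", "No", "Yes")
p_yes <- c(0.9, 0.2, 0.6, 0.4)

# ranger-style output (probability = TRUE): one column per class
probs <- data.frame(No = 1 - p_yes, Yes = p_yes)
p_correct_2col <- ifelse(truth == "Yes", probs$Yes, probs$No)

# lightgbm-style output: only P(Yes), so invert it when the truth is "No"
p_correct_1col <- ifelse(truth == "Yes", p_yes, 1 - p_yes)

all.equal(p_correct_2col, p_correct_1col)  # TRUE

# Cross-entropy: sum of -log(probability assigned to the correct class)
ce <- sum(-log(p_correct_1col))
ce  # approximately 2.161
```

The correct-class probabilities here are 0.9, 0.8, 0.4 and 0.4; the two wrong-leaning predictions (0.4) contribute most of the loss.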

So lightgbm needs an altered function:

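library(dplyr)  # mutate()

calc_cross_entropy <- function(lgb_model, data) {
  # lightgbm predicts from a numeric matrix of features only;
  # this assumes every column except Attrition is already numeric
  features <- as.matrix(data[setdiff(names(data), "Attrition")])
  data <- data |>
    mutate(prediction = predict(lgb_model, features),
           # If the correct answer is No, invert the prediction:
           prediction = ifelse(Attrition == "Yes",
                               prediction,
                               1 - prediction),
           # Force prediction to not be exactly 0 or 1 (element-wise)
           prediction = pmax(1e-15, pmin(1 - 1e-15, prediction)),
           loss = -log(prediction))
  sum(data$loss)
}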
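A quick base-R check of the clamp step, using three hypothetical predictions. A single prediction of exactly 0 makes -log() infinite, and max()/min() collapse a whole column to one number, while pmax()/pmin() clamp each value separately:

```r
# Hypothetical predictions, including an exact 0 and an exact 1
p <- c(0, 0.5, 1)
-log(p)  # Inf for the 0, so any sum over it is Inf

# max()/min() reduce everything to a single number (here 1e-15),
# which mutate() would then recycle to every row
max(1e-15, min(1 - 1e-15, p))

# pmax()/pmin() clamp each element separately
clamped <- pmax(1e-15, pmin(1 - 1e-15, p))
clamped
sum(-log(clamped))  # finite
```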

Just because it worked doesn't mean it's working

Standard citations:

citation()
##
## To cite R in publications use:
##
## R Core Team (2021). R: A language and environment for statistical
## computing. R Foundation for Statistical Computing, Vienna, Austria.
## URL https://www.R-project.org/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {R: A Language and Environment for Statistical Computing},
## author = {{R Core Team}},
## organization = {R Foundation for Statistical Computing},
## address = {Vienna, Austria},
## year = {2021},
## url = {https://www.R-project.org/},
## }
##
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.

Other standard citations:

citation("ranger") # RF
citation("lightgbm") # GBM
citation("kernlab") # SVM
citation("caret") # KNN
citation("rpart") # Decision trees

Breiman, L. (2001). Random Forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324

Friedman, J. H. (2002). Stochastic Gradient Boosting. Computational Statistics & Data Analysis, 38(4), 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297. https://doi.org/10.1007/BF00994018
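A minimal sketch for collecting BibTeX entries for these packages in one go, assuming you manage references in a .bib file. The pkgs vector is illustrative and should match what your project actually uses; packages that aren't installed are skipped, since citation() errors for those:

```r
# Illustrative list: edit to match the packages your project uses
pkgs <- c("ranger", "lightgbm", "kernlab", "caret", "rpart")

# system.file(package = p) returns "" when a package isn't installed
is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                       logical(1))

# One set of BibTeX entries per installed package
bib <- lapply(pkgs[is_installed], function(p) toBibtex(citation(p)))
names(bib) <- pkgs[is_installed]

# BibTeX entry for R itself
bib_r <- toBibtex(citation())
bib_r
```

From there, writeLines(c(bib_r, unlist(bib)), "references.bib") would write everything to a single file.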

Other notes:

Don't force-install packages (i.e., don't install packages on the reader's machine from your code; it's rude)

Don't cite LM or GLM

Pay attention to the rubric (if it isn't there, it isn't worth points)
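One polite alternative to force-installing, sketched with a hypothetical helper check_packages() (not from any package): it only checks whether each package is available and stops with a readable message listing anything missing, leaving the install decision to the reader.

```r
# Hypothetical helper: report missing packages instead of installing them
check_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) if a package isn't available
  absent <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
  if (length(absent) > 0) {
    stop("Please install these packages first: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  invisible(absent)
}

# Passes silently: both packages ship with R
check_packages(c("stats", "utils"))
```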

Project re-submit

Optional!

I make no promises about turnaround time (but expect days, not hours).

No resubmissions after 2021-12-08.

First project has been finished!

Status Update

One more week of content (SVM)

Three "work weeks" (bring questions)

Presentations 2021-12-08
