Machine Learning Concepts and Applications

Slides, notes, and syllabus for FOR 796.

Course Title and Instructor

Title: FOR 796: Machine Learning Concepts and Applications
Time and Place: Wednesday 2:15-3:10, Bray 300

Instructor: Mike Mahoney
Email:
Office Hours: Tuesday 11:30-12:30 (Baker 231), or by appointment

Course Description

Prediction has taken over the world. Whether it’s predicting how species will adapt to a changing climate, which plant cultivars will hold their own against non-native diseases, or what pair of pants a customer is most likely to buy, prediction – and the algorithms built to predict – has come to dominate how we work and live in an incredibly short span of time. These predictive methods have seen their popularity skyrocket throughout the sciences in recent years, as it becomes increasingly important not just to understand how systems work but also to communicate how they might respond to human activity moving forward.

The algorithms developed to enable all this prediction – referred to as “pure prediction algorithms” or, more loosely, “machine learning” – have seen massive success across domains. But this success comes at the cost of complexity, and implementing these techniques requires new tools and a different mindset than traditional statistical modeling as taught to most professionals.

This course attempts to help bridge that gap, guiding students through several of the most common machine learning approaches at a conceptual level with a focus on applications in R. Topics include the random forest and gradient boosting machine algorithms as well as cross-validation and loss functions.

Course Structure

Each week there will be a handout to read before class. You are encouraged to type the code from the handouts into your own R session to build familiarity with the syntax and develop muscle memory for common tasks.

Class time on Wednesday will follow a discussion format, dedicated to answering any questions from the reading. Office hours will be available weekly to help debug code problems or answer more specific questions.

The course culminates in a final project where students apply concepts from the course to a data set of their choosing. The final week of class will be spent presenting results from these projects.

Learning Objectives

By the end of this course, students will be able to:

Prerequisites

Students must have some familiarity with the R language, including defining functions, managing objects, controlling the flow of a program (e.g., if/else statements and for-loops), wrangling data in data frames, fitting linear models (i.e., the lm function), and other basic tasks.
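As a rough self-check, the short sketch below touches each of the skills listed above. It is only an illustration, not course material: the built-in mtcars data set, the helper name rmse, and the 3.5 weight cutoff are all assumptions chosen for the example.

```r
# Self-check sketch using the built-in mtcars data set.
# The data set, the helper name `rmse`, and the 3.5 cutoff are illustrative
# assumptions, not part of the course materials.

# Defining a function: root mean squared error between two numeric vectors.
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Wrangling a data frame: keep three columns and add a derived logical column.
cars <- mtcars[, c("mpg", "wt", "hp")]
cars$heavy <- cars$wt > 3.5

# Fitting a linear model with lm() and scoring its in-sample predictions.
fit <- lm(mpg ~ wt + hp, data = cars)
fit_rmse <- rmse(cars$mpg, predict(fit))

# Control flow: a for-loop with if/else, counting rows in each weight class.
n_heavy <- 0
n_light <- 0
for (i in seq_len(nrow(cars))) {
  if (cars$heavy[i]) {
    n_heavy <- n_heavy + 1
  } else {
    n_light <- n_light + 1
  }
}

print(fit_rmse)
print(c(heavy = n_heavy, light = n_light))
```

If each line here reads comfortably, the prerequisite is likely met; if not, the R references listed under Textbooks and Materials may be a good place to start.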

An introductory stats course is recommended.

Textbooks and Materials

This class draws heavily from materials presented in the following two books. Both books are freely available online and you do not need to purchase a physical copy of either book to succeed in this class; there are no assigned readings from these books.

  1. Bradley Boehmke and Brandon Greenwell. (2020). Hands on Machine Learning with R. CRC Press. Available online at https://bradleyboehmke.github.io/HOML/ .
  2. Bradley Efron and Trevor Hastie. (2016). Computer Age Statistical Inference. Cambridge University Press. Available online at https://web.stanford.edu/~hastie/CASI_files/PDF/casi.pdf .

Students will need access to a computer on which they can install R packages.

While not used in the class, students looking for a better grasp of the R programming language may benefit from the following textbooks (all freely available online):

  1. Hadley Wickham and Garrett Grolemund. (2017). R for Data Science. O’Reilly Media, Inc. Available online at https://r4ds.had.co.nz/ .
  2. Hadley Wickham. (2019). Advanced R. CRC Press. Available online at https://adv-r.hadley.nz/ .
  3. Hadley Wickham and Jenny Bryan. (2021). R Packages. O’Reilly Media, Inc. Available online at https://r-pkgs.org/ .

In addition, I highly recommend the book An Introduction to Statistical Learning. While not used directly in this course, this book is potentially “the” ML textbook, and provides a comprehensible introduction to ML methods. This book is also available freely online; a citation is:

Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. (2021). An Introduction to Statistical Learning with Applications in R. Springer Texts, second edition. Available online at https://www.statlearning.com/

Grading

Grades will be assigned based upon the final project. There are no other assignments in this course.

Students with Learning and Physical Disabilities

SUNY-ESF works with the Office of Disability Services (ODS) at Syracuse University, which is responsible for coordinating disability-related accommodations. Students can contact ODS at 804 University Avenue, Room 309, 315-443-4498 to schedule an appointment and discuss their needs and the process for requesting accommodations. Students may also contact the ESF Office of Student Affairs, 110 Bray Hall, 315-470-6660 for assistance with the process. To learn more about ODS, visit http://disabilityservices.syr.edu. Students who attempt to use accommodations without advance notice to faculty will be referred to the ESF Office of the Dean for Student Affairs. Since accommodations may require early planning and generally are not provided retroactively, please contact ODS as soon as possible.

Academic Dishonesty

Academic dishonesty is a breach of trust between a student and their fellow students or instructor(s). By registering for courses at ESF you acknowledge your awareness of the ESF Code of Student Conduct (http://www.esf.edu/students/handbook/StudentHB.05.pdf). In particular, academic dishonesty includes, but is not limited to, plagiarism, cheating, and other forms of academic misconduct. The Academic Integrity Handbook contains further information and guidance (http://www.esf.edu/students/integrity/). Infractions of the academic integrity code may lead to academic penalties as per the ESF Grading Policy (http://www.esf.edu/provost/policies/documents/GradingPolicy.11.12.2013.pdf).

Inclusive Excellence Statement

As an institution, we embrace inclusive excellence and the strengths of a diverse and inclusive community. During classroom discussions, we may be challenged by ideas different from our lived experiences and cultures. Understanding individual differences and broader social differences will deepen our understanding of each other and the world around us. In this course, all people (including, but not limited to, people of all races, ethnicities, sexual orientation, gender, gender identity and expression, students undergoing transition, religions, ages, abilities, socioeconomic backgrounds, veteran status, regions and nationalities, intellectual perspectives and political persuasion) are strongly encouraged to respectfully share their unique perspectives and experiences. This statement is intended to help cultivate a respectful environment, and it should not be used in a way that limits expression or restricts academic freedom at ESF.