What is machine learning resampling

Machine learning in R

In everyday life, in the media and in business, the terms artificial intelligence and supervised machine learning are often equated. Strictly speaking, from a scientific point of view, supervised machine learning is only a much smaller sub-area of artificial intelligence, since artificial intelligence also includes other research areas such as robotics and computer vision.

This course introduces algorithms and general concepts of supervised machine learning that are particularly suitable for modeling non-linear relationships for complex classification and regression problems. The basic principles of the presented algorithms and concepts are explained in an understandable way for beginners, their functionality is illustrated and the advantages and disadvantages are discussed. All algorithms and topics introduced are illustrated using practical examples and use cases and practiced by participants with exercises.

The course uses the R extension package mlr3: Machine Learning in R, which the team at Essential Data Science Training GmbH has been developing for years.

Main topics part 1: Introduction to machine learning and predictive modeling

Participants become familiar with the most important concepts and terms of machine learning and train and evaluate their first simple supervised learning models. The following topics are covered:

  • General questions in machine learning (regression, classification, clustering, ...)

  • Introduction of general terms (loss function, risk minimization, overfitting, hyperparameters and model parameters, training and test data, ...)

  • Linear and Logistic Regression from a Machine Learning Perspective

  • K-nearest neighbor method

  • Important evaluation measures for regression and classification and their properties

  • Resampling methods (cross validation, bootstrap, ...) and their advantages and disadvantages
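
As a rough illustration of the resampling topic above, the following sketch estimates the accuracy of a classification tree with 5-fold cross-validation in mlr3. The built-in `iris` task, the `classif.rpart` learner and the choice of 5 folds are assumptions made for this example; they are not taken from the course material.

```r
library(mlr3)

set.seed(1)  # fold assignment is random; fix the seed for reproducibility

task = tsk("iris")                   # built-in example task (assumed for this sketch)
learner = lrn("classif.rpart")       # classification tree from the rpart package
resampling = rsmp("cv", folds = 5)   # 5-fold cross-validation

# In each iteration the model is trained on 4 folds and evaluated on the held-out fold
rr = resample(task, learner, resampling)

acc_per_fold = rr$score(msr("classif.acc"))$classif.acc  # one accuracy value per fold
acc_mean = rr$aggregate(msr("classif.acc"))               # mean accuracy over all folds

print(acc_per_fold)
print(acc_mean)
```

Replacing `rsmp("cv", folds = 5)` with `rsmp("bootstrap", repeats = 30)` or `rsmp("holdout")` switches the resampling strategy while the rest of the code stays the same.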

Main topics part 2: Practical machine learning - evaluation and tuning

In this part of the course, various supervised machine learning algorithms are introduced and their advantages and disadvantages are discussed. In addition, more advanced concepts of supervised machine learning are taught so that practical problems can be solved better and more efficiently. The following topics are covered:

  • Hyperparameter optimization (random search and grid search)

  • Nested cross validation for model selection

  • Advanced evaluation and analysis of classification algorithms (confusion matrix, ROC curves)

  • Other supervised models in machine learning: regression and classification trees, random forests, outlook on (gradient) boosting

Main topics part 3: machine learning pipelines, data preprocessing and feature engineering

Participants get to know practical methods for solving common problems and challenges in data and for building complex machine learning pipelines. The following topics are covered:

  • Simple preprocessing methods (e.g. identifying and removing constant and duplicated features)

  • Feature transformations (scaling, centering, ...)

  • Handling of categorical features (dummy and impact coding)

  • Missing values and imputation

  • Imbalanced data (over- / undersampling)

  • Outlook: automatic machine learning
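
To give an idea of what such a pipeline can look like, here is a minimal sketch using mlr3pipelines (part of mlr3verse). It chains removal of constant features, median imputation and scaling in front of a classification tree and evaluates the whole pipeline with cross-validation. The modified `iris` data, the constant column `const` and the specific choice of steps are hypothetical and serve only as illustration.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)

# Hypothetical example data: iris with a few missing values and one constant column
data = iris
data[c(1, 51, 101), "Sepal.Length"] = NA
data$const = 1
task = as_task_classif(data, target = "Species", id = "iris_na")

# Preprocessing steps followed by the learner, connected with %>>%
graph = po("removeconstants") %>>%   # drop features that carry no information
  po("imputemedian") %>>%            # fill missing numeric values with the median
  po("scale") %>>%                   # center and scale numeric features
  lrn("classif.rpart")

# Wrap the graph so that it behaves like any other learner
glrn = as_learner(graph)

# Preprocessing is re-fitted on each training fold, which avoids leaking
# information from the held-out data into the preprocessing steps
rr = resample(task, glrn, rsmp("cv", folds = 3))
acc = rr$aggregate(msr("classif.acc"))
print(acc)
```

Because the preprocessing steps are part of the learner, they are fitted only on the training data of each resampling iteration.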

The following R packages are covered:

  • mlr3, mlr3verse

  • kknn, rpart, randomForest, ranger, gbm, xgboost

Requirements:

  • Knowledge of R (within the scope of the 2-day R basic course or 1-day R crash course)

  • General basic understanding of data analysis / statistics.

General course information:

  • Course language: German, Course materials: English.

  • Included: digital course material and certificate of participation.

  • Use a laptop / PC with reliable internet access and install the following software:

  • Webinar tool: Webinars are conducted with Zoom, whose breakout sessions (separate virtual course rooms) simplify group work and individual support during hands-on tasks. If installing and using the Zoom software is not permitted in your company, you can participate in the webinar via an internet browser (i.e. without additional Zoom software).

Notes on the registration process:

  • With a non-binding pre-registration, you will initially only receive a confirmation of receipt (without invoice). Only when there are at least 4 pre-registrations (no later than 10 days before the start of the course) will you receive another email to confirm your pre-registration.

  • You will then receive an invoice from our payment service provider “Xing Events”, which can be paid within 30 days.

  • If fewer than 4 participants have made a binding confirmation of their registration by 7 days before the start of the course, the course can be canceled (see Section 5, Paragraph 1 of our General Terms and Conditions). In the event of cancellation, course fees already paid will be fully reimbursed.