What mainly distinguishes data scientists from analysts, developers, and security researchers is knowing how to use machine learning (ML) to deal with cyber threats.
There are two types of models for describing a phenomenon:
- Mechanistic model – derived from understanding the physical rules that govern the system
- Empirical model – derived from observation of the phenomenon, without understanding how it works.
In traditional programming, we give the computer the code and the input data and get an output. In machine learning, we let the machine “write its own program”: we provide the computer with some data (e.g., network traffic) and the expected output (e.g., attack / legitimate), and it learns a program that finds the underlying patterns needed to solve the task.
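The contrast can be sketched in a few lines of plain Python. This is only a toy illustration, not a real detector: the request rates, the labels, the fixed threshold of 100, and the function names are all made up for the example.

```python
# Hypothetical toy data: (requests per second from one source, known label).
samples = [
    (3, "legitimate"), (5, "legitimate"), (8, "legitimate"),
    (40, "attack"), (55, "attack"), (70, "attack"),
]

def handwritten_rule(rate):
    """Traditional programming: a human writes the rule and picks the threshold."""
    return "attack" if rate > 100 else "legitimate"

def learn_threshold(data):
    """Machine learning in miniature: derive the threshold from labeled examples.

    Tries the midpoints between neighboring rates and keeps the one that
    misclassifies the fewest training examples.
    """
    rates = sorted(rate for rate, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(rates, rates[1:])]

    def errors(threshold):
        return sum((rate > threshold) != (label == "attack") for rate, label in data)

    return min(candidates, key=errors)

threshold = learn_threshold(samples)

def learned_rule(rate):
    """The 'program' the machine wrote: same shape as the manual rule, learned threshold."""
    return "attack" if rate > threshold else "legitimate"

print(threshold, handwritten_rule(50), learned_rule(50))
```

Both rules have the same form. The difference is where the threshold comes from: in the first it is chosen by the programmer, in the second it is inferred from data and expected output.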
Developing a machine learning algorithm seems complicated, but by taking a top-down approach, we can eliminate much of the complexity.
A top-down approach starts from the problem, defines a task, and then chooses a model that is suited to solving it.
Once we have a model, we choose relevant algorithms and run them as grey or black boxes.
The data scientist needs to understand how an algorithm works in general, but does not need to dive into its implementation, because the algorithms are already implemented in many open-source libraries, such as scikit-learn (Python), and in packages for R.
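As a minimal sketch of this black-box usage, the snippet below trains a scikit-learn decision tree without touching its internals. The two features (packets per second and mean packet size), the values, and the labels are hypothetical and chosen only for illustration. A decision tree is just one possible choice of algorithm here.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [packets per second, mean packet size in bytes].
X_train = [
    [3, 500], [5, 620], [8, 480],   # assumed legitimate traffic
    [40, 60], [55, 64], [70, 58],   # assumed attack traffic
]
y_train = ["legitimate", "legitimate", "legitimate", "attack", "attack", "attack"]

# Task: binary classification. Algorithm: a decision tree, used as a black box.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Classify two previously unseen (also hypothetical) observations.
predictions = model.predict([[4, 550], [60, 62]])
print(list(predictions))
```

Swapping in another classifier from the library usually means changing only the import and the constructor line, since scikit-learn estimators share the same `fit` and `predict` interface.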