The data is not labeled; the algorithm tries to find groups or outliers in the data using only the data's intrinsic characteristics.
Clustering
The clustering model is trained on unlabeled data and tries to find distinct groups of instances (clusters). Later, those clusters can be used to detect anomalies or even to label the data!
Because clustering algorithms are not biased toward any label, we get a "pure" model of the data's behavior. If the clustering model produces good, informative clusters, we can use them to investigate the data and understand it better. We might also find a correlation with a label, which can help us label the data.
Types of clustering algorithms:

Centroid based clustering
The algorithm partitions the data into K clusters by minimizing the sum of distances from each instance to the center (centroid) of its cluster.
Algorithms: K-means
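As a sketch, Lloyd's k-means can be written in a few lines of plain Python. This toy version uses the first k points as initial centroids for determinism; real implementations use random restarts or k-means++ initialization.

```python
def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    # Naive init: first k points (real implementations use k-means++).
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each instance joins the cluster of the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned instances.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9.5)]
centroids, clusters = kmeans(points, k=2)
```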

Density-based clustering
The algorithm builds clusters from regions where instances are packed closely together, treating instances in low-density regions as noise.
Algorithms: OPTICS, DBSCAN (Density-based spatial clustering of applications with noise)
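A minimal sketch of the DBSCAN idea: `eps` is the neighborhood radius, `min_pts` the minimum neighborhood size for a point to be a "core" point from which a cluster expands.

```python
def dbscan(points, eps, min_pts):
    """Return a cluster label per point; -1 marks noise."""
    def neighbors(i):
        # All indices within eps of point i (the point itself included).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)          # None = not yet visited
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:           # not a core point -> noise (for now)
            labels[i] = -1
            continue
        labels[i] = cluster                # start a new cluster and expand it
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:            # noise reached from a core point
                labels[j] = cluster        # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            js = neighbors(j)
            if len(js) >= min_pts:         # j is itself a core point: keep expanding
                queue.extend(js)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=2, min_pts=3)
```

Note how the isolated point (50, 50) gets no cluster: density-based methods label sparse points as noise instead of forcing them into a cluster.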

Connectivity based clustering – Hierarchical
The algorithm finds clusters based on previously generated clusters, resulting in a dendrogram (cluster tree).
Agglomerative (bottom-up) – starts with each instance as a separate cluster and repeatedly merges the closest clusters
Divisive (top-down) – starts with all the data as one cluster and repeatedly splits it
Algorithms: SLINK, CLINK
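The agglomerative variant with single linkage (the criterion SLINK computes) can be sketched naively; this O(n³) version just merges until k clusters remain, whereas SLINK produces the full dendrogram in O(n²).

```python
def single_linkage(points, k):
    """Merge the two closest clusters (single linkage = minimum pairwise
    distance between members) until only k clusters remain."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    clusters = [[p] for p in points]       # agglomerative: start from singletons
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist2(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))    # the merge sequence is the dendrogram
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters = single_linkage(points, k=3)
```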

Distribution based clustering
The algorithm groups instances that appear to come from the same statistical distribution.
For example, EM – expectation–maximization:
The algorithm iterates between two steps:
- Expectation step – estimate the probability (responsibility) of each instance belonging to each cluster, using the likelihood function based on the attributes
- Maximization step – re-estimate the parameters of each cluster (e.g., mean and variance) from those probabilities
- Convergence test – between iterations, the change in the log-likelihood (or in the parameters) is measured; when the improvement falls below a threshold, the algorithm halts
Algorithms: Expectation maximization
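A toy sketch of EM for a mixture of two 1-D Gaussians. It runs a fixed number of iterations for simplicity; as noted above, a real implementation halts once the log-likelihood stops improving.

```python
import math

def em_two_gaussians(data, iters=50):
    """EM for a mixture of two 1-D Gaussians."""
    mu = [min(data), max(data)]        # crude initialization
    sigma = [1.0, 1.0]
    pi = [0.5, 0.5]                    # mixture weights
    for _ in range(iters):
        # Expectation step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [pi[k] / (sigma[k] * math.sqrt(2 * math.pi))
                 * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
                 for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # Maximization step: re-estimate weight, mean, variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))   # floor avoids collapse
        # A real implementation would test log-likelihood convergence here.
    return mu, sigma, pi

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
mu, sigma, pi = em_two_gaussians(data)     # means converge toward 1.0 and 5.0
```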
Outlier
An outlier is an instance that lies far from the other instances.
Types of outlier algorithms:
Distance-based outlier
The algorithm finds outliers that are far from their neighbors and have a sparse neighborhood, while normal instances have a dense neighborhood.
Algorithms: KNN based, ROF
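A simple KNN-based score of this kind is the distance to the k-th nearest neighbor, sketched here:

```python
def knn_outlier_scores(points, k):
    """Score = distance to the k-th nearest neighbor: isolated points get
    large scores, points inside a dense neighborhood get small ones."""
    scores = []
    for p in points:
        dists = sorted(sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
                       for q in points if q is not p)
        scores.append(dists[k - 1])
    return scores

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(points, k=2)   # (10, 10) gets the largest score
```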

Depth based outlier
The algorithm finds outliers that lie on the border of the data space, while normal instances lie toward its center.
Algorithms: ISODEPTH, FDC

Deviation based outlier
The algorithm finds outliers at the outer points of the space; removing these points reduces the data's variance significantly.

Density-based outlier
The algorithm finds outliers whose local density is significantly different from the density of their neighbors.
Algorithms: LOF, LOCI
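A sketch of LOF itself: the score compares a point's local density to the average density of its neighbors, so scores near 1 mean inlier and scores well above 1 mean outlier.

```python
def lof(points, k):
    """Local Outlier Factor: ratio of the average local density of a
    point's neighbors to the point's own local density (~1 = inlier)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    n = len(points)
    knn, kdist = [], []                    # k nearest neighbors and k-distance
    for i in range(n):
        order = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn.append([j for _, j in order[:k]])
        kdist.append(order[k - 1][0])

    def lrd(i):
        # Local reachability density: inverse mean reachability distance.
        reach = [max(kdist[j], dist(points[i], points[j])) for j in knn[i]]
        return k / sum(reach)

    return [sum(lrd(j) for j in knn[i]) / (k * lrd(i)) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof(points, k=2)                  # (5, 5) scores well above 1
```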

High dimensional outlier
The algorithm finds outliers in very sparse data with many attributes, where almost every instance looks far from the others under a distance measure; therefore a neighbor-based approach is not effective.
example:
ABOD – angle-based outlier degree – states that for an outlier, the other points all lie in roughly the same direction (the angles to them have low variance), while a normal point sees other points in many different directions (high variance)
Other Algorithms: grid-based, SOD
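The ABOD idea can be sketched as the variance of cosines of angles at each point (a simplified, unweighted variant of the published method):

```python
def abod_scores(points):
    """Simplified angle-based outlier degree: for each point, the variance
    of cosines of angles spanned by pairs of other points. An outlier sees
    the rest of the data in roughly one direction, so its variance is low."""
    def cosine(p, a, b):
        u = [x - y for x, y in zip(a, p)]
        v = [x - y for x, y in zip(b, p)]
        dot = sum(x * y for x, y in zip(u, v))
        norm = (sum(x * x for x in u) * sum(x * x for x in v)) ** 0.5
        return dot / norm

    scores = []
    for p in points:
        others = [q for q in points if q is not p]
        cosines = [cosine(p, a, b)
                   for i, a in enumerate(others) for b in others[i + 1:]]
        mean = sum(cosines) / len(cosines)
        scores.append(sum((c - mean) ** 2 for c in cosines) / len(cosines))
    return scores

points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1), (10, 10)]
scores = abod_scores(points)               # (10, 10) gets the lowest variance
```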

Association rules
Discover rules and patterns in item-set data. Each instance is a set of items; the goal is to find connections between the items and extract rules. The Apriori principle is used to discover these rules; the principle states that every subset of a frequent item-set must itself be frequent. Apriori works as follows: first find the frequent 1-item-sets in the data (= the most frequent individual items), e.g. {A}, {B}, {D}. Then find frequent 2-item-sets built from the items of the previous step, e.g. {A, B}, {B, D}, and iteratively add (K+1)-item-sets based on the previous K-item-sets, e.g. {A, B, D}. Finally, generate rules from the item-sets: A, B -> D | D, A -> B | B, D -> A.
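The level-wise search described above can be sketched in plain Python (frequent-item-set mining only; the example transactions are made up to match the {A, B, D} illustration):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent item-set mining. Apriori principle: every subset
    of a frequent item-set must itself be frequent, so (k+1)-candidates are
    built only from surviving k-item-sets."""
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]    # candidate 1-item-sets
    k = 1
    while level:
        # Count support and keep only frequent candidates at this level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Build (k+1)-candidates whose k-subsets are all frequent.
        k += 1
        level = []
        for a, b in combinations(sorted(survivors, key=sorted), 2):
            cand = a | b
            if (len(cand) == k and cand not in level
                    and all(frozenset(s) in survivors
                            for s in combinations(cand, k - 1))):
                level.append(cand)
    return frequent

transactions = [{'A', 'B', 'D'}, {'A', 'B', 'D'}, {'A', 'B', 'D'},
                {'B', 'D'}, {'A', 'C'}]
freq = apriori(transactions, min_support=3)    # {A, B, D} is frequent
```

Rules such as A, B -> D then follow by checking the confidence support({A, B, D}) / support({A, B}) against a threshold.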

- https://en.wikipedia.org/wiki/List_of_machine_learning_concepts#Association_rule_learning
- https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Expectation_Maximization_(EM)
- http://sqlblog.com/blogs/dejan_sarka/archive/2015/05/12/data-mining-algorithms-em-clustering.aspx
- http://www.mit.edu/~9.54/fall14/slides/Class13.pdf
- https://www.siam.org/meetings/sdm10/tutorial3.pdf