How It Works: Anomaly Detection


The Process

Regression allows you to estimate the relationships between variables. Classification allows you to organize information into categories. Anomaly Detection helps you find outliers in your data.

The Nexosis API allows you to upload a dataset, and it will model these relationships. Once the relationships are understood, this model is persisted and used to predict values, classify your data, or flag anomalies, given new inputs. Take a moment to familiarize yourself with the high-level process before using Anomaly Detection.

We’ve worked hard to keep the high-level process simple. Here’s the basic process:

  1. Submit a dataset
  2. Start a model building session
  3. Retrieve results

Then optionally:

  1. If you like the results of the model, use the Model API endpoint to make predictions.
  2. Update the dataset with additional data and rebuild, or train a new model.
  3. Start a new Session. Repeat.
[How It Works - Regression and Classification]

Submit a dataset

Anomaly Detection

Anomaly Detection is a process by which the Nexosis API analyzes a particular dataset and attempts to find observations that fall outside of what is normal for that dataset. It can then use the generated model to predict whether other observations are anomalous, or outliers.

For example, if you had a variety of heart measurements from an EKG (electrocardiogram), you could determine whether some of the signals fell outside of the normal range.

The DataSet

Once the dataset has been submitted, an anomaly detection session can be created to build a model.

Read Sending Data for the technical details.
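
As a rough illustration of what submitting data can look like, here is a minimal sketch in Python using the requests library. The base URL, the api-key header, the /data/{dataSetName} path, and the payload shape are assumptions made for this example; Sending Data has the authoritative request format.

```python
# Hypothetical sketch of uploading a dataset for anomaly detection.
# The base URL, header name, endpoint path, and payload shape are assumptions;
# see Sending Data for the authoritative request format.
import requests

API_KEY = "your-api-key"                 # assumed header-based authentication
BASE_URL = "https://ml.nexosis.com/v1"   # assumed base URL

# A tiny illustrative dataset: mostly typical readings plus one obvious outlier.
rows = [
    {"timestamp": "2017-01-01T00:00:00Z", "reading": 0.82},
    {"timestamp": "2017-01-01T00:01:00Z", "reading": 0.79},
    {"timestamp": "2017-01-01T00:02:00Z", "reading": 0.85},
    {"timestamp": "2017-01-01T00:03:00Z", "reading": 9.40},  # the outlier
]

response = requests.put(
    f"{BASE_URL}/data/ekg-readings",     # dataset name chosen for this example
    headers={"api-key": API_KEY},
    json={"data": rows},
)
response.raise_for_status()
print(response.status_code)
```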

Start a model building session

A Session is simply the process of building the model using the supplied Dataset. This exploration of the data is computationally expensive and can be time-consuming, depending on the amount of data in the dataset.

This is where the data science happens at scale. Behind the scenes, a host of algorithms work to discover what makes your dataset tick, attempting to find which factors influence others and where the correlations are, ultimately providing predictions given new data inputs.

Read Sessions for the technical details.
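
As an illustration, starting an anomaly detection session might look roughly like the following sketch. The sessions/model path and the parameter names (dataSourceName, predictionDomain) are assumptions for this example; Sessions documents the exact contract.

```python
# Hypothetical sketch: start a model-building session over the uploaded dataset.
# The path and parameter names ("dataSourceName", "predictionDomain") are assumptions;
# see Sessions for the exact contract.
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"

response = requests.post(
    f"{BASE_URL}/sessions/model",
    headers={"api-key": API_KEY},
    json={
        "dataSourceName": "ekg-readings",   # the dataset uploaded earlier
        "predictionDomain": "anomalies",    # assumed value selecting anomaly detection
    },
)
response.raise_for_status()
session = response.json()
print(session.get("sessionId"))             # keep this to poll for results
```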

Retrieve the results

Once all of the results are analyzed and the relationships present in the data are discovered, a model is built and deployed. The SessionResult will contain a modelId used to identify the production model endpoint where predictions are made. Additionally, the session result returns metrics that illuminate the strength of the relationships found among the features in the dataset.

Read Retrieving a Session for more technical details.
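
A minimal sketch of polling for results follows, again with assumed endpoint paths, status values, and response fields (status, modelId, metrics); Retrieving a Session describes the authoritative response shape.

```python
# Hypothetical sketch: poll the session until the model is built, then read the result.
# Status values and response fields ("status", "modelId", "metrics") are assumptions;
# see Retrieving a Session for the authoritative shapes.
import time
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"
SESSION_ID = "your-session-id"              # returned when the session was started

while True:
    response = requests.get(
        f"{BASE_URL}/sessions/{SESSION_ID}",
        headers={"api-key": API_KEY},
    )
    response.raise_for_status()
    session = response.json()
    if session.get("status") in ("completed", "failed"):
        break
    time.sleep(30)                          # building a model can take a while

print(session.get("modelId"))               # identifies the deployed model endpoint
print(session.get("metrics"))               # e.g. how much of the data looked anomalous
```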

Use Model API Endpoint

Once the model is deployed and you like the results, it becomes your prediction endpoint. By simply sending in new variables, or a series of variables, a set of predictions can be made.
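
As a sketch, calling the deployed model could look something like this. The models/{modelId}/predict path and the response fields are assumptions for this example; the quick start guides below show real requests and responses.

```python
# Hypothetical sketch: send new observations to the deployed model for scoring.
# The "/models/{id}/predict" path and the response field names are assumptions;
# how anomalies are flagged in the response depends on the API's contract.
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"
MODEL_ID = "your-model-id"                  # taken from the session result

new_rows = [
    {"reading": 0.81},
    {"reading": 7.25},                      # likely to be flagged as an outlier
]

response = requests.post(
    f"{BASE_URL}/models/{MODEL_ID}/predict",
    headers={"api-key": API_KEY},
    json={"data": new_rows},
)
response.raise_for_status()
for row in response.json().get("data", []):
    print(row)                              # each row comes back with an anomaly indicator
```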

Over time you may collect more data that can help improve the model, or you could add additional variables to the dataset, allowing even better predictions in the future. Simply upload more data, create a new session, and get new results! Each new session creates a new model with a new modelId, so you don't have to worry about your current model getting clobbered.

Further reading

Read Prediction Quick Start for an end-to-end example where you can see how the whole process works with Regression.

Read Classification Quick Start for an end-to-end example to see how the Classification process works.

Read Anomaly Detection Quick Start for an end-to-end example to see how the Anomaly Detection process works.