How It Works: Classification


The Process

Regression allows you to estimate the relationships between variables. Classification allows you to organize information into categories. Anomaly Detection helps you find outliers in your data.

The Nexosis API allows you to upload a dataset, and it will model these relationships. Once the relationships are understood, the model is persisted and used to predict values or classify your data, given new inputs. Take a moment to familiarize yourself with the high-level process before using Regression and Classification.

We’ve worked hard to keep the high-level process simple. Here are the basic steps:

  1. Submit a dataset
  2. Start a model building session
  3. Retrieve results

Then optionally:

  1. Use the Model API endpoint to make predictions, if you like the results of the model.
  2. Update the dataset with additional new data and rebuild, or train a new model.
  3. Start a new Session. Repeat.
[Figure: How It Works - Regression and Classification]

Submit a dataset

Classification

Classification is the process by which the Nexosis API analyzes a dataset and attempts to learn the categories into which its rows of data can be grouped.

For example, suppose you had measurements of an Iris flower’s characteristics - such as sepal length and width, and petal length and width - that allowed you to determine which species it was. You could train a Classification model to learn to make this distinction given only those measurements.

The DataSet

Once the dataset has been submitted, a regression or classification session can be created to build a model.

Read Sending Data for the technical details.
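
To make this step concrete, here is a minimal sketch of uploading a small Iris-style dataset with Python. The base URL, route, header, and column names are illustrative assumptions, not the definitive contract - Sending Data has the exact request format.

```python
import requests

API_KEY = "your-api-key"                  # assumption: key is passed in an "api-key" header
BASE_URL = "https://ml.nexosis.com/v1"    # assumption: API base URL

# A few Iris-style rows: four feature columns plus the target column ("species").
dataset = {
    "data": [
        {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"},
        {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4, "species": "versicolor"},
        {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5, "species": "virginica"},
    ]
}

# Upload (or append to) a named dataset called "iris".
response = requests.put(
    f"{BASE_URL}/data/iris",
    json=dataset,
    headers={"api-key": API_KEY},
)
response.raise_for_status()
```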

Start a model building session

A Session is simply the process of building the model from the supplied dataset. This exploration of the data is computationally expensive and can be time-consuming, depending on the amount of data in the dataset.

This is where the data science happens at scale. Behind the scenes, a host of algorithms work to discover what makes your dataset tick, finding which factors influence others and where the correlations are, and ultimately providing predictions given new data inputs.

Read Sessions for the technical details.
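
As a sketch of what starting a session can look like, the request below asks for a classification model built from the iris dataset uploaded earlier, predicting the species column. The route and field names (dataSourceName, targetColumn, predictionDomain) are assumptions for illustration; Sessions documents the real contract.

```python
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"

# Ask the API to build a classification model from the "iris" dataset,
# predicting the "species" column from the remaining columns.
session_request = {
    "dataSourceName": "iris",
    "targetColumn": "species",
    "predictionDomain": "classification",
}

response = requests.post(
    f"{BASE_URL}/sessions/model",
    json=session_request,
    headers={"api-key": API_KEY},
)
response.raise_for_status()

session_id = response.json()["sessionId"]  # keep this to check on progress later
print(f"Model-building session started: {session_id}")
```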

Retrieve the results

Once all the results are analyzed and the relationships present in the data are discovered, a model is built and deployed. The SessionResult will contain a modelId used to identify the production model endpoint where predictions are made. Additionally, the session result returns metrics, such as an accuracy score, that illuminate the strength of the relationships found between the features and the target value in the dataset.

Read Retrieving a Session for more technical details.
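
A rough sketch of checking a session and reading its results, under the same assumptions about routes and response fields (status, modelId, metrics) as above, might look like this:

```python
import time
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"
headers = {"api-key": API_KEY}
session_id = "..."  # the sessionId returned when the session was started

# Poll until the session has finished building the model.
while True:
    session = requests.get(f"{BASE_URL}/sessions/{session_id}", headers=headers).json()
    if session["status"] in ("completed", "failed"):
        break
    time.sleep(30)

# The results carry the modelId of the deployed model and accuracy-style metrics.
results = requests.get(f"{BASE_URL}/sessions/{session_id}/results", headers=headers).json()
print("Model:", results.get("modelId"))
print("Metrics:", results.get("metrics"))
```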

Use Model API Endpoint

Once the model is deployed and you like the results, it becomes your prediction endpoint. By simply sending in new variables - or a series of variables - a set of predictions can be made.
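
For instance, a prediction call could look like the sketch below. The route and payload shape are assumptions based on the description here, so confirm them against the Model endpoint reference.

```python
import requests

API_KEY = "your-api-key"
BASE_URL = "https://ml.nexosis.com/v1"
model_id = "..."  # the modelId from the session result

# Send one or more rows of feature values; the response echoes them back
# with the predicted target column filled in.
payload = {
    "data": [
        {"sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 5.1, "petal_width": 1.8}
    ]
}

response = requests.post(
    f"{BASE_URL}/models/{model_id}/predict",
    json=payload,
    headers={"api-key": API_KEY},
)
response.raise_for_status()
print(response.json()["data"])  # each row should now include a predicted "species"
```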

Over time you may collect more data that can help improve the model, or you could add additional variables to the dataset, allowing even better predictions in the future. Simply upload more data, create a new session, and get new results! Each new session creates a new model with a new modelId, so you don’t have to worry about your current model getting clobbered.

Further reading

Read Prediction Quick Start for an end-to-end example where you can see how the whole process works with Regression.

Read Classification Quick Start for an end-to-end example to see how the Classification process works.

Read Anomaly Detection Quick Start for an end-to-end example to see how the Anomaly Detection process works.