How It Works: Regression


The Process

Regression allows you to estimate the relationships between variables. Classification allows you to organize information into categories. Anomaly Detection helps you find outliers in your data.

The Nexosis API lets you upload a dataset and then models these relationships for you. Once a relationship is understood, the resulting model is persisted and can be used to predict values or classify your data, given new inputs. Take a moment to familiarize yourself with the high-level process before using Regression and Classification.

We’ve worked hard to keep the high-level process simple. Here’s the basic process:

  1. Submit a dataset
  2. Start a model building session
  3. Retrieve results

Then optionally:

  1. Use the Model API endpoint to make predictions, if you like the results of the model.
  2. Update the Dataset with additional data and rebuild the model, or train a new one.
  3. Start a new Session. Repeat.

Submit a dataset

Regression

Regression is the process by which the Nexosis API analyzes a dataset to learn how a set of variables relates to one specific target value across many observations. That relationship can then be used to predict a target value for any new combination of known variables.

For example, if you wanted to be able to predict an approximate sale price of a house you’d need a dataset containing real attributes of many different kinds of houses and their real historical sale price. Each row describes a distinct house - its columns represent specific attributes (independent variables) that likely influence the sale price (dependent variable). These independent variables are called features and the dependent variable is called the target. Some examples of attributes that might correspond to the sale price could be the year a house was built, the number of rooms, the number of bathrooms, livable square footage, zip code, and so on.
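To make the idea of features and a target concrete, here is a minimal sketch of regression with a single feature, using ordinary least squares in plain Python. It is not how the Nexosis API builds its models (the page does not describe the algorithms used); the rows, column names, and helper functions are invented purely for illustration.

```python
# Toy rows, invented for illustration only: each dict is one house.
# "sqft" is the feature (independent variable); "sale_price" is the target.
houses = [
    {"sqft": 1000, "sale_price": 150000},
    {"sqft": 1500, "sale_price": 200000},
    {"sqft": 2000, "sale_price": 250000},
    {"sqft": 2500, "sale_price": 300000},
]


def fit_line(rows, feature, target):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(rows)
    mean_x = sum(r[feature] for r in rows) / n
    mean_y = sum(r[target] for r in rows) / n
    sxy = sum((r[feature] - mean_x) * (r[target] - mean_y) for r in rows)
    sxx = sum((r[feature] - mean_x) ** 2 for r in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(houses, "sqft", "sale_price")


def predict(sqft):
    """Predict a sale price for a house that was not in the dataset."""
    return slope * sqft + intercept


print(predict(1800))
```

A real dataset would carry many features per row (year built, number of rooms, zip code, and so on), and the relationship would rarely be a perfect straight line.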

The DataSet

Once the dataset has been submitted, a regression or classification session can be created to build a model.

Read Sending Data for the technical details.
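Before submitting, it can help to check that every row has the same columns and that the target is numeric. The helper below is a hypothetical local pre-check, not part of the Nexosis API; the column names and sample values are made up for the house example.

```python
def find_row_problems(rows, target):
    """Return a list of problems; an empty list means the rows look consistent."""
    if not rows:
        return ["dataset is empty"]
    problems = []
    expected = set(rows[0])
    if target not in expected:
        problems.append(f"target column {target!r} is missing from the first row")
    for i, row in enumerate(rows):
        if set(row) != expected:
            problems.append(f"row {i} has different columns than row 0")
        value = row.get(target)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"row {i} has a non-numeric target: {value!r}")
    return problems


# Hypothetical sample rows; the last one has a bad target value.
rows = [
    {"yearBuilt": 1995, "rooms": 6, "bathrooms": 2, "sqft": 1800, "salePrice": 240000},
    {"yearBuilt": 2008, "rooms": 8, "bathrooms": 3, "sqft": 2600, "salePrice": 355000},
    {"yearBuilt": 1978, "rooms": 5, "bathrooms": 1, "sqft": 1200, "salePrice": "unknown"},
]
problems = find_row_problems(rows, "salePrice")
for problem in problems:
    print(problem)
```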

Start a model building session

A Session is simply the process of building the model from the supplied Dataset. This exploration of the data is computationally expensive and can be time-consuming, depending on the amount of data in the dataset.

This is where the data science happens at scale. Behind the scenes a host of algorithms work to discover what makes your dataset tick, attempting to find which factors influence others and where the correlations are, and ultimately to provide predictions given new data inputs.

Read Sessions for the technical details.
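Because a session can take a while, client code usually polls for completion rather than blocking on one request. The sketch below shows a generic polling loop; `get_status` is a hypothetical callable that you would implement against the Sessions endpoint, and the status names are assumptions rather than documented values.

```python
import time


def wait_for_session(get_status, poll_seconds=5.0, timeout_seconds=3600.0,
                     sleep=time.sleep):
    """Poll get_status() until it reports a terminal state or the timeout elapses."""
    terminal = {"completed", "failed", "cancelled"}  # hypothetical status names
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"session still {status!r} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with a fake status source standing in for real API calls.
fake_statuses = iter(["requested", "started", "started", "completed"])
final = wait_for_session(lambda: next(fake_statuses),
                         poll_seconds=0.0, sleep=lambda seconds: None)
print(final)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.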

Retrieve the results

Once all the results are analyzed and the relationships present are discovered, a model is built and deployed. The SessionResult will contain a modelId used to identify the production model endpoint where predictions are made. The session result also returns metrics, in the form of an accuracy metric, that indicate the strength of the relationships found in the dataset between the features and the target value.

Read Retrieving a Session for more technical details.
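This page does not list which metrics a session result contains, so as a rough illustration of what an accuracy metric for regression measures, the sketch below computes three common ones (mean absolute error, root mean squared error, and R²) from actual and predicted values. The function name and the sample numbers are made up.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, and R^2 for paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R^2 is undefined when every actual value is identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
    }


# Invented sale prices and predictions, for illustration only.
metrics = regression_metrics([200000, 250000, 300000], [210000, 240000, 300000])
print(metrics)
```

Lower MAE and RMSE mean smaller prediction errors; an R² closer to 1 means the features explain more of the variation in the target.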

Use Model API Endpoint

Once the model is deployed and you like the results, it becomes your prediction endpoint. By simply sending in new variables, or a series of variables, you can get back a set of predictions.

Over time you may collect more data that can help improve the model, or you could add variables to the dataset to allow even better predictions in the future. Simply upload more data, create a new session, and get new results! Each new session will create a new model with a new modelId, so you don’t have to worry about your current model getting clobbered.
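Since each session yields its own modelId, one way to handle retraining is to keep the results of every session and switch to a new model only when its metric improves. The sketch below assumes a simplified, hypothetical result shape (a `modelId` plus a `metrics` dict with an `rmse` entry); only `modelId` is named in this page.

```python
def pick_best_model(session_results, metric="rmse"):
    """Return the modelId with the lowest value of an error metric."""
    scored = [r for r in session_results if metric in r.get("metrics", {})]
    if not scored:
        raise ValueError(f"no session result reports {metric!r}")
    best = min(scored, key=lambda r: r["metrics"][metric])
    return best["modelId"]


# Hypothetical results from two sessions; ids and values are invented.
results = [
    {"modelId": "model-a", "metrics": {"rmse": 8200.0}},
    {"modelId": "model-b", "metrics": {"rmse": 6900.0}},
]
current_model = pick_best_model(results)
print(current_model)
```

Taking the minimum only makes sense for error-style metrics; for a score where higher is better, use `max` instead.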

Further reading

Read Prediction Quick Start for an end-to-end example where you can see how the whole process works with Regression.

Read Classification Quick Start for an end-to-end example to see how the Classification process works.

Read Anomaly Detection Quick Start for an end-to-end example to see how the Anomaly Detection process works.