This is a quick walkthrough of the basics of using the Nexosis API. By following this walkthrough, which uses a sample dataset, you will learn all of the steps needed to find anomalies with the Nexosis API.
Data can be uploaded by posting the rows and columns as JSON or as a CSV file. Several sample DataSets are available, including a file named cardio.json containing heartbeat data that helps identify abnormal signals. That DataSet includes data like this:
"data": [
{
"X.1": 0.00491231466176838,
"X.2": 0.693190774919386,
"X.3": -0.203640485666361,
"X.4": 0.595322119977384,
"X.5": 0.353189609125384,
"X.6": -0.0614006444954018,
"X.7": -0.278294946006147,
"X.8": -1.65044441802655,
"X.9": 0.759072460103087,
"X.10": -0.420487347915062,
"X.11": 0.37214915553696,
"X.12": 1.48597291930713,
"X.13": -0.798376447055488,
"X.14": 1.85472761207619,
"X.15": 0.622631032861473,
"X.16": 0.963082541396573,
"X.17": 0.301464348018426,
"X.18": 0.193113439105929,
"X.19": 0.231497949713972,
"X.20": -0.28978574486636,
"X.21": -0.493293969440142,
"target": 0,
"key": 1
},
...
There are three types of columns here. The X.# columns are the measurements that make up the features of the dataset. The target column identifies whether or not the record is an anomaly; the metadata included in the JSON file marks this column to be ignored. If you included this column as a target, you could create a classification model instead, but here we are going to try to detect anomalies without it. Finally, there is a key column, which we identify as the key in the metadata so that we can match anomalous rows back to the original dataset if desired.
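To make the three column roles concrete, here is a minimal sketch that groups the column names of one row by the role described above. It only looks at names (the X.# pattern, target, and key); the helper name group_columns is hypothetical, and it neither reads nor produces the actual Nexosis metadata format.

```python
# Hypothetical helper: group cardio.json column names by the roles described
# in the walkthrough (X.# = feature, target = ignored, key = key).
# It inspects names only; it does not build Nexosis column metadata.
import re

FEATURE_PATTERN = re.compile(r"^X\.\d+$")


def group_columns(row: dict) -> dict:
    """Return the column names of one data row grouped by intended role."""
    groups = {"features": [], "ignored": [], "key": [], "other": []}
    for name in row:
        if FEATURE_PATTERN.match(name):
            groups["features"].append(name)
        elif name == "target":
            groups["ignored"].append(name)  # not used for unsupervised anomaly detection
        elif name == "key":
            groups["key"].append(name)      # used to match results back to source rows
        else:
            groups["other"].append(name)
    return groups


# Usage with a shortened copy of the sample row shown above.
sample_row = {"X.1": 0.00491231466176838, "X.2": 0.693190774919386,
              "X.21": -0.493293969440142, "target": 0, "key": 1}
print(group_columns(sample_row))
```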
You can run all of the following commands in a terminal window; they are based on the anomalies gist in the Nexosis GitHub account. Be sure to add your own API key, and check your version of Python, since the gist adds some JSON results parsing not shown below.
To do this, we first need to upload the data so it becomes a DataSet. Then we start a Session, referencing the DataSet we just created and containing the parameters that determine how the Nexosis machine learning algorithms should work. Once the Session is started, our algorithms will start crunching the numbers to produce a model.
Putting this all together, we will have two requests like the ones below. Make sure to replace the {subscription key} section with your actual subscription key.
curl -s -X POST "https://ml.nexosis.com/v1/imports/url" \
-H "Content-Type: application/json" \
-H "api-key: {subscription key}" \
-d '{"dataSetName":"cardio","url":"https://raw.githubusercontent.com/Nexosis/sampledata/master/cardio.json"}'
curl -s -X POST "https://ml.nexosis.com/v1/sessions/model" \
-H "Content-Type: application/json" \
-H "api-key: {subscription key}" \
-d '{"dataSourceName": "cardio", "predictionDomain": "anomalies"}'
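If you prefer to script these calls instead of using curl, the sketch below builds the same two POST requests with Python's standard urllib module. It only constructs the requests and does not send them; the helper name build_post and the BASE_URL/API_KEY constants are illustrative assumptions, with the URLs, headers, and JSON bodies copied from the curl commands.

```python
# Sketch: build (but do not send) the same two POST requests as the curl
# commands, using only the Python standard library.
import json
import urllib.request

BASE_URL = "https://ml.nexosis.com/v1"
API_KEY = "{subscription key}"  # placeholder: replace with your own key


def build_post(path: str, payload: dict, api_key: str = API_KEY) -> urllib.request.Request:
    """Return a JSON POST request carrying the api-key header."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


import_request = build_post("/imports/url", {
    "dataSetName": "cardio",
    "url": "https://raw.githubusercontent.com/Nexosis/sampledata/master/cardio.json",
})
session_request = build_post("/sessions/model", {
    "dataSourceName": "cardio",
    "predictionDomain": "anomalies",
})

# To actually send one (requires network access and a valid key):
# with urllib.request.urlopen(session_request) as resp:
#     session = json.load(resp)
```

Note that urllib normalizes header names (for example to "Api-key"); HTTP header names are case-insensitive, so this does not change what the server receives.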
Once the session has been started, you should see a response similar to this:
{
"columns": {
... columns elided...
},
"sessionId": "0160ea7b-3306-4a58-9655-8075306c7d96",
"type": "model",
"status": "requested",
"predictionDomain": "anomalies",
"availablePredictionIntervals": [],
"requestedDate": "2018-01-12T13:07:10.72398+00:00",
"statusHistory": [
{
"date": "2018-01-12T13:07:10.72398+00:00",
"status": "requested"
}
],
"extraParameters": {
"containsAnomalies": true
},
"messages": [],
"dataSourceName": "cardio",
"dataSetName": "cardio",
"targetColumn": "anomaly",
... other information elided...
}
Here we can see that we have a sessionId, which we will need later on. Also, the status of the session is now requested. The parameters that we sent up before are also echoed back to us. Now that we have requested a session, we can check its status to see when it completes by sending a GET with the sessionId we just got.
# replace {sessionId} with the sessionId from your session request response
curl -s -X GET "https://ml.nexosis.com/v1/sessions/{sessionId}" \
-H "api-key: {subscription key}"
Once this request returns a status of completed, two things will be available: first, we can get the anomalies that were found by requesting the results; second, we will have a model that can identify whether future data points are anomalous.
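Checking the status by hand works, but a script will usually poll. Here is a minimal polling sketch. The fetch_session argument is a hypothetical callable you would supply to perform the GET above and return the parsed JSON; injecting it keeps the loop testable offline. Treating "failed" and "cancelled" as terminal statuses alongside "completed" is an assumption, so check the API reference for the authoritative list.

```python
# Sketch: poll a session until it reaches a terminal status.
# `fetch_session` is a hypothetical callable that performs the GET request
# and returns the parsed JSON response as a dict.
import time
from typing import Callable

# Assumption: "failed" and "cancelled" are also terminal; only "requested"
# and "completed" appear in this walkthrough.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_session(fetch_session: Callable[[], dict],
                     poll_seconds: float = 10.0,
                     max_polls: int = 360) -> dict:
    """Call fetch_session() until its status is terminal; return that response."""
    for _ in range(max_polls):
        session = fetch_session()
        if session.get("status") in TERMINAL_STATUSES:
            return session
        time.sleep(poll_seconds)
    raise TimeoutError("session did not finish within the polling limit")


# Usage with a fake fetcher that reports completion on the third call.
responses = iter([
    {"status": "requested"},
    {"status": "requested"},
    {"status": "completed", "sessionId": "0160ea7b-3306-4a58-9655-8075306c7d96"},
])
final = wait_for_session(lambda: next(responses), poll_seconds=0)
print(final["status"])
```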
The anomalies are returned in the “data” array property of the results response, which we can get by making a GET request to the results endpoint:
# replace {sessionId} with the sessionId from your session request response
curl -s -H "api-key: {subscription key}" https://ml.nexosis.com/v1/sessions/{sessionId}/results
The results JSON contains a data array in which each anomalous observation has a new field named ‘anomaly’. This field holds the observation's anomaly score: when the score is negative, the record is anomalous. In this case all scores are negative because only anomalous entries are returned. The score is relative, so a more negative number indicates a more anomalous record than one whose score is closer to zero.
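Because the score is relative, a natural next step is to sort the returned rows so the most anomalous come first. The sketch below does that for a parsed results response. It assumes, as in the sample response, that "key" and "anomaly" arrive as JSON strings, so the score is converted to float before sorting. The helper name rank_anomalies is hypothetical, and the second row in the usage example is made up for illustration.

```python
# Sketch: rank rows from a parsed results response, most anomalous first.
# Scores arrive as strings in the sample response, so convert before sorting.
def rank_anomalies(results: dict) -> list:
    """Return (key, score) pairs sorted from most to least negative score."""
    scored = [(row["key"], float(row["anomaly"])) for row in results.get("data", [])]
    return sorted(scored, key=lambda pair: pair[1])


# Usage: the first row comes from the sample response above; the second row
# is hypothetical, added only to show the ordering.
example = {"data": [
    {"key": "141", "anomaly": "-0.0452571326093725"},
    {"key": "hypothetical", "anomaly": "-0.2"},
]}
print(rank_anomalies(example))
```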
{
"metrics": {
"percentAnomalies": 0.10049153468050245
},
"data": [
{
"key": "141",
"anomaly": "-0.0452571326093725",
"X.1": "0.00491231466176838",
"X.2": "2.25671813545889",
"X.3": "-0.203640485666361",
"X.4": "-0.757128570390479",
... additional values and entries ...
}
],
... other information elided...
}
In order to predict, you first need the modelId of the model trained by the session in step 2. It is returned in the session results:
curl -s -X GET "https://ml.nexosis.com/v1/sessions/{sessionId}/results" \
-H "api-key: {subscription key}"
This response looks much like the session response above, but importantly it includes the modelId field.
{
"sessionId": "0160ea7b-3306-4a58-9655-8075306c7d96",
"type": "model",
"status": "completed",
"predictionDomain": "anomalies",
"availablePredictionIntervals": [],
"modelId": "10777ba7-d3f4-4da7-9405-21e53b758832",
"requestedDate": "2018-01-12T13:07:10.72398+00:00",
...
}
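In a script, you would read modelId from the parsed response before predicting. The sketch below does this and fails loudly if the session has not completed or the field is missing. Raising in those cases is a design choice of this sketch, not documented API behavior, and the helper name get_model_id is hypothetical.

```python
# Sketch: read the modelId from a parsed session response.
# Raising on a non-completed session or missing field is this sketch's choice.
def get_model_id(session: dict) -> str:
    """Return the modelId of a completed session, or raise if unavailable."""
    status = session.get("status")
    if status != "completed":
        raise ValueError(f"session status is {status!r}, not 'completed'")
    model_id = session.get("modelId")
    if not model_id:
        raise KeyError("modelId not present in session response")
    return model_id


# Usage with the fields from the sample response above.
session = {"sessionId": "0160ea7b-3306-4a58-9655-8075306c7d96",
           "status": "completed",
           "modelId": "10777ba7-d3f4-4da7-9405-21e53b758832"}
print(get_model_id(session))
```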
With the modelId and a set of new feature values, you’re ready to request a prediction.
# replace the model ID in the URL with the modelId from your session results
curl -s -X POST "https://ml.nexosis.com/v1/models/10777ba7-d3f4-4da7-9405-21e53b758832/predict" \
-H "Content-Type: application/json" \
-H "api-key: {subscription key}" \
-d '{ "data": [{"X.1": -1.15907509242824,"X.2": -0.91998843744384,"X.3": -0.178808273342188,"X.4": 0.0119005653086114,"X.5": 3.14482390535516,"X.6": 17.314053637174,"X.7": -0.278294946006147,"X.8": 1.28100338235996,"X.9": 0.759072460103087,"X.10": -0.420487347915062,"X.11": -1.39808792464813,"X.12": 0.744634905731968,"X.13": -1.26165666571131,"X.14": -0.400329665815939,"X.15": 0.278624569020388,"X.16": 0.963082541396573,"X.17": -4.14708829524105,"X.18": -3.19083518334924,"X.19": -3.21686477340904,"X.20": 2.58264294083969,"X.21": -2.12660546680886,"key": 1780}]}'
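The same prediction call can be built in Python. The sketch below wraps the rows in a {"data": [...]} body and posts it to the model's predict URL, mirroring the curl command. It only constructs the request. The helper name build_predict_request is hypothetical, and the usage row is truncated to three features for brevity, whereas a real request should send all of the X.# features as in the curl example.

```python
# Sketch: build (but do not send) a predict request for a trained model.
import json
import urllib.request

BASE_URL = "https://ml.nexosis.com/v1"


def build_predict_request(model_id: str, rows: list, api_key: str) -> urllib.request.Request:
    """Return a POST to /models/{model_id}/predict with rows under "data"."""
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{model_id}/predict",
        data=json.dumps({"data": rows}).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Usage: one row, truncated to three features plus the key for brevity.
req = build_predict_request(
    "10777ba7-d3f4-4da7-9405-21e53b758832",
    [{"X.1": -1.15907509242824, "X.2": -0.91998843744384,
      "X.21": -2.12660546680886, "key": 1780}],
    api_key="{subscription key}",
)
```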
As with the data in the session results, the response body echoes your data back to you, this time with the anomaly field filled in:
{
"data": [
{
"X.1": "-1.15907509242824",
"X.2": "-0.91998843744384",
"X.3": "-0.178808273342188",
"X.4": "0.0119005653086114",
"X.5": "3.14482390535516",
"X.6": "17.314053637174",
"X.7": "-0.278294946006147",
"X.8": "1.28100338235996",
"X.9": "0.759072460103087",
"X.10": "-0.420487347915062",
"X.11": "-1.39808792464813",
"X.12": "0.744634905731968",
"X.13": "-1.26165666571131",
"X.14": "-0.400329665815939",
"X.15": "0.278624569020388",
"X.16": "0.963082541396573",
"X.17": "-4.14708829524105",
"X.18": "-3.19083518334924",
"X.19": "-3.21686477340904",
"X.20": "2.58264294083969",
"X.21": "-2.12660546680886",
"key": "1780",
"anomaly": "-0.101920780676883"
}
],
...
}
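To turn these scores into a yes/no answer, the sketch below applies the rule stated earlier for the session results (a negative anomaly score means the record is anomalous) to each predicted row. Applying that same threshold to prediction output is an assumption based on this walkthrough, and the helper name flag_predictions is hypothetical.

```python
# Sketch: flag each predicted row as anomalous when its score is negative,
# following the rule described for the session results.
def flag_predictions(response: dict) -> dict:
    """Map each row's key to True if its anomaly score is below zero."""
    return {row["key"]: float(row["anomaly"]) < 0 for row in response.get("data", [])}


# Usage with the key and score from the sample response above.
prediction = {"data": [{"key": "1780", "anomaly": "-0.101920780676883"}]}
print(flag_predictions(prediction))
```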
Now that you are familiar with the basics, try getting predictions from new datasets, or take a look at the code samples and client libraries and write an application that integrates with the API. Show us what you were able to build!