This is a quick walkthrough of the basics of using the Nexosis API. By following this walkthrough, which uses a sample dataset, you will learn all of the steps needed to predict classes with the Nexosis API.
Data can be uploaded either as JSON rows and columns or as a CSV file. Several sample DataSets are available, including the famous Iris data, which records identifying measurements of flowers from different Iris species. The Iris JSON DataSet contains records like this:
[
{
"sepal_len": 5.1,
"sepal_width": 3.5,
"petal_len": 1.4,
"petal_width": 0.2,
"iris": "setosa"
},
... more instances of iris features
]
The iris field is our target value. The other columns are treated as features that the Nexosis API's algorithms learn from. Note that string-valued targets are preserved when building a classification model, though string targets wouldn't make sense for other model types.
To do this, we first send our data to a DataSet. Then we start a Session that references the DataSet we just created and contains the parameters that determine how the Nexosis machine learning algorithms should work. Once the Session is started, our algorithms begin crunching the numbers to produce a model.
For this DataSet, we want to classify the iris type given some measurements of the flower in centimeters, so the TargetColumn parameter will be specified as iris.
Putting this all together, we will have two requests like the ones below: one to upload the data and one to start the session. Make sure to replace the {subscription key} section with your actual subscription key, and replace the file path with the path to one of the sample files you downloaded earlier.
If you’re using the iris.json file from our sampleData repo (Iris JSON DataSet), make sure to submit the correct Content-Type header:
curl -v -X PUT "https://ml.nexosis.com/v1/data/iris" \
-H "Content-Type: application/json" \
-H "api-key: {subscription key}" \
--data-binary "@/path/to/file/iris.json"
If you’re using the iris.csv file from our sampleData repo (Iris CSV DataSet), make sure to submit the correct Content-Type header:
curl -v -X PUT "https://ml.nexosis.com/v1/data/iris" \
-H "Content-Type: text/csv" \
-H "api-key: {subscription key}" \
--data-binary "@/path/to/file/iris.csv"
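The two uploads differ only in the Content-Type header. As a minimal sketch, assuming bash or another POSIX shell with curl, the header can be chosen from the file extension. The function names content_type_for and upload_dataset and the NEXOSIS_API_KEY environment variable are hypothetical, not part of the Nexosis API.

```shell
# Sketch: upload a local file as a DataSet, choosing Content-Type from the
# file extension. content_type_for, upload_dataset and NEXOSIS_API_KEY are
# hypothetical names used for illustration.

# Print the Content-Type for a sample file, or fail for unknown extensions.
content_type_for() {
  case "$1" in
    *.json) echo "application/json" ;;
    *.csv)  echo "text/csv" ;;
    *)      echo "unsupported file type: $1" >&2; return 1 ;;
  esac
}

# PUT the file to /v1/data/{name}, mirroring the curl commands above.
upload_dataset() {
  local name="$1" file="$2" ctype
  ctype=$(content_type_for "$file") || return 1
  curl -v -X PUT "https://ml.nexosis.com/v1/data/${name}" \
    -H "Content-Type: ${ctype}" \
    -H "api-key: ${NEXOSIS_API_KEY}" \
    --data-binary "@${file}"
}

# Usage (needs a real key and network access):
#   NEXOSIS_API_KEY=... upload_dataset iris /path/to/file/iris.csv
```

Failing early on an unknown extension avoids sending a file with a Content-Type that does not match its contents.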
Once the data is uploaded, start a classification Session that references the iris DataSet:
curl -v -X POST "https://ml.nexosis.com/v1/sessions/model" \
-H "Content-Type: application/json" \
-H "api-key: {subscription key}" \
-d '{"dataSourceName": "iris", "predictionDomain": "classification", "targetColumn": "iris"}'
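The session request body can also be assembled by a small shell function, which makes it easy to reuse the DataSet name and target column and to capture the returned sessionId. This is a sketch under stated assumptions: session_body, session_id_from_json, start_session and NEXOSIS_API_KEY are hypothetical names; the body is built with printf and does no JSON escaping; and the sessionId is extracted with sed, which assumes the key appears once in the response (a real JSON parser such as jq is more robust).

```shell
# Sketch: start a classification session. session_body, session_id_from_json,
# start_session and NEXOSIS_API_KEY are hypothetical names.

# Build the request body; printf does no JSON escaping, so names must not
# contain quotes or backslashes.
session_body() {
  printf '{"dataSourceName": "%s", "predictionDomain": "classification", "targetColumn": "%s"}' "$1" "$2"
}

# Pull the first "sessionId" value out of a JSON response read from stdin.
session_id_from_json() {
  sed -n 's/.*"sessionId": *"\([^"]*\)".*/\1/p' | head -n 1
}

# POST the session request and print the new sessionId.
start_session() {
  curl -s -X POST "https://ml.nexosis.com/v1/sessions/model" \
    -H "Content-Type: application/json" \
    -H "api-key: ${NEXOSIS_API_KEY}" \
    -d "$(session_body "$1" "$2")" | session_id_from_json
}

# Usage: SESSION_ID=$(start_session iris iris)
```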
Once the session has been started, you should see a response similar to this:
{
"columns": {
... columns elided...
},
"sessionId": "015fdb7d-7161-49a1-9ac7-6cee6818842d",
"type": "model",
"status": "requested",
"predictionDomain": "classification",
"balance": true,
"requestedDate": "2017-11-20T22:00:21.361294+00:00",
"statusHistory": [
{
"date": "2017-11-20T22:00:21.361294+00:00",
"status": "requested"
}
],
"extraParameters": {},
"messages": [],
"dataSourceName": "iris",
"dataSetName": "iris",
"targetColumn": "iris",
... other information elided...
}
Here we can see that we have a sessionId, which we will need later on. The status of the session is now requested, and the parameters that we sent are echoed back to us. Now that we have requested a session, we can check its status to see when it completes by sending a GET with the sessionId we just got.
We can also see the balance parameter, which is only valid on classification sessions. It can be set when submitting a new session and defaults to true. A balanced dataset is one with roughly equal representation of all classes; if this does not hold, the dataset is unbalanced. A classification algorithm will tend to perform better on the majority classes simply because it gets many more examples of them to learn from. To mitigate this, you can request that the API “balance” (in this case a verb) the classification model. Behind the scenes, the model assigns more weight to the minority classes during training. This setting often improves performance on the minority classes at the expense of reduced performance on the majority classes. There is no free lunch.
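To see whether a DataSet is unbalanced before relying on this setting, you can count the rows per class. The sketch below assumes a simple, unquoted CSV with a header row and the target in the last column, as in iris.csv, and prints each class with its count and an inverse-frequency weight, n_rows / (n_classes * class_count). That formula is a common balancing heuristic (scikit-learn's class_weight="balanced" uses it); this walkthrough does not say which weighting Nexosis applies, so treat the weights as illustrative only. class_weights is a hypothetical helper name.

```shell
# Sketch: per-class counts and inverse-frequency weights for a simple CSV
# (header row, no quoted fields, target in the last column). The formula
# n_rows / (n_classes * class_count) is a common heuristic; it is not
# necessarily what the Nexosis API does internally.
class_weights() {
  awk -F',' '
    { sub(/\r$/, "") }                  # tolerate Windows line endings
    NR > 1 && NF > 0 { count[$NF]++; n++ }
    END {
      for (c in count) k++
      for (c in count) printf "%s %d %.4f\n", c, count[c], n / (k * count[c])
    }' "$1" | sort
}

# Usage: class_weights /path/to/file/iris.csv
```

The classic Iris data has 50 rows for each of its three species, so if the sample file matches it, every weight comes out as 1.0000: the data is already balanced and the balance setting should make little difference.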
Checking the session status looks like this:
curl -v -X GET "https://ml.nexosis.com/v1/sessions/{sessionId}" \
-H "api-key: {subscription key}"
Once this request comes back with a status of completed, the model is available to use in prediction requests.
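Rather than re-running the GET by hand, a script can poll until the session finishes. This is a minimal sketch assuming jq is installed to read the top-level status field, and assuming that failed and cancelled are the terminal error statuses (check the API reference for the full list). session_status, wait_for_session, NEXOSIS_API_KEY, POLL_SECONDS and MAX_POLLS are hypothetical names.

```shell
# Sketch: poll GET /v1/sessions/{sessionId} until the session completes.
# Assumes jq is installed. session_status, wait_for_session, NEXOSIS_API_KEY,
# POLL_SECONDS and MAX_POLLS are hypothetical names; treating "failed" and
# "cancelled" as terminal error statuses is an assumption.

# Print the top-level status of one session.
session_status() {
  curl -s "https://ml.nexosis.com/v1/sessions/$1" \
    -H "api-key: ${NEXOSIS_API_KEY}" | jq -r '.status'
}

# Return 0 once the session is completed, 1 on failure or timeout.
wait_for_session() {
  local id="$1" tries=0 status
  while [ "$tries" -lt "${MAX_POLLS:-120}" ]; do
    status=$(session_status "$id")
    case "$status" in
      completed) return 0 ;;
      failed|cancelled)
        echo "session $id ended with status: $status" >&2
        return 1 ;;
    esac
    tries=$((tries + 1))
    sleep "${POLL_SECONDS:-30}"
  done
  echo "gave up waiting for session $id" >&2
  return 1
}

# Usage: wait_for_session "$SESSION_ID" && echo "model ready"
```

Keeping the HTTP call in its own function means the loop can be exercised offline by swapping session_status for a stub.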
To request predictions, you first need the modelId of the model trained by the session above. It is returned in the session results:
curl -v -X GET "https://ml.nexosis.com/v1/sessions/{sessionId}/results" \
-H "api-key: {subscription key}"
This response looks much like the earlier session response, but importantly it includes the modelId field.
{
"sessionId": "015fdb7d-7161-49a1-9ac7-6cee6818842d",
"type": "model",
"status": "completed",
"predictionDomain": "classification",
"modelId": "8faca9ec-1e43-4400-a85c-8c2af5186dd5",
"requestedDate": "2017-11-20T22:00:21.361294+00:00"
}
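Fetching the results and pulling out the modelId can be scripted as well. The sketch below uses sed so it needs nothing beyond curl, which assumes modelId appears once in the response; jq -r '.modelId' is the more robust choice if jq is available. model_id_from_json, get_model_id and NEXOSIS_API_KEY are hypothetical names.

```shell
# Sketch: fetch session results and print the modelId. model_id_from_json,
# get_model_id and NEXOSIS_API_KEY are hypothetical names.

# Pull the first "modelId" value out of a JSON document read from stdin.
model_id_from_json() {
  sed -n 's/.*"modelId": *"\([^"]*\)".*/\1/p' | head -n 1
}

# GET /v1/sessions/{sessionId}/results and extract its modelId.
get_model_id() {
  curl -s "https://ml.nexosis.com/v1/sessions/$1/results" \
    -H "api-key: ${NEXOSIS_API_KEY}" | model_id_from_json
}

# Usage: MODEL_ID=$(get_model_id "$SESSION_ID")
```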
With the modelId and a set of new values for the features, you’re ready to request a prediction.
curl -v -X POST "https://ml.nexosis.com/v1/models/{modelId}/predict" \
-H "api-key: {subscription key}" \
-H "Content-Type: application/json" \
-d '{"data":[{ "sepal_len": "5.1", "sepal_width": "3.5", "petal_len": "1.4", "petal_width": "0.2"}] }'
The body of the response echoes your data field back to you, this time with the predicted class filled in.
{
"data": [
{
"sepal_len": "5.1",
"sepal_width": "3.5",
"petal_len": "1.4",
"petal_width": "0.2",
"iris": "setosa"
}
],
...
}
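A scripted version of the prediction request might look like the sketch below. It builds the body for a single flower, sends it, and prints the value of the echoed iris field. Assumptions: predict_body, predicted_class, predict_iris and NEXOSIS_API_KEY are hypothetical names, feature values are sent as strings exactly as in the request above, and the sed extraction expects one row per request.

```shell
# Sketch: request a prediction for one flower and print the predicted class.
# predict_body, predicted_class, predict_iris and NEXOSIS_API_KEY are
# hypothetical names. Values are sent as strings, as in the request above.

# Build the request body for a single row of features.
predict_body() {
  printf '{"data":[{"sepal_len": "%s", "sepal_width": "%s", "petal_len": "%s", "petal_width": "%s"}]}' "$1" "$2" "$3" "$4"
}

# Print the first "iris" value from the echoed response on stdin.
predicted_class() {
  sed -n 's/.*"iris": *"\([^"]*\)".*/\1/p' | head -n 1
}

# POST to /v1/models/{modelId}/predict and print the predicted class.
predict_iris() {
  local model_id="$1"
  shift
  curl -s -X POST "https://ml.nexosis.com/v1/models/${model_id}/predict" \
    -H "api-key: ${NEXOSIS_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "$(predict_body "$@")" | predicted_class
}

# Usage: predict_iris "$MODEL_ID" 5.1 3.5 1.4 0.2
```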
Now that you are familiar with the basics, try getting predictions from new datasets, or take a look at the code samples and client libraries and write an application that integrates with the API. Show us what you build!