Supervised Learning Using Azure Machine Learning

October 09, 2017

Machine learning and big-data analytics is a rapidly growing field, but learning how to analyze data has historically had a steep learning curve. Machine learning itself is nothing new, and the standard algorithms used to evaluate datasets have not changed much. With cloud computing and cheap storage, it is now possible to take advantage of these algorithms and apply them to large datasets. Microsoft has created a visual tool for building experiments that use machine learning techniques to evaluate data, removing the hurdles that once stopped users from utilizing machine learning.

Classification is a type of supervised learning where a set of features is evaluated and then classified based on what those features represent. The more features and items included in the dataset, the more accurate the result. In this example, the classification technique will be used to train and evaluate datasets and to create a functioning web service that uses machine learning.
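As a quick illustration of the idea (separate from the Azure workflow), a classifier maps a row of features to a class label. A minimal nearest-neighbor sketch in Python, using made-up feature rows:

```python
import math

# Tiny labeled training set: four numeric features per row plus a class label.
training = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
]

def classify(features):
    """Predict a class by returning the label of the nearest training row."""
    def distance(row):
        return math.dist(features, row[0])
    return min(training, key=distance)[1]

print(classify((5.0, 3.4, 1.5, 0.2)))  # nearest to the Iris-setosa row
```

Real classifiers learn a more general decision rule than nearest-neighbor lookup, but the input/output shape — features in, class label out — is the same as in the table below.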

Classification Example

Feature 1 | Feature 2 | Feature 3 | Feature 4 | Classification
X1        | X1        | X1        | X1        | Y1
X2        | X2        | X2        | X2        | Y2

Setup Project

To begin, a new project needs to be set up. A project groups the models, experiments, and web services for later reference. A project does not tie down a dataset or model; these can be used interchangeably between projects. To begin, select the Projects tab on the left, then select New at the bottom of the screen.

Image001

Then select Empty Project from the list of project types.

Image002

A modal will appear where a name and description can be added to the project. Once the project is named, click the check mark and the project is created.

Image003

Clean data

For this example, classifications of different Iris species will be used to train the model. The training set was obtained from archive.ics.uci.edu, and the test set is from Wikipedia. Links to both datasets can be found in the references below.

Before training the model, the data must be cleaned so that the test and training data contain the same feature headings and species class names. Once the headings in the data are changed, the classification/species names need to be checked. It is important that the training set and test set both use the same names for classification; otherwise the model will produce six classifications and return the wrong species in the results.
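This renaming step can also be scripted. A sketch in Python using only the standard library (the column and species names match the tables in this post; the header and species mappings are built from them):

```python
import csv
import io

# Map the Wikipedia-style headers and species names to the
# training set's conventions, per the tables in this post.
HEADER_MAP = {"Sepal length": "Sepal Length", "Sepal width": "Sepal Width",
              "Petal length": "Petal Length", "Petal width": "Petal Width",
              "Species": "Class"}
SPECIES_MAP = {"I. setosa": "Iris-setosa", "I. versicolor": "Iris-versicolor",
               "I. virginica": "Iris-virginica"}

def clean(csv_text):
    """Rename headers and species values so the test data matches the training set."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [HEADER_MAP.get(h, h) for h in rows[0]]
    for row in rows[1:]:
        row[-1] = SPECIES_MAP.get(row[-1], row[-1])
    return rows

test_csv = ("Sepal length,Sepal width,Petal length,Petal width,Species\n"
            "5.1,3.5,1.4,0.2,I. setosa\n")
print(clean(test_csv))
```

After this pass, both files share the same headings and class names, so the trained model and the test set agree on the three species labels.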

 

Training set

Sepal Length | Sepal Width | Petal Length | Petal Width | Class
4.6          | 3.6         | 1            | 0.2         | Iris-setosa
4.3          | 3            | 1.1          | 0.1         | Iris-setosa

 

Testing before cleanup

Sepal length | Sepal width | Petal length | Petal width | Species
5.1          | 3.5         | 1.4          | 0.2         | I. setosa
4.9          | 3.0         | 1.4          | 0.2         | I. setosa

 

Testing after cleanup

Sepal Length | Sepal Width | Petal Length | Petal Width | Class
5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa
4.9          | 3            | 1.4          | 0.2         | Iris-setosa

 

Upload datasets

Now that the data is cleaned, the datasets can be uploaded to the project for testing. To do this, go to the Datasets tab and click New.

Image004

From the menu, select From Local File.

Image005

Upload each dataset, and enter a name.

Image006

Create experiment

An experiment is where the datasets are used to train and evaluate a model, so that its accuracy can be reviewed before publishing the results as a web service. To start a new experiment, click the Experiments tab and select New at the bottom.

Image007

To begin, select Blank Experiment to create a blank canvas for evaluating the models.

Image008

When completed, the screen should now look like this.

Image009

To start the experiment, data is needed to evaluate. Drag the saved datasets from the “Saved Datasets” section onto the page.

Image010

The next step is to train a model. The training actions can be found in the Machine Learning section; expand it to see the available options. Since the first step is to train the model, expand the Training section and drag the Train Model action onto the page.

Image011

Notice the three connection points and a red exclamation mark. The Train Model action is looking for a dataset and a type of model to initialize the training; the remaining connection point is the output of the trained model. In order to train the model, an initialization model must be selected. The initialization model represents the type of machine learning technique that will be used to train the model. Since this is a classification model, expand Classification. Within the Classification section there are multiple choices. Since the dataset has more than two classes, a multiclass action should be used. For this example, the Multiclass Neural Network will be the initialization model used for training. Drag the Multiclass Neural Network onto the page and connect it, along with the data source, to the Train Model action.

Image012

Even after connecting the training technique and dataset to the model, there is still a red exclamation mark. This is because the classification name column in the dataset has not been assigned to the training model. On the right side of the screen there is a Launch column selector button for selecting the column that holds the classification name. Launch the selector and choose the column named “Class”.

Image013

Image014

Now that the training model is set up, the model needs to be scored to check how accurate its predictions are. To do this, expand the Machine Learning section, then expand Score, and drag the Score Model action onto the page. Once it is on the page, connect the test dataset and the trained model to the top of the Score Model action.

Image015

Finally, the model needs to be evaluated. This will give the results of the training and show how the predictions are weighted. There is an option to attach an additional scored dataset, which allows evaluation against another dataset.
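Outside Studio, the same train → score → evaluate pipeline can be approximated in Python with scikit-learn (an assumption for illustration only: Studio's Multiclass Neural Network is not the same implementation as scikit-learn's MLPClassifier, and scikit-learn ships its own copy of the Iris data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Iris data and hold out a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# "Train Model" with a neural-network initialization model
# (features are scaled first, which helps the network converge).
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
).fit(X_train, y_train)

# "Score Model": predict a class for each row of the test set.
scored = model.predict(X_test)

# "Evaluate Model": overall accuracy plus a per-class confusion matrix.
print(accuracy_score(y_test, scored))
print(confusion_matrix(y_test, scored))
```

The confusion matrix here corresponds to the grid shown in Studio's evaluation view: rows are true classes, columns are predicted classes, and off-diagonal counts are misclassifications.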

Image016

Run experiment

Now that the experiment is set up, it is time to run it. Click Save and then Run at the bottom of the experiment.

Image017

A successful training will show all green checks on each stage of the experiment.

Image018

View results

The results of each step can be viewed by right-clicking an action; a Visualize option will appear in the menu to view the results.

Image019

The score view shows the model's prediction for each line item in the test data and how strongly the model weighted each prediction.

Image020
Score View

The evaluation view gives an overall accuracy percentage based on the test data, and shows which predictions were correct and how each prediction was weighted.

Image021
Evaluation View

Retrain results

The accuracy was low at 86.66%, which would be unacceptable in most cases. To increase the accuracy, there are a couple of things to try:

  1. Add more data to the dataset.
  2. Try a different machine learning algorithm.

Within the experiment, the initialization model can be easily changed to a different machine learning algorithm.


From the Machine Learning section, expand Initialize Model, then expand Classification. From here, drag the Multiclass Decision Forest action onto the page and connect it to the Train Model action where the Multiclass Neural Network was connected.
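The same algorithm swap can be sketched in scikit-learn terms: only the initialization model changes, and everything downstream of it stays the same (RandomForestClassifier stands in here for Studio's Multiclass Decision Forest; the two are analogous but not identical implementations):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Same data and split as before.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Only the initialization model changes: a decision forest
# replaces the neural network. Training and scoring are unchanged.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print(accuracy_score(y_test, forest.predict(X_test)))
```

Because the train/score/evaluate steps are decoupled from the algorithm, comparing models is just a matter of swapping this one component, which mirrors how the experiment canvas works in Studio.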

Image022

Run the experiment again and view the results.

Image023

For this experiment the datasets were very small: only 150 items were used to train the model. This skews the results and has a large effect on the outcome. The more items in the dataset, the more accurate the results. If the experiment reaches 100% accuracy, it is important to review the data and make sure it is clean; if the data is correct, machine learning may not be necessary to evaluate the data, or a different algorithm might suit the dataset better.
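The effect of training-set size can be seen directly by training the same stand-in model on progressively larger slices of the data (a scikit-learn sketch; the slice sizes are arbitrary, and because small samples are noisy the trend is not strictly monotonic):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hold out a fixed test set so every model is scored on the same rows.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

# Train on progressively larger stratified slices of the training data.
for n in (9, 30, 90):
    Xs, _, ys, _ = train_test_split(
        X_train, y_train, train_size=n, random_state=1, stratify=y_train)
    model = RandomForestClassifier(n_estimators=50, random_state=1).fit(Xs, ys)
    print(n, round(accuracy_score(y_test, model.predict(X_test)), 3))
```

On a dataset this small, accuracy figures like the 86.66% above can move by several points just from how the rows happen to be split, which is why adding data is listed before changing the algorithm.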

Create a web service

Once the desired results are achieved, the trained model can be exposed as a web service. At the bottom of the experiment, select Set Up Web Service, then select Predictive Web Service.

Image024

When finished, the screen should look something like this.

Image025

Before deploying, it is important to run the web service to make sure everything is set up correctly.

Image026

When everything passes, hit Deploy Web Service. When the deployment is finished, the service will be available on the Web Services tab. The Web Services tab offers many features, including the API key and tests for the web service. Instructions for calling and consuming the web service can also be found by clicking the Request/Response section.
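A client calls the deployed service by POSTing JSON to the service endpoint with the API key in an Authorization header. A hedged sketch of building such a request body (the endpoint URL and key below are placeholders, and the exact schema should be copied from the service's own Request/Response page rather than from here):

```python
import json

API_KEY = "YOUR-API-KEY"  # placeholder: copy the real key from the Web Services tab
ENDPOINT = "https://example.azureml.net/execute"  # placeholder endpoint URL

# Request body in the classic Azure ML Request/Response shape (assumed here;
# verify against the Request/Response page of the deployed service).
# The Class column is left empty because it is what the service predicts.
payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Sepal Length", "Sepal Width",
                            "Petal Length", "Petal Width", "Class"],
            "Values": [["5.1", "3.5", "1.4", "0.2", ""]],
        }
    },
    "GlobalParameters": {},
}

headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + API_KEY}

body = json.dumps(payload)
print(body[:60])
# To actually send it, POST `body` to ENDPOINT with `headers`
# (e.g. via urllib.request) — omitted here since the endpoint is a placeholder.
```

The response comes back as JSON containing the scored labels and probabilities, matching what the score view showed inside Studio.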

Image027

Image028 Image029 Image030

References

Datasets: http://archive.ics.uci.edu/ml/... (training)

https://en.wikipedia.org/wiki/... (test)

Azure Machine Learning:  https://studio.azureml.net/

