Training a simple classifier

In this example we will see how to build a simple support vector classifier using Netsaur.

The dataset

We will be classifying two classes from the Iris dataset. The full dataset consists of 150 samples covering three species of the iris flower. We will be using only the first two classes (Iris setosa, Iris versicolor) for our example.

The two-class dataset can be downloaded here.

Loading dependencies

We first need to import the necessary modules from Netsaur:

Model

We will be creating a sequential neural network in this example.

import { Sequential } from "jsr:@denosaurs/netsaur@0.4.0/core";

Layers

Let's import a DenseLayer and a SigmoidLayer for activation.

import {
  DenseLayer,
  SigmoidLayer,
} from "jsr:@denosaurs/netsaur@0.4.0/core/layers";

Utilities

From netsaur/utilities, we need useSplit to split the dataset and ClassificationReport to get model metrics.

import {
  useSplit,
  ClassificationReport,
} from "jsr:@denosaurs/netsaur@0.4.0/utilities";

Misc

Cost: to select a cost function for the network
CPU: the backend we will be training on
setupBackend: to set up the CPU backend
tensor2D: to create a dataset for the model

import {
  Cost,
  CPU,
  setupBackend,
  tensor2D,
} from "jsr:@denosaurs/netsaur@0.4.0";

We need the parse function to load CSV data.

import { parse } from "jsr:@std/csv@1.0.3/parse";

Loading the dataset

First, open the dataset file using Deno.readTextFileSync and then parse the text content using the parse function we imported.

const _data = Deno.readTextFileSync("binary_iris.csv");
const data = parse(_data);

Now we can get the predictors (x) and targets (y). The first four columns are the predictors and the fifth column contains the class.

Since we are training a support vector classifier, our outputs are encoded as 1 and -1.

const x = data.map((fl) => fl.slice(0, 4).map(Number));
const y = data.map((fl) => (fl[4] === "Setosa" ? 1 : -1));
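To make the encoding concrete, here is a minimal sketch that applies the same two mappings to a pair of hand-written rows. The rows and the "Versicolor" label are illustrative assumptions, not values read from binary_iris.csv; only the "Setosa" label comes from the code above.

```typescript
// Illustrative rows in the string[][] shape that parse() returns.
// These are assumptions for the sketch, not rows read from binary_iris.csv.
const sampleRows: string[][] = [
  ["5.1", "3.5", "1.4", "0.2", "Setosa"],
  ["7.0", "3.2", "4.7", "1.4", "Versicolor"],
];

// The first four columns become numeric predictors.
const sampleX: number[][] = sampleRows.map((row) =>
  row.slice(0, 4).map(Number)
);

// The fifth column becomes 1 for "Setosa" and -1 for anything else.
const sampleY: number[] = sampleRows.map((row) =>
  row[4] === "Setosa" ? 1 : -1
);

console.log(sampleX); // [[5.1, 3.5, 1.4, 0.2], [7, 3.2, 4.7, 1.4]]
console.log(sampleY); // [1, -1]
```

Note that the comparison is case-sensitive and that every label other than "Setosa" maps to -1, so it is worth checking how the class column is actually spelled in your copy of the file.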

Next, we split the dataset for training and testing. A common train:test ratio is 7:3.

const [[trainX, trainY], [testX, testY]] = useSplit(
  { ratio: [7, 3], shuffle: true },
  x,
  y
);
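For intuition, a shuffled 7:3 split can be sketched by hand as follows. The splitSketch helper is hypothetical and is not Netsaur's useSplit implementation; it only illustrates the idea of shuffling indices once so that each x row stays paired with its y label.

```typescript
// Hypothetical helper, NOT Netsaur's useSplit: shuffle indices, then cut at the ratio.
function splitSketch<T, U>(
  x: T[],
  y: U[],
  ratio: [number, number],
): [[T[], U[]], [T[], U[]]] {
  const idx = x.map((_, i) => i);
  // Fisher-Yates shuffle of the index array.
  for (let i = idx.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [idx[i], idx[j]] = [idx[j], idx[i]];
  }
  // Number of training samples, rounded down.
  const cut = Math.floor((idx.length * ratio[0]) / (ratio[0] + ratio[1]));
  const trainIdx = idx.slice(0, cut);
  const testIdx = idx.slice(cut);
  return [
    [trainIdx.map((i) => x[i]), trainIdx.map((i) => y[i])],
    [testIdx.map((i) => x[i]), testIdx.map((i) => y[i])],
  ];
}

// Made-up data: 100 one-feature samples, label 1 for even values and -1 for odd.
const demoX: number[][] = Array.from({ length: 100 }, (_, i) => [i]);
const demoY: number[] = Array.from({ length: 100 }, (_, i) =>
  i % 2 === 0 ? 1 : -1
);

const [[trX, trY], [teX, teY]] = splitSketch(demoX, demoY, [7, 3]);
console.log(trX.length, trY.length); // 70 70
console.log(teX.length, teY.length); // 30 30
```

With 100 samples, a 7:3 split leaves 30 samples for testing, which matches the total support shown in the classification report at the end of this example.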

Preparing the model

Let's set up our CPU backend first. This allows Netsaur to load the correct binaries for the desired backend.

await setupBackend(CPU);

Now comes our model. The size parameter defines the input shape of the data. In this example, the first number (4) means the data will be split into 4 minibatches. The numbers after it define the shape of a single sample, that is, the input shape without the number of samples; here that is trainX[0].length, the number of predictors.

We are setting the silent parameter to false so that the network prints the training log to stdout.

Our layer configuration consists of a dense (fully connected) layer with 4 neurons, a sigmoid activation layer, and a dense output layer with 1 neuron. The output layer only has a single neuron because our output is a single binary value.

Finally, we are using the hinge cost function, which is the standard cost function for SVMs.

const net = new Sequential({
  size: [4, trainX[0].length],
  silent: false,
  layers: [
    DenseLayer({ size: [4] }),
    SigmoidLayer(),
    DenseLayer({ size: [1] }),
  ],
  cost: Cost.Hinge,
});
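The hinge cost is also the reason the targets are encoded as 1 and -1. As a rough sketch, the textbook per-sample hinge loss is max(0, 1 - y * score). The hingeLoss function below is a hypothetical illustration of that formula; it is not Netsaur's Cost.Hinge implementation, which may differ in details such as averaging over samples.

```typescript
// Textbook hinge loss for one sample, with y in {1, -1}.
// Hypothetical illustration, not Netsaur's internal implementation.
function hingeLoss(y: number, score: number): number {
  return Math.max(0, 1 - y * score);
}

console.log(hingeLoss(1, 2.0)); // 0   (correct side, outside the margin)
console.log(hingeLoss(1, 0.5)); // 0.5 (correct side, but inside the margin)
console.log(hingeLoss(-1, 0.5)); // 1.5 (wrong side)
```

The loss is zero only when the raw output has the same sign as the target and a magnitude of at least 1, which is what pushes the network toward confident outputs on both sides of zero.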

Training the model

Now we train our network for 150 epochs, in 1 batch, with a learning rate of 0.02.

net.train(
  [
    {
      inputs: tensor2D(trainX),
      outputs: tensor2D(trainY.map((x) => [x])),
    },
  ],
  150,
  1,
  0.02
);
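To illustrate what the epoch count and learning rate control, here is a minimal sketch of full-batch subgradient descent on the hinge loss for a one-feature linear model. Everything in it (the toy data and the variables w, b, epochs, rate) is an assumption made for illustration; Netsaur's actual optimizer and network are more involved than this.

```typescript
// Toy, linearly separable data (made up): one feature, labels in {1, -1}.
const toyX: number[] = [-2, -1, 1, 2];
const toyY: number[] = [-1, -1, 1, 1];

let w = 0; // weight of the linear model: score = w * x + b
let b = 0; // bias
const epochs = 150; // number of full passes over the data
const rate = 0.02; // learning rate: step size of each update

for (let epoch = 0; epoch < epochs; epoch++) {
  let gradW = 0;
  let gradB = 0;
  for (let i = 0; i < toyX.length; i++) {
    // The hinge subgradient is non-zero only for samples inside the margin.
    if (toyY[i] * (w * toyX[i] + b) < 1) {
      gradW -= toyY[i] * toyX[i];
      gradB -= toyY[i];
    }
  }
  // One update per epoch, because all samples form a single batch.
  w -= rate * (gradW / toyX.length);
  b -= rate * (gradB / toyX.length);
}

// Sign of the score gives the predicted class.
const toyPredictions = toyX.map((x) => (w * x + b < 0 ? -1 : 1));
console.log(toyPredictions); // [-1, -1, 1, 1]
```

A larger learning rate reaches the margin in fewer epochs but takes coarser steps; a smaller one needs more epochs to get there.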

Evaluating the model

To evaluate our model, we run the trained network on the test data.

const res = await net.predict(tensor2D(testX));

Our result will be a 2-dimensional tensor. Tensor.prototype.data is a Float32Array, which we can iterate through.

Now we use a sign function to convert the predicted values into 1 or -1.

const y1 = res.data.map((x) => (x < 0 ? -1 : 1));
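As a quick sanity check of the thresholding step, the sketch below applies the same sign function to a few made-up raw outputs and computes accuracy by hand. The rawOutputs and trueLabels values are hypothetical and are not results from the trained network.

```typescript
// Hypothetical raw outputs and true labels, for illustration only.
const rawOutputs = new Float32Array([0.8, -1.2, 0.1, -0.3]);
const trueLabels: number[] = [1, -1, -1, -1];

// Float32Array.prototype.map returns another Float32Array.
// An output of exactly 0 is mapped to 1 by this rule.
const predicted = rawOutputs.map((v) => (v < 0 ? -1 : 1));

// Count how many predictions match the true labels.
let correct = 0;
predicted.forEach((p, i) => {
  if (p === trueLabels[i]) correct++;
});

console.log(Array.from(predicted)); // [1, -1, 1, -1]
console.log(correct / trueLabels.length); // 0.75
```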

Finally we can generate a classification report for our evaluation.

const cMatrix = new ClassificationReport(testY, y1);
console.log(cMatrix);

You should get an output like this:

Classification Report
Number of classes:      2
Class   Preci   F1      Rec     Sup
1       1.0000  1.0000  1.0000  17
-1      1.0000  1.0000  1.0000  13
Accuracy                1       30
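The per-class columns in the report presumably stand for precision, F1 score, recall, and support. As a rough sketch of how such values are conventionally defined, the hypothetical binaryMetrics function below computes them for one class from true and predicted labels. It is not Netsaur's ClassificationReport implementation, and the labels passed to it are made up.

```typescript
// Hypothetical helper, not part of Netsaur: metrics for one class.
function binaryMetrics(yTrue: number[], yPred: number[], positive: number) {
  let tp = 0; // predicted positive, actually positive
  let fp = 0; // predicted positive, actually negative
  let fn = 0; // predicted negative, actually positive
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === positive && yTrue[i] === positive) tp++;
    else if (yPred[i] === positive) fp++;
    else if (yTrue[i] === positive) fn++;
  }
  // These divisions yield NaN when a denominator is zero; this sketch does not guard against that.
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const f1 = (2 * precision * recall) / (precision + recall);
  const support = tp + fn; // number of true samples of this class
  return { precision, recall, f1, support };
}

// Made-up labels: 2 true positives, 1 false positive, 1 false negative for class 1.
const metrics = binaryMetrics([1, 1, -1, -1, 1], [1, -1, -1, 1, 1], 1);
console.log(metrics.precision.toFixed(4)); // "0.6667"
console.log(metrics.recall.toFixed(4)); // "0.6667"
console.log(metrics.support); // 3
```

A report of 1.0000 in every column, as in the output above, means there were no false positives or false negatives for either class on the test set.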