Substance Spectrogram Classification using QCi's Reservoir Computer

Introduction

We used QCi's EmuCore to build a method for classifying substances based on their spectrograms. The reader can refer to

https://en.wikipedia.org/wiki/Spectroscopy

to learn about the spectroscopy of substances. The reader may also refer to an interesting simulator,

https://phet.colorado.edu/sims/html/molecules-and-light/latest/molecules-and-light_en.html

to learn more about the interaction of electromagnetic waves at different frequencies with different substances.

Methodology

The goal is to build a classification method that classifies substances based on their corresponding spectrograms. Each spectrogram has two dimensions, namely time and frequency. We treat the frequency dimension as the input features to the reservoir. The output of the reservoir is then used to build a linear model. The labels are the 14 substances in the dataset. We used 80% of the spectrograms, chosen uniformly across the 14 substances, as the training set. The rest was used for testing.
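As a minimal, self-contained sketch of this pipeline, the snippet below substitutes a fixed random nonlinear projection for the EmuCore reservoir; the dimensions, toy data, and the `pseudo_reservoir` helper are illustrative assumptions, not part of the actual implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

num_freq, num_nodes = 8, 32
W_in = rng.normal(size=(num_freq, num_nodes))

def pseudo_reservoir(spec):
    # Nonlinear random feature map applied per time step (stands in for EmuCore)
    return np.tanh(spec @ W_in)

# Two toy "substances" with different spectral signatures, 10 spectrograms each;
# each spectrogram is a (time, frequency) array
specs = [rng.normal(loc=c, size=(20, num_freq)) for c in (0.0, 1.0) for _ in range(10)]
labels = [1] * 10 + [2] * 10

# Stack per-time-step reservoir states; every row inherits its spectrogram's label
X = np.concatenate([pseudo_reservoir(s) for s in specs], axis=0)
y = np.repeat(labels, 20)

# One-hot labels mapped to {-1, +1}, then a linear readout
enc = OneHotEncoder()
Y = 2.0 * enc.fit_transform(y.reshape(-1, 1)).toarray() - 1.0
clf = LinearRegression().fit(X, Y)

# Classify one spectrogram: average the per-time-step outputs, take the argmax
scores = clf.predict(pseudo_reservoir(specs[0])).mean(axis=0)
pred = enc.categories_[0][scores.argmax()]
```

The full notebook below follows the same structure, with the reservoir step executed on EmuCore hardware instead of a random projection.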

Dataset

The dataset consists of spectrograms of 14 substances. For each substance, there are about 650 spectrograms.
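The per-substance file layout assumed here is inferred from the loader defined later in the notebook: each `substanceNN.npy` file stores a pickled dict whose `"pos"` entry maps spectrogram IDs to 2-D (time × frequency) arrays. The sketch below builds and reads back one such hypothetical file; the IDs and array shapes are made up for illustration.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layout: dict with a "pos" key mapping IDs to 2-D arrays
cache = {"pos": {"spec_%d" % k: rng.random((30, 100)) for k in range(3)}}

path = os.path.join(tempfile.mkdtemp(), "substance01.npy")
np.save(path, cache)  # a dict is saved as a 0-d object array

# allow_pickle=True and .item() recover the original dict
loaded = np.load(path, allow_pickle=True).item()
```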

Implementation

We start by importing some libraries and defining some parameters. A utility function is defined to measure runtimes.

In [1]:

import os
import sys
import random
import time
from functools import wraps

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

from bumblebee_client.bumblebee_client import BumblebeeClient

DATA_DIR = "/shared/spectro/data"
NUM_SUBSTANCES = 14
NUM_NODES = 2500  # Number of reservoir nodes
TEST_SIZE_RATIO = 0.2  # Ratio of test data
IP_ADDR = "172.18.41.70"  # The API address of EmuCore
VBIAS = 0.3  # Bias
GAIN = 0.65  # Gain
FEATURE_SCALING = 0.5  # Scaling coefficient for the input to the reservoir

def timer(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        beg_time = time.time()
        val = func(*args, **kwargs)
        end_time = time.time()
        tot_time = end_time - beg_time
        print("Runtime of %s: %0.2f seconds!" % (func.__name__, tot_time))
        return val
    return wrapper
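As a quick illustration of the decorator, the snippet below applies a copy of `timer` to a toy function (`slow_sum` is a made-up example, not part of the notebook): the decorated call prints its wall-clock runtime and still returns the function's result, and `functools.wraps` preserves the original function name in the message.

```python
import time
from functools import wraps

def timer(func):
    # Same decorator as above: report wall-clock runtime, pass the result through
    @wraps(func)
    def wrapper(*args, **kwargs):
        beg_time = time.time()
        val = func(*args, **kwargs)
        print("Runtime of %s: %0.2f seconds!" % (func.__name__, time.time() - beg_time))
        return val
    return wrapper

@timer
def slow_sum(n):
    time.sleep(0.1)
    return sum(range(n))

result = slow_sum(10)  # prints a runtime message, returns 45
```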

Get Spectrograms and Labels

We can then define a function that reads the dataset and splits it into training and testing data.

In [2]:

def get_specs_labels(data_dir=DATA_DIR):
    train_spec_list = []
    train_label_list = []
    train_spec_type_list = []
    test_spec_list = []
    test_label_list = []
    test_spec_type_list = []
    for i in range(NUM_SUBSTANCES):
        i_str = "{:02n}".format(i + 1)
        cache = np.load(
            os.path.join(data_dir, "substance%s.npy" % i_str),
            allow_pickle=True,
        ).item()
        for item in cache["pos"].keys():
            if random.random() > TEST_SIZE_RATIO:
                train_spec_list.append(cache["pos"][item])
                train_label_list.append(i + 1)
                train_spec_type_list.append("pos")
            else:
                test_spec_list.append(cache["pos"][item])
                test_label_list.append(i + 1)
                test_spec_type_list.append("pos")
    assert len(train_spec_list) == len(train_label_list), "Inconsistent sizes!"
    assert len(train_spec_list) == len(train_spec_type_list), "Inconsistent sizes!"
    assert len(test_spec_list) == len(test_label_list), "Inconsistent sizes!"
    assert len(test_spec_list) == len(test_spec_type_list), "Inconsistent sizes!"
    return (
        train_spec_list,
        train_label_list,
        train_spec_type_list,
        test_spec_list,
        test_label_list,
        test_spec_type_list,
    )
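Note that the per-item `random.random()` split above only approximates the 80/20 ratio and produces a different partition on every run. A deterministic, stratified alternative (a sketch using scikit-learn, not the notebook's method; the toy data is made up) would be:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for spectrograms and their substance labels
specs = [np.full((5, 4), float(i)) for i in range(20)]
labels = [i % 2 for i in range(20)]  # two toy "substances", 10 each

# stratify keeps the 80/20 split exact within every substance;
# random_state makes the partition reproducible
train_specs, test_specs, train_labels, test_labels = train_test_split(
    specs, labels, test_size=0.2, stratify=labels, random_state=0
)
```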

Run through Reservoir

We now define a function that takes the training and testing data and runs each spectrogram through the reservoir. The output of the reservoir will be used to build a linear classifier.

In [3]:

@timer
def run_reservoir(train_spec_list, test_spec_list, num_nodes):
    num_taps = num_nodes
    num_f = train_spec_list[0].shape[1]
    for spec in train_spec_list:
        assert spec.shape[1] == num_f, "Inconsistent dimensions!"
    for spec in test_spec_list:
        assert spec.shape[1] == num_f, "Inconsistent dimensions!"

    # Pad the input vectors with zeros before sending them through the reservoir
    zero_vec = np.zeros((10, num_f))

    train_resp_list = []
    for spec in train_spec_list:
        client = BumblebeeClient(ip_addr=IP_ADDR)
        lock_id, start, end = client.wait_for_lock()
        client.reservoir_reset(lock_id=lock_id)
        client.rc_config(
            lock_id=lock_id,
            vbias=VBIAS,
            gain=GAIN,
            num_nodes=num_nodes,
            num_taps=num_taps,
        )
        resp, _, _ = client.process_all_data(
            input_data=np.concatenate([zero_vec, spec], axis=0),
            num_nodes=num_nodes,
            density=1,
            feature_scaling=FEATURE_SCALING,
            lock_id=lock_id,
        )
        client.release_lock(lock_id=lock_id)
        train_resp_list.append(resp)

    test_resp_list = []
    for spec in test_spec_list:
        client = BumblebeeClient(ip_addr=IP_ADDR)
        lock_id, start, end = client.wait_for_lock()
        client.reservoir_reset(lock_id=lock_id)
        client.rc_config(
            lock_id=lock_id,
            vbias=VBIAS,
            gain=GAIN,
            num_nodes=num_nodes,
            num_taps=num_taps,
        )
        resp, _, _ = client.process_all_data(
            input_data=np.concatenate([zero_vec, spec], axis=0),
            num_nodes=num_nodes,
            density=1,
            feature_scaling=FEATURE_SCALING,
            lock_id=lock_id,
        )
        client.release_lock(lock_id=lock_id)
        test_resp_list.append(resp)

    return train_resp_list, test_resp_list
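EmuCore's internal dynamics run on hardware, so they are not visible in the code above. To illustrate the role the reservoir plays, the sketch below implements a classical echo state network update, a software analogue, not EmuCore's actual (photonic) dynamics; the weight scales and the reuse of the `GAIN`-like constant are illustrative assumptions. It also shows why the 10-row zero padding is plausible: it lets the state settle before real data arrives.

```python
import numpy as np

rng = np.random.default_rng(2)
num_f, num_nodes, gain = 4, 50, 0.65

W_in = rng.normal(scale=0.5, size=(num_nodes, num_f))  # input weights (feature scaling)
W = rng.normal(size=(num_nodes, num_nodes))
W *= gain / np.abs(np.linalg.eigvals(W)).max()         # set spectral radius to the gain

def run_esn(inputs):
    # Drive the recurrent state with one input row per time step;
    # collect the state trajectory as the reservoir "response"
    x = np.zeros(num_nodes)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ u)
        states.append(x)
    return np.array(states)

spec = rng.normal(size=(25, num_f))
# Zero-pad the front of the sequence, as in run_reservoir above
resp = run_esn(np.concatenate([np.zeros((10, num_f)), spec], axis=0))
```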

Build a Classifier

Finally, we define a function that builds a linear classifier. Note that when "reservoir_flag" is set to False, the reservoir step is skipped and a linear classifier is built on the raw spectrograms.

In [4]:

@timer
def build_classifier(reservoir_flag=True):
    (
        train_spec_list,
        train_label_list,
        train_spec_type_list,
        test_spec_list,
        test_label_list,
        test_spec_type_list,
    ) = get_specs_labels()

    train_resp_list = train_spec_list
    test_resp_list = test_spec_list
    if reservoir_flag:
        train_resp_list, test_resp_list = run_reservoir(
            train_spec_list, test_spec_list, NUM_NODES,
        )

    X_train = np.concatenate(train_resp_list, axis=0)
    X_test = np.concatenate(test_resp_list, axis=0)

    y_train = []
    for i, spec in enumerate(train_resp_list):
        y_train += [train_label_list[i]] * spec.shape[0]
    y_test = []
    for i, spec in enumerate(test_resp_list):
        y_test += [test_label_list[i]] * spec.shape[0]

    y_train = np.array(y_train)
    y_test = np.array(y_test)

    enc = OneHotEncoder()
    enc.fit(y_train.reshape(-1, 1))
    y_train = enc.transform(y_train.reshape(-1, 1)).toarray()
    y_test = enc.transform(y_test.reshape(-1, 1)).toarray()
    y_train = 2.0 * y_train - 1.0
    y_test = 2.0 * y_test - 1.0

    assert X_train.shape[0] == y_train.shape[0], "Inconsistent sizes!"
    assert X_test.shape[0] == y_test.shape[0], "Inconsistent sizes!"
    assert y_train.shape[1] == NUM_SUBSTANCES, "Inconsistent sizes!"
    assert y_test.shape[1] == NUM_SUBSTANCES, "Inconsistent sizes!"

    clf = LinearRegression(fit_intercept=True)
    clf.fit(X_train, y_train)
    score = clf.score(X_train, y_train)
    print("Regression Score = %f" % score)

    success = 0
    count = 0
    for i, spec in enumerate(train_resp_list):
        count += 1
        act_label = int(train_label_list[i])
        tmp_vec = clf.predict(spec).mean(axis=0)
        tmp_vec = 1 * (tmp_vec == np.amax(tmp_vec))
        prd_label = enc.inverse_transform(tmp_vec.reshape(1, -1))[0][0]
        if prd_label == act_label:
            success += 1
    print("Success rate on train data: %0.3f" % (success / count))

    success = 0
    count = 0
    for i, spec in enumerate(test_resp_list):
        count += 1
        act_label = int(test_label_list[i])
        tmp_vec = clf.predict(spec).mean(axis=0)
        tmp_vec = 1 * (tmp_vec == np.amax(tmp_vec))
        prd_label = enc.inverse_transform(tmp_vec.reshape(1, -1))[0][0]
        if prd_label == act_label:
            success += 1
    print("Success rate on test data: %0.3f" % (success / count))
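The label handling in the function above can be isolated in a small roundtrip: one-hot encode the labels, map them to {-1, +1} regression targets, then decode a prediction with the winner-take-all rule. The three-class labels and the `scores` vector below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Encode: labels -> one-hot -> {-1, +1} regression targets
y = np.array([1, 2, 3, 2])
enc = OneHotEncoder()
Y = 2.0 * enc.fit_transform(y.reshape(-1, 1)).toarray() - 1.0

# Decode: a fake averaged prediction vector for one spectrogram;
# winner-take-all picks the column with the largest output
scores = np.array([0.1, 0.7, -0.3])
winner = 1 * (scores == scores.max())
label = enc.inverse_transform(winner.reshape(1, -1))[0][0]
```

The {-1, +1} mapping centers the targets so that a least-squares readout behaves like a set of one-vs-rest discriminants, one per substance.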

We can then call the above function to run the process from end to end.

In [ ]:

build_classifier(reservoir_flag=True)

Results

Classifiers were built using different numbers of reservoir nodes on QCi's EmuCore.

The table below shows the success rates (on both training and testing data) of classifiers built with different numbers of reservoir nodes. A node count of "0" corresponds to the case where no reservoir was used and a linear model was trained on the raw spectrograms.

The next table shows the training times of the classifiers using different numbers of nodes. As can be seen, the training time increases more or less linearly with the number of nodes used.