
Quantum Feature Importance Selection

Christophe Pere

--

From Grossi et al., 2022, Fig. 5 [1]

Introduction

Recently, I reproduced the paper by Grossi et al., 2022, about fraud detection. It was a fascinating problem: a real-world application of a quantum algorithm with a hybrid quantum-classical approach.

The authors don’t provide the code or the data due to confidentiality (it was an IBM project). With no data or code available, the real challenge is rebuilding the approach from scratch. The approach in the paper is simple: use undersampling to reduce the dataset’s size and balance it, then use classical methods as a baseline to benchmark a quantum support vector classifier (QSVC).
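
For reference, here is a minimal sketch of that undersampling step, assuming the imbalanced-learn library (the paper describes the step but not the tooling, so the library choice is mine):

# Balance the classes by randomly downsampling the majority class.
# Sketch only: imbalanced-learn is an assumption, not the paper's stated tooling.
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=42)
X_balanced, y_balanced = rus.fit_resample(X, y)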

This work was done at PinQ2 (Plateforme d’innovation Quantique du Québec). Special thanks to Sean Wagner and Alexandre Choquette from IBM Quantum, and to Jérôme Schmaltz, for their help and review during this project.

Aim

The current state of quantum computing doesn’t allow using many features (columns) or samples; the hardware and the number of qubits limit us. The authors tackle the problem by using classical feature importance extracted with different classical methods (XGBoost, GBT, RF). This is also a standard way in quantum machine learning to reduce the size of a dataset. However, and I fully agree, the authors argue that classical feature importance destroys interesting information that a quantum algorithm could use.
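
For comparison, the classical baseline is nearly a one-liner with scikit-learn (a sketch of the idea, not the authors’ exact code; X and y are the usual feature matrix and label array):

# Classical feature-importance baseline (sketch): rank features with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)
top_k = np.argsort(rf.feature_importances_)[::-1][:7]  # ids of the 7 most important features
X_reduced = X[:, top_k]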

The authors developed a quantum approach to feature importance selection based on a QSVC. They provide the schema of their system (Figure 5 of the paper), shown below:

Figure 5: Quantum Feature Importance Selection process, Grossi et al., 2022

The literature on quantum feature selection isn’t significantly developed, so I jumped on the challenge of reproducing the algorithm. As they described in the paper, only the kernel is computed on a quantum computer (or simulator).

How to interpret the schema?

Steps of the feature selection (a code sketch follows the list):
1 — initialization -> select the first 3 features of the data
2 — run QSVC -> with circuit runner or primitives
3 — evaluation with specific metrics (predict on the test set to evaluate the model)
4 — record the KPI and the feature ids
5 — if combinations remain, select another set of 3 features (combinations without repetition)
6 — start with the best set of features; add one feature
7 — repeat 2, 3, 4, 5
8 — compare the KPI with one feature added to the KPI obtained with the previous number of features
9 — if there is no improvement, stop the algorithm and output the KPI and ids of the best features
10 — if there is improvement, repeat 6, 7, 8, 9
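
Here is a minimal sketch of that loop, based on my reading of Figure 5. The evaluate_subset helper is hypothetical: it stands for training a QSVC on the selected feature ids and returning the chosen KPI.

# Greedy forward selection (sketch of Figure 5, not the authors' code).
# evaluate_subset(ids) is a hypothetical helper: it builds the quantum kernel
# on the selected columns, trains a QSVC, and returns a KPI (e.g. balanced accuracy).
from itertools import combinations

def qfis_sketch(n_features, evaluate_subset, max_features=None):
    # Steps 1-5: score every unique combination of 3 features.
    best_ids, best_kpi = None, -float("inf")
    for ids in combinations(range(n_features), 3):
        kpi = evaluate_subset(list(ids))
        if kpi > best_kpi:
            best_ids, best_kpi = list(ids), kpi
    # Steps 6-10: greedily add one feature at a time while the KPI improves.
    limit = max_features or n_features
    while len(best_ids) < limit:
        candidate, candidate_kpi = None, best_kpi
        for f in set(range(n_features)) - set(best_ids):
            kpi = evaluate_subset(best_ids + [f])
            if kpi > candidate_kpi:
                candidate, candidate_kpi = best_ids + [f], kpi
        if candidate is None:  # step 9: no improvement, stop
            break
        best_ids, best_kpi = candidate, candidate_kpi  # step 10: keep going
    return best_ids, best_kpi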

Strictly speaking, you will evaluate many kernels in the process. Why?

The first step evaluates every unique combination of 3 features out of the full feature set; you can check the counts with the snippet after the list.

  • 7 features lead to 35 unique combinations
  • 11 features lead to 165 unique combinations
  • 33 features lead to 5456 unique combinations
  • 69 features lead to 52394 unique combinations
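
These counts are simply the binomial coefficient C(n, 3), which you can verify directly:

# The counts above are binomial coefficients: C(n, 3) = n! / (3! * (n - 3)!)
from math import comb

for n in (7, 11, 33, 69):
    print(n, comb(n, 3))  # 35, 165, 5456, 52394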

This is highly expensive. Don’t hope to run it on a real system. The KPIs are the metrics used to evaluate the kernel: precision, recall, accuracy… it’s up to you.

Code implementation

I will provide my implementation of the QFIS algorithm. I used Qiskit with the IBM Runtime primitives for the cloud, and the local Sampler primitive for simulators. I reproduced the two steps based on the description in the paper and Figure 5.

QFIS approach (code by the author)

Three functions constitute the code above. The second is the execution function, where the kernel is transpiled into a circuit and run on the backend with the chosen options. Note the env parameter in the function declaration: the function runs on your local machine when env is set to local, and on the cloud when set to cloud. The function uses two Sampler versions, one for the cloud and one for local execution.
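
A minimal sketch of that execution function could look like the following. The function and parameter names are illustrative, not the author’s exact code, and I assume qiskit-machine-learning’s FidelityQuantumKernel for the kernel computation:

# Sketch of the execution function (illustrative names; assumes qiskit,
# qiskit-ibm-runtime, qiskit-algorithms and qiskit-machine-learning).
from qiskit.circuit.library import ZZFeatureMap
from qiskit.primitives import Sampler as LocalSampler
from qiskit_ibm_runtime import Sampler as CloudSampler, Session
from qiskit_algorithms.state_fidelities import ComputeUncompute
from qiskit_machine_learning.kernels import FidelityQuantumKernel
from qiskit_machine_learning.algorithms import QSVC

def run_qsvc(X, y, X_test, y_test, backend, options, service, n_reps=2, env="local"):
    feature_map = ZZFeatureMap(feature_dimension=X.shape[1], reps=n_reps)
    if env == "local":
        sampler = LocalSampler()  # reference primitive, runs on your machine
    else:
        session = Session(service=service, backend=backend)  # queued on IBMQ
        sampler = CloudSampler(session=session, options=options)
    kernel = FidelityQuantumKernel(feature_map=feature_map,
                                   fidelity=ComputeUncompute(sampler=sampler))
    qsvc = QSVC(quantum_kernel=kernel)
    qsvc.fit(X, y)
    return qsvc.score(X_test, y_test)  # swap in your preferred KPI here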

The advantage of primitives is error mitigation. This post-processing reduces the impact of hardware noise on the measured results, making them more accurate. However, the problem is the waiting time on the IBMQ platform when you use the cloud option. Each session is positioned in the queue, and even with ibmq_qasm_simulator in the cloud, the computation can take hours rather than minutes with the local simulator.

Run the function

As in every machine learning project, you split the data into train and test sets. So what are the parameters?

  • X is the train set
  • y is the array of labels of the train set
  • X_test is the test set
  • y_test is the array of labels of the test set
  • backend is the backend where you want to run the algorithm (real system or simulator)
  • options is the Options object for the Runtime primitives
  • service is mainly to load your account via Runtime
  • max_features is the maximum number of features you want to select (if None, the algorithm will try combinations until the KPI stops improving)
  • n_reps is the number of repetitions for the ZZFeatureMap
  • env determines where to run the algorithm, local or cloud

You just need these parameters to execute the function and wait until the process is done.

# simulator on local machine
_features_, results = quantum_feature_importance_local(
    X_ibm_train_qfis, y_ibm_train_qfis,
    X_ibm_test_qfis, y_ibm_test_qfis,
    backend=backend, options=options,
    service=service, max_features=7,
)

Here I specified 7 as the maximum number of features. I ran the code on a local machine with the `qasm_simulator` from Qiskit’s BasicAer module.
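
Getting that backend is one line of standard Qiskit:

# Local simulator backend from Qiskit's BasicAer provider.
from qiskit import BasicAer

backend = BasicAer.get_backend("qasm_simulator")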

Output

The QFIS algorithm prints the best result for each combination. Here, balanced accuracy and AUC are used as evaluation metrics. The model reaches 80% balanced accuracy with the feature set (0, 1, 4). Adding a fourth feature didn’t improve this score, so the algorithm stopped.
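
The printed output looks something like this (the format is illustrative; the numbers are those reported above):

Best feature set: (0, 1, 4)
Balanced accuracy: 0.80
No improvement with a 4th feature -> stopping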

Cloud?

The next code block shows how to use QFIS in the cloud. You define the service, the options, and the backend.

from qiskit_ibm_runtime import QiskitRuntimeService, Options

# service
service = QiskitRuntimeService(channel="ibm_quantum", token="...")  # put your token here

# options
options = Options(optimization_level=3)
options.resilience_level = 3  # or 1 for lighter error mitigation
options.execution.shots = 8192

# backend
backend = service.backend("ibmq_qasm_simulator")

# run QFIS in the cloud
_features_, results = quantum_feature_importance_local(
    X_train, y_train, X_test, y_test,
    backend=backend, options=options,
    service=service, max_features=7, env="cloud",
)

You can use any backend you want through the backend parameter. It can be ibmq_qasm_simulator or a real device.

Computation time evaluation

I made many iterations with different backends, different sample sizes, and different numbers of features. The backend column (table below) lists a real device, ibmq_montreal, a cloud simulator, ibmq_qasm_simulator, and a local simulator, qasm_simulator. The interesting part is the training time for each test. The sets (11, 165), (33, 5456), (69, 52394) are estimated computation times (525600 minutes corresponds to 1 year).

As you can see, using QFIS on a local simulator is much easier. The queue on the IBMQ platform makes this type of approach impractical.

Conclusion

Quantum feature importance or selection is critical for quantum algorithms. Classical reduction methods for feature importance or feature selection can kill the performance or advantage of a quantum algorithm; how the data is selected or reduced needs to be adapted to the algorithm. Your preprocessing must be quantum if you want to use a quantum algorithm.

Evaluating the importance of features with the quantum approach proposed in the paper requires expensive computation time. To reduce this effect, a local quantum simulator helps you iterate quickly. The final model can be trained on a real device, but the selection process is better run on a local machine with a GPU in single precision.

Another way is to use a Session to run multi-kernel evaluations. The advantage is that while a Session is open on real hardware, you have a time slot of 8 hours, so you can run multiple algorithms during that period.
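
Opening a Session looks like this (a minimal sketch with qiskit-ibm-runtime; ibmq_montreal is just an example backend):

# Run several kernel evaluations inside one Session time slot (sketch).
from qiskit_ibm_runtime import QiskitRuntimeService, Session, Sampler, Options

service = QiskitRuntimeService(channel="ibm_quantum")
options = Options(optimization_level=3)

with Session(service=service, backend="ibmq_montreal") as session:
    sampler = Sampler(session=session, options=options)
    # ...build the quantum kernel with this sampler and evaluate
    # multiple feature subsets while the session stays open...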

Reference

[1] Grossi et al., 2022, Mixed Quantum-Classical Method For Fraud Detection with Quantum Feature Selection — https://arxiv.org/pdf/2208.07963.pdf
