ML model for validating IBAN account numbers.
This is an open-source project which delivers a machine learning model for validating IBAN account numbers accessible via gRPC and REST APIs.
The trained model is distributed in two forms:
The project source code is available in the bsantanna/iban-validator-model GitHub repository.
One deliverable of this project is a Docker image, which can be used under the terms of the project license.
Disclaimer:
- The dataset may be outdated in comparison to the latest IBAN registry / country information.
- This model was created for case study purposes and may predict incorrect results.
- Use the model at your own discretion.
To run this image with Docker use the following command:
$ docker run -it --rm -p 41151:41151 -p 8080:8080 bsantanna/iban-validator-model
After some moments a container should spawn two processes:
Apple Silicon users:
- To run the docker image use this alternative tag:
bsantanna/iban-validator-model:aarch64
- To run the notebooks of this repository, consider following these instructions.
The examples below use sample IBAN accounts from https://bank.codes.
IBAN to validation data JSON:
$ curl -s \
"http://localhost:8080/validation?iban=BR1800000000141455123924100C2"
Gives the following result output:
{
  "classification": {
    "bban_regex": "[0-9]{23}[A-Z]{1}[A-Za-z0-9]{1}",
    "check_digit_regex": "[0-9][0-9]",
    "code": "BR",
    "country": "Brazil",
    "size": 29
  },
  "description": "IBAN passed validation",
  "iban": "BR1800000000141455123924100C2",
  "is_valid": true
}
The main motivation for creating this project was study and self-development in the subjects of Deep Learning and Artificial Neural Networks using TensorFlow, a popular Machine Learning platform.
Considering the Artificial Intelligence landscape in 2022, there were several Machine Learning SaaS and PaaS offerings on the market, in a field of computer science that had just become a mainstream topic. It made sense to me to pick one major framework and start practicing with a well-known problem such as IBAN validation.
As a software engineer, I found answers to some of my practical questions through this project and gained proficiency in modeling neural networks and distributing them for prediction at scale using cloud containers.
While looking for a short-lived, hands-on project to practice and learn Neural Network modelling with Keras and TensorFlow, I came across the idea of creating this IBAN validator, as it is a simple use case with good references available on the internet.
In order to employ a simple yet efficient Machine Learning model, the proposed solution addresses the challenge using the following approach:
- The Machine Learning model should memorize a table of country-specific regular expression rules formatted as static JSON document strings.
- The model should predict the correct JSON when other items from the table are given as features extracted from the input IBAN: the 2-letter ISO 3166-1 country code and the length / size.
- The predicted JSON is parsed and the host language's regular expression engine is used to validate the input IBAN.
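The three steps above can be sketched in Python. This is a minimal illustration and not the project's code: the RULES dictionary is a hypothetical stand-in for the trained model (it maps the two features directly to a JSON string), the function names are invented for this example, and the way the three fields are joined into one pattern is an assumption. The single table entry is taken from the Brazil example response shown earlier.

```python
import json
import re

# Hypothetical stand-in for the trained model: (country code, size) -> JSON string.
RULES = {
    ("BR", 29): json.dumps({
        "bban_regex": "[0-9]{23}[A-Z]{1}[A-Za-z0-9]{1}",
        "check_digit_regex": "[0-9][0-9]",
        "code": "BR",
        "country": "Brazil",
        "size": 29,
    }),
}


def extract_features(iban):
    """Return the two features: 2-letter country code and total length."""
    return iban[:2], len(iban)


def predict_json(iban):
    """Look up the JSON rule document; the real project predicts it with a model."""
    return RULES.get(extract_features(iban))


def validate(iban):
    """Parse the predicted JSON and check the IBAN against its regular expressions."""
    predicted = predict_json(iban)
    if predicted is None:
        return False
    rule = json.loads(predicted)
    # Assumption: the full pattern is country code + check digits + BBAN.
    pattern = re.escape(rule["code"]) + rule["check_digit_regex"] + rule["bban_regex"]
    return re.fullmatch(pattern, iban) is not None


print(validate("BR1800000000141455123924100C2"))
print(validate("BR18000000001414551239241C0C2"))
```

The first call uses the IBAN from the usage example and should print True; the second has a letter inside the numeric part of the BBAN and should print False.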
Project development environment and dependencies:
The trained model was constructed using the Keras Functional API.
As per illustration above, two input parameters are given to the model:
In relation to hyper-parameters, the following configuration was used during training:
The output prediction is an array of probabilities, where the index of the maximum probability corresponds to the correct JSON document.
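Turning such a probability array into a JSON document amounts to an argmax over the array followed by a lookup. The sketch below assumes a hypothetical LABELS list in which each index holds one static JSON document string; the entries and the probability values are illustrative only and are not taken from the project's dataset.

```python
import json

# Hypothetical label table: each index maps to one static JSON document string.
LABELS = [
    json.dumps({"code": "BR", "size": 29}),
    json.dumps({"code": "NL", "size": 18}),
    json.dumps({"code": "DE", "size": 22}),
]


def decode_prediction(probabilities, labels):
    """Return the label whose index has the highest probability."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best_index]


# Illustrative model output: the second entry has the highest probability.
probabilities = [0.02, 0.95, 0.03]
print(decode_prediction(probabilities, LABELS))
```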
Resulting training process can be observed in the following chart:
Regarding model fitting, over-fitting is not an undesired side effect in this use case but rather a requirement: the model needs to adapt to the specific tabular data and “memorize” it.
Specific details of the model and the Neural Network topology can be observed in the notebook JSON Prediction Model, which served as the main development and experiment environment.
A prediction example which loads the trained model can be observed in the notebook JSON Classification.
While Jupyter notebooks are great for prototyping purposes, in order to distribute the model the code was formatted in a continuous-delivery-ready structure under the modules/ directory, which also permitted the introduction of a simple integration use case using Java and gRPC.
The training module contains the code used for declaring, compiling and training the model.
The most important files are:
The training process can be performed using the following command (from modules/training directory):
$ python3 json_classification_model.py
The prediction module contains the system integration and service interface declarations.
The gRPC service serves as the main integration point, as it exposes a Remote Procedure Call that points to the trained model's prediction.
The gRPC server process can be started using the following command (from modules/prediction/service directory):
$ python3 json_classification_service.py
If all dependencies and pre-conditions are met, the model should be loaded into memory and the gRPC server should start listening for requests on port 41151.
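Since loading the model can take a moment, a script that depends on the service may want to wait until port 41151 accepts connections. The helper below is a minimal sketch using only the Python standard library; the function name and its parameters are assumptions for this example. It only confirms that a TCP connection can be opened and says nothing about whether the gRPC service itself is healthy.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 41151 is the gRPC port mentioned above; localhost is an assumption.
    print(wait_for_port("localhost", 41151, timeout=5.0))
```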
The service implements the server side of the following gRPC / Protobuf contract:
syntax = "proto3";
service JSONClassificationService {
  rpc getPrediction(InputFeatures) returns (OutputLabel) {}
}
message InputFeatures {
  string iban = 1;
}
message OutputLabel {
  string json = 1;
}
- The service declares one method, getPrediction, which receives an InputFeatures object and returns an OutputLabel object.
- The InputFeatures object contains a single attribute iban with type string.
- The OutputLabel object contains a single attribute json with type string.
An HTTP REST API is another deliverable of this project. It was created to simulate a system integration scenario, acting as a gRPC client for the same contract served by the gRPC service.
The following endpoints are available:
- /json-prediction: Returns the raw JSON classification predicted by the model, without regex validation.
- /validation: Returns the validation result with the predicted classification embedded.
Both endpoints accept a single query parameter, iban.
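As an alternative to curl, the two endpoints can be called from Python using only the standard library. The sketch below assumes the REST API is reachable at http://localhost:8080 (the port published in the Docker command) and that both endpoints respond with a JSON body; the function names are hypothetical and chosen for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the REST API is running locally on the port published by Docker.
BASE_URL = "http://localhost:8080"


def endpoint_url(endpoint, iban, base_url=BASE_URL):
    """Build the request URL for an endpoint with the iban query parameter."""
    query = urllib.parse.urlencode({"iban": iban})
    return f"{base_url}/{endpoint}?{query}"


def call_endpoint(endpoint, iban, base_url=BASE_URL):
    """Send a GET request and decode the response body as JSON."""
    url = endpoint_url(endpoint, iban, base_url)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    iban = "BR1800000000141455123924100C2"
    print(endpoint_url("validation", iban))
    print(endpoint_url("json-prediction", iban))
```

With the container running, call_endpoint("validation", iban) would be expected to return a dictionary shaped like the example response shown earlier, including the is_valid field.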
Assuming a Java Development Kit is available, there is a Maven project under modules/prediction/rest-api, which can be built and executed using the following commands:
$ cd modules/prediction/rest-api
$ mvn clean install
$ java -jar target/rest-api.jar
See usage example for a quick reference.
The project reached its original goal of designing and implementing an Artificial Neural Network for validating IBAN account numbers.
The following items can be considered project deliverables:
As a possible future improvement, multiple models could be produced to move part of the algorithm which performs validation from runtime to model compilation time.
As a closing note, the following resources served as references for this project:
Copyright 2022 Bruno César Brito Sant’Anna
The change log is organized in reverse chronological order.
Distributed under the Apache License 2.0. See LICENSE for more information.