
Classification Wine Quality

Overview

This assignment focuses on implementing an end-to-end classification model for wine quality prediction.

The scope of this work includes both the model development phase and the model deployment phase, covering:

  • Building, training, and evaluating classification models using the standard scikit-learn workflow

  • Understanding the Decision Tree (CART) algorithm, including how splits are selected, how impurity is minimized, and why trees are interpretable

  • Exporting models to ONNX and serving them in a framework-agnostic production environment

Methodology

The assignment is to build a classification model that predicts the quality of wine from the north of Portugal.

Attribute Value
Model name classification-model-2026
Version latest
Purpose Model wine quality based on physicochemical tests
Task Type Classification

Dataset

The dataset, named Wine Quality, is available from the UCI Machine Learning Repository with the following schema:

Variable Name Role Type Description Units Missing Values
quality Target Integer Sensory quality score assigned by wine experts score (0–10) no
fixed_acidity Feature Continuous Amount of non-volatile acids in the wine that do not evaporate easily g/dm³ no
volatile_acidity Feature Continuous Amount of acetic acid in the wine; high values lead to vinegar taste g/dm³ no
citric_acid Feature Continuous Amount of citric acid, which adds freshness and flavor g/dm³ no
residual_sugar Feature Continuous Amount of sugar remaining after fermentation stops g/dm³ no
chlorides Feature Continuous Amount of salt in the wine g/dm³ no
free_sulfur_dioxide Feature Continuous Free form of SO₂; prevents microbial growth and oxidation mg/dm³ no
total_sulfur_dioxide Feature Continuous Total amount of SO₂ (free + bound forms) mg/dm³ no
density Feature Continuous Density of the wine, influenced by alcohol and sugar content g/cm³ no
pH Feature Continuous Measure of acidity or basicity of the wine unitless no
sulphates Feature Continuous Amount of potassium sulphate, contributing to wine stability g/dm³ no
alcohol Feature Continuous Alcohol content of the wine % vol no
color Other Categorical Wine color category red / white no

At first look at the Wine Quality dataset, it is evident that the data form a small but complete dataset, consisting of approximately 6,500 observations, 12 physicochemical features, and one target variable (quality). No missing values are present in any variable.

The data were collected from expert sensory evaluations of both red and white wines, where each wine sample was assigned a quality score ranging from 0 to 10. The features are entirely numeric and describe measurable chemical properties, making the dataset well-suited for classical machine learning models on structured tabular data.

Table 1: Descriptive statistics of the Wine Quality dataset

Dataset Descriptive statistics

In more detail, the following EDA results promote a better understanding of the dataset:

Density by component

Correlation

Pairs

Model Architecture

A Decision Tree (CART) architecture was selected as the baseline model due to its strong interpretability and natural fit for small-scale, structured tabular datasets. The dataset contains fewer than 10,000 samples and approximately 10 numeric physicochemical features, where tree-based splits can directly model non-linear thresholds without requiring feature scaling or complex preprocessing.

The model produces explicit decision rules that enable transparent reasoning about quality predictions and facilitate rapid error analysis. Although single decision trees do not reach state-of-the-art accuracy compared to ensemble methods, they provide fast training, low-latency inference, and clear failure-mode visibility, making them well-suited for baseline modeling and explainability-driven evaluation prior to more complex architectures.

The Decision Tree Model

Let the training data be

\[ \{(x_i, y_i)\}_{i=1}^n, \quad x_i \in \mathbb{R}^p, \quad y_i \in \{1,\dots,K\} \]

The objective is to partition the predictor space into \(M\) disjoint regions

\[ R_1, R_2, \dots, R_M \]

and to assign a class label and corresponding class probabilities to each region.

For an observation \(x \in R_m\), the estimated class probabilities are defined as

\[ \hat{p}_{mk} = \frac{1}{|R_m|} \sum_{x_i \in R_m} \mathbf{1}(y_i = k), \quad k = 1,\dots,K \]

The predicted class is given by

\[ \hat{y}(x) = \arg\max_k \hat{p}_{mk} \]

At each internal node, CART selects the split that optimizes a node impurity criterion. In practice, three criteria are commonly used: Gini impurity, Entropy, and Log Loss. Each criterion measures class heterogeneity within a node and leads to slightly different tree behaviors.

Criterion Mathematical Definition Optimization Objective Characteristics
Gini \(G(R_m) = 1 - \sum_{k=1}^K \hat{p}_{mk}^2\) Minimize weighted impurity Fast, stable, favors dominant classes
Entropy \(H(R_m) = -\sum_{k=1}^K \hat{p}_{mk} \log \hat{p}_{mk}\) Maximize information gain More sensitive to class balance
Log Loss \(L(R_m) = -\frac{1}{|R_m|}\sum_{i \in R_m} \log \hat{p}_{m,y_i}\) Minimize negative log-likelihood Optimizes probabilistic accuracy

For a candidate split into regions \(R_1\) and \(R_2\), the selected criterion \(C(\cdot)\) is minimized according to

\[ \frac{|R_1|}{|R|} C(R_1) + \frac{|R_2|}{|R|} C(R_2) \]
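As a concrete check of these formulas, the criteria can be computed directly with NumPy. This is a minimal sketch; the function names are illustrative and not part of the assignment code:

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity G(R) = 1 - sum_k p_k^2 for the labels in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def entropy(labels: np.ndarray) -> float:
    """Entropy H(R) = -sum_k p_k log(p_k), natural log; absent classes are skipped."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def weighted_impurity(left: np.ndarray, right: np.ndarray, criterion=gini) -> float:
    """|R1|/|R| * C(R1) + |R2|/|R| * C(R2) for a candidate binary split."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)
```

A pure node scores 0 under both criteria; a balanced two-class node scores 0.5 under Gini and \(\log 2\) under entropy.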

The tree is constructed using a greedy, recursive binary splitting procedure. For a given node \(R\), all possible splits are considered. For each predictor \(x_j\) and split point \(s\), define the regions

\[ R_1(j,s) = \{x : x_j < s\}, \quad R_2(j,s) = \{x : x_j \ge s\} \]

The optimal split \((j^\*, s^\*)\) is chosen to minimize the weighted impurity

\[ (j^\*, s^\*) = \arg\min_{j,s} \left[ \frac{|R_1|}{|R|} G(R_1) + \frac{|R_2|}{|R|} G(R_2) \right] \]

Equivalently, this choice maximizes the impurity reduction

\[ \Delta G = G(R) - \left[ \frac{|R_1|}{|R|} G(R_1) + \frac{|R_2|}{|R|} G(R_2) \right] \]

Once the optimal split is selected, the node \(R\) is partitioned into the two child regions \(R_1(j^\*, s^\*)\) and \(R_2(j^\*, s^\*)\). This procedure is applied recursively to each resulting node until a stopping criterion is met, such as reaching a maximum tree depth, insufficient samples in a node, no further impurity reduction, or a node containing observations from only a single class.
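The greedy search over \((j, s)\) can be sketched as a brute-force scan of candidate thresholds on a single feature. This toy example is illustrative only; the assignment itself relies on scikit-learn's optimized implementation:

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def best_split(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Exhaustively scan thresholds s and return the one minimizing the
    weighted Gini impurity of the regions {x < s} and {x >= s}."""
    best_s, best_score = np.nan, np.inf
    xs = np.unique(x)
    for s in (xs[:-1] + xs[1:]) / 2:  # midpoints between consecutive sorted values
        left, right = y[x < s], y[x >= s]
        score = len(left) / len(y) * gini(left) + len(right) / len(y) * gini(right)
        if score < best_score:
            best_s, best_score = s, score
    return best_s, best_score


# Toy, perfectly separable feature: the best threshold falls between the classes
x = np.array([9.0, 9.5, 10.5, 11.0])
y = np.array(["LOW", "LOW", "HIGH", "HIGH"])
s, score = best_split(x, y)
```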

For a terminal region \(R_m\), class probabilities are estimated as

\[ \hat{P}(Y = k \mid X \in R_m) = \frac{1}{|R_m|} \sum_{x_i \in R_m} \mathbf{1}(y_i = k) \]

and the predicted class assigned to the region is

\[ \hat{y}_{R_m} = \arg\max_k \hat{p}_{mk} \]

To control model complexity, cost–complexity pruning may be applied. Let \(T\) denote a subtree with \(|T|\) terminal nodes. The penalized empirical risk is defined as

\[ R_\alpha(T) = \sum_{m=1}^{|T|} \sum_{x_i \in R_m} \mathbf{1}(y_i \ne \hat{y}_{R_m}) + \alpha |T| \]

For a given \(\alpha \ge 0\), the optimal subtree is chosen as

\[ T_\alpha = \arg\min_T R_\alpha(T) \]
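In scikit-learn, cost–complexity pruning is exposed through the `ccp_alpha` parameter and `cost_complexity_pruning_path`. A minimal sketch on synthetic data (not the wine dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: an unconstrained tree memorizes it with many leaves
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
# Effective alphas at which subtrees are pruned away, in increasing order
path = full.cost_complexity_pruning_path(X, y)

# Refitting with the largest effective alpha prunes the tree back to its root
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-1]).fit(X, y)
```

Sweeping `ccp_alpha` over `path.ccp_alphas` and picking the value with the best validation accuracy is the usual way to choose \(\alpha\).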

Model selection

The CART model is selected based on the following criteria:

Field Value
Model Type Decision Tree (CART)
Role Baseline
Task Multi-class classification
Input Data Small tabular dataset, numeric features
Target Wine quality score

based on the following rationale:

  • Dataset size is small, favoring low-variance, interpretable models

  • Features are fully numeric, requiring minimal preprocessing

  • Decision trees provide transparent decision rules and fast iteration

  • Serves as a strong interpretability-focused baseline before ensembles

with the target learning objective

Aspect Description
Split Criterion Gini impurity
Optimization Goal Minimize node impurity

and pre-chosen hyperparameters as follows:

Parameter Value
criterion gini
splitter best
max_depth 30
min_samples_split 5
min_samples_leaf 3
min_weight_fraction_leaf 0.1
max_features 5
max_leaf_nodes 20
min_impurity_decrease 0
class_weight None
random_state 999

The original wine quality score (0–10) is transformed into a three-class classification target—LOW, MEDIUM, and HIGH—to reframe the problem as classification, reduce noise from minor expert score variations, align with common quality thresholds, and improve interpretability and stability for a small dataset.

Quality Score Range Category
≤ 6.0 LOW
(6.0, 8.5] MEDIUM
> 8.5 HIGH

Then the model is trained on the training set, and the performance is evaluated on the validation set. The model is evaluated exclusively using classification accuracy, defined as the proportion of correctly classified samples. Accuracy is used as the sole criterion for model selection, validation, and final evaluation.
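For reference, accuracy as used here is simply the fraction of matching labels (toy labels, not the actual predictions):

```python
from sklearn.metrics import accuracy_score

# Accuracy = number of correct predictions / total predictions
y_true = ["LOW", "LOW", "MEDIUM", "HIGH", "MEDIUM"]
y_pred = ["LOW", "MEDIUM", "MEDIUM", "HIGH", "MEDIUM"]
acc = accuracy_score(y_true, y_pred)  # 4 of 5 correct -> 0.8
```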

Dataset Splitting Strategy

The splitting process is performed in two stages. First, 10% of the full dataset is reserved as a test set to provide an unbiased estimate of generalization performance. The remaining 90% (temporary dataset) is then split into training (80%) and validation (20%) subsets, corresponding to 72% and 18% of the original data, respectively.

# Split
# -----
# Full dataset                   % in total     Description
# ├── Test (10%)                 10%            → final unbiased evaluation (touch once)
# └── Temp (90%)                                → temporary dataset
#       ├── Train (80%)          72%            → model fitting
#       └── Validation (20%)     18%            → tuning / early stopping

Stratified sampling is applied based on the target variable to preserve the original class distribution across all splits.
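A minimal sketch of the two-stage stratified split with scikit-learn, using mock labels (the real pipeline operates on the wine dataframe):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Mock imbalanced labels standing in for the wine quality classes
y = np.array(["LOW"] * 700 + ["MEDIUM"] * 250 + ["HIGH"] * 50)
X = np.arange(len(y)).reshape(-1, 1)

# Stage 1: hold out 10% as the untouched test set, stratified on the target
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)

# Stage 2: split the remaining 90% into train/validation (80%/20%),
# i.e. 72% and 18% of the full dataset
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.20, random_state=42, stratify=y_temp
)
```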

End-to-End Modeling Flow (Conceptual)

(1) Raw wine samples (red and white) are combined into a single dataset.

(2) The dataset is split into train, validation, and test subsets following the above proportions.

(3) Feature preprocessing is applied consistently across all splits:

  • Categorical features (wine color) are encoded.

  • Numeric physicochemical features are passed through without transformation.

(4) A Decision Tree (CART) classifier is trained using the training set.

(5) Model hyperparameters are evaluated using the validation set.

(6) The final selected model is evaluated once on the held-out test set.

(7) The final evaluation results are reported below.

Results

Model Metadata:

Field Value
Model Name classification-model-2026
Model Type Classification
Algorithm Decision Tree (CART)
Owner Thuyet Bao
Alias Baseline
Evaluation Metric Accuracy

Model Performance:

The final model is selected based on validation accuracy and evaluated once on a held-out test set to estimate generalization performance.

Dataset Split Metric Score
Validation Accuracy 79.2%
Test Accuracy 80%

Serving Model

After training, the model is exported to the ONNX format. ONNX (Open Neural Network Exchange) is a standardized, framework-agnostic representation for machine learning models, designed to enable consistent deployment across different environments.

An ONNX file has the .onnx extension and uses the Protobuf serialization format. It encapsulates:

  • The computation graph (nodes and edges)

  • Operators (e.g., MatMul, Relu, Softmax, TreeEnsembleClassifier)

  • Model parameters (weights and tensors)

  • Input and output schemas (names, shapes, data types)

This standardized representation allows models trained in one framework to be executed reliably in different runtimes without retraining or code changes.

Once exported, the ONNX model follows a unified production serving pipeline:

flowchart LR
  train["Training Model"] --> export[Export to ONNX] --> runtime[ONNX Runtime] --> Serving[API / Batch / Edge / Mobile]

  1. Training: The model is developed and trained using a supported framework such as PyTorch, TensorFlow, or scikit-learn.

  2. Export to ONNX: The trained model is serialized into a .onnx file, preserving its structure, parameters, and input/output definitions.

  3. ONNX Runtime: The model is loaded and executed using ONNX Runtime, which provides optimized inference across CPUs, GPUs, and hardware accelerators.

  4. Serving: The model is deployed for inference in multiple scenarios, including:

     • Online APIs for real-time predictions
     • Batch processing pipelines
     • Edge or mobile environments

This workflow decouples model training from deployment, enabling scalable and portable production inference.

Implementation

There are 3 separate stages in this implementation:

Ind Stage Description
1 Download datasets Download datasets from the UCI data registry
2 Train, export model Train the model, then export it into the ONNX format
3 Serving model Serve the model over HTTP with ONNX Runtime

Stages

Stage 1: Download datasets

The dataset can be downloaded by the following flow:

(1) Declare the metadata, including: href, folder.

(2) Download the dataset by its zip content. If the .zip file exists, then skip.

(3) Unzip the dataset into the folder.

For the declaration variables:

constant.py
#!/bin/python3

# Global
import os
import sys

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# The experiment name
EXPERIMENT_NAME = "classification-model-2026"

# The container data directory path
DATA_DIR = "/mnt/usr/inference-service/"

# The prefix of project folder
PROJECT_PREFIX_FOLDER = "wine-quality"

# The prefix of project data folder
PROJECT_DATA_FOLDER_PATH = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, "dataset")

# The dictionary of datasets
DATASETS = {
    "wine-quality": {
        "folder": "wine-quality/dataset",
        "href": "https://archive.ics.uci.edu/static/public/186/wine+quality.zip"
    }
}

# The path of dataset
DATASET_COMPONENT = {
    "red": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-red.csv"),
    "white": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-white.csv"),
}

# The experiment folder
EXPERIMENT_FOLDER = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, EXPERIMENT_NAME)

The following script will be executed:

download.py
#!/bin/python3

# Global
import sys
import os
import zipfile
import argparse
import textwrap

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# External
import httpx
import structlog

# Internal
import workshop.classification_wine_quality.constant as constant

# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/download.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Download wine quality datasets

        Usage
        -----

        Normal download
        >>> python workshop/classification_wine_quality/download.py

        Help
        >>> python workshop/classification_wine_quality/download.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()

    # The datasets to download
    TARGET_DATASETS = ["wine-quality"]

    # Load
    for dataset in TARGET_DATASETS:

        # Get
        structlog.contextvars.bind_contextvars(dataset=dataset)
        element = constant.DATASETS[dataset]

        # Build
        dataset_path = os.path.join(constant.DATA_DIR, element["folder"])
        os.makedirs(dataset_path, exist_ok=True)
        LOG.info(f"Prepare the dataset at path={dataset_path}")

        # Download
        LOG.info("Download dataset by zip and unzip content")
        zip_url = element["href"]
        local_zip_path = os.path.join(dataset_path, os.path.basename(zip_url))

        # Validate
        if os.path.exists(local_zip_path):
            LOG.info(f"[CACHED] Zip already exists at path={local_zip_path}")

        else:

            # Download
            resp = httpx.get(zip_url, follow_redirects=True)
            resp.raise_for_status()
            with open(local_zip_path, "wb") as _file:
                _file.write(resp.content)

        # Extract (overrides if exist)
        with zipfile.ZipFile(local_zip_path, "r") as zip_ref:
            zip_ref.extractall(path=dataset_path)

        LOG.info(f"Successfully downloaded dataset={dataset}")

Stage 2: Training model and export into ONNX format

train.py
#!/bin/python3

# Global
import sys
import os
import argparse
import textwrap
import json
from datetime import datetime
import zoneinfo

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# External
import polars as pl
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
from skl2onnx import to_onnx
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
import structlog

# Internal
import workshop.classification_wine_quality.constant as constant

# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


def build_dataset(resource: "Iterable[tuple[str, str]]") -> pl.DataFrame:
    """Read each (color, csv_path) pair and concatenate them into a single frame."""
    result: pl.DataFrame | None = None
    for key, path in resource:

        if not os.path.exists(path):
            raise FileNotFoundError(f"File not found: {path}")

        element = pl.read_csv(
            source=path,
            has_header=True,
            separator=";",
            infer_schema_length=10_000,
            schema={
                "fixed acidity": pl.Float64,
                "volatile acidity": pl.Float64,
                "citric acid": pl.Float64,
                "residual sugar": pl.Float64,
                "chlorides": pl.Float64,
                "free sulfur dioxide": pl.Float64,
                "total sulfur dioxide": pl.Float64,
                "density": pl.Float64,
                "pH": pl.Float64,
                "sulphates": pl.Float64,
                "alcohol": pl.Float64,
                "quality": pl.Float64,
            }
        ).select(pl.all().name.map(lambda col: col.lower().replace(" ", "_")))
        element = element.with_columns(pl.lit(key).cast(pl.String).alias("color"))

        if result is None:
            result = element
        else:
            result = pl.concat([result, element], how="diagonal")

    return result


if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/train.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Train a classification model to predict the quality of wine

        Usage
        -----

        Normal train
        >>> python workshop/classification_wine_quality/train.py

        Help
        >>> python workshop/classification_wine_quality/train.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()

    # -------------------------------------------------------------------------
    # Training models ---------------------------------------------------------
    # -------------------------------------------------------------------------

    # Prepare
    LOG.info(f"Experiment name: {constant.EXPERIMENT_NAME}")
    os.makedirs(name=constant.EXPERIMENT_FOLDER, exist_ok=True)

    # Set
    metadata = {
        "experiment": constant.EXPERIMENT_NAME,
        "owner": "thuyetbao",
        "version": "1.2.8",
        "revision": datetime.now(tz=zoneinfo.ZoneInfo("Asia/Ho_Chi_Minh")).strftime("%Y%m%d"),
    }
    experiment_model_onnx_path = os.path.join(constant.EXPERIMENT_FOLDER, "model-latest.onnx")
    experiment_model_metadata_path = os.path.join(constant.EXPERIMENT_FOLDER, "metadata.json")

    # Build
    dataset = build_dataset(resource=constant.DATASET_COMPONENT.items())
    dataset = dataset.with_columns(
        pl.col("color").cast(pl.Categorical).name.keep(),
        pl.col("quality").cut(
            breaks=[6, 8.5],
            labels=["LOW", "MEDIUM", "HIGH"],
            left_closed=False,
            include_breaks=False
        ).alias("category"),
    ).drop(["quality"])

    # Set
    col_factor = "category"
    col_features_numeric = [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "free_sulfur_dioxide",
        "total_sulfur_dioxide",
        "density",
        "ph",
        "sulphates",
        "alcohol",
    ]
    col_features_category = [
        "color",
    ]
    col_features = col_features_numeric + col_features_category

    # The frame
    frame = dataset.to_pandas()

    # Split
    # -----
    # Full dataset                   % in total     Description
    # ├── Test (10%)                 10%            → final unbiased evaluation (touch once)
    # └── Temp (90%)                                → temporary dataset
    #       ├── Train (80%)          72%            → model fitting
    #       └── Validation (20%)     18%            → tuning / early stopping
    x_temp, x_test, y_temp, y_test = train_test_split(
        frame.drop("category", axis=1),
        frame["category"],
        test_size=0.1,
        random_state=42,
        stratify=frame["category"]
    )
    x_train, x_valid, y_train, y_valid = train_test_split(
        x_temp,
        y_temp,
        test_size=0.2,
        random_state=42,
        stratify=y_temp
    )

    preprocess = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore", drop="if_binary"), ["color"]),
            ("num", "passthrough", col_features_numeric),
        ]
    )

    # Build
    model = tree.DecisionTreeClassifier(
        criterion="gini",
        splitter="best",
        max_depth=30,
        min_samples_split=5,
        min_samples_leaf=3,
        min_weight_fraction_leaf=0.1,
        max_features=5,
        random_state=999,
        max_leaf_nodes=20,
        min_impurity_decrease=0,
        class_weight=None,
    )
    pipe = Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("model", model)
        ],
        verbose=True
    )
    metadata = metadata | {"algorithm": "Decision Tree (CART)", "alias": "baseline"}

    # Train
    pipe.fit(x_train, y_train)
    # tree.plot_tree(pipe)

    # Evaluate
    accuracy_valid = accuracy_score(y_valid, pipe.predict(x_valid))
    accuracy_test = accuracy_score(y_test, pipe.predict(x_test))
    LOG.info(f"[Performance] Accuracy: (+) Validation set: {accuracy_valid:.4f} (+) Test set: {accuracy_test:.4f}")

    # -------------------------------------------------------------------------
    # Export model into ONNX format -------------------------------------------
    # -------------------------------------------------------------------------

    # Build
    initial_types = []

    # Numeric features → float tensor
    for col in col_features_numeric:
        initial_types.append((col, FloatTensorType([None, 1])))

    # Categorical features → string tensor
    for col in col_features_category:
        initial_types.append((col, StringTensorType([None, 1])))

    # Convert into ONNX format.
    onx = to_onnx(
        model=pipe,
        name="classification-wine-quality-2026",
        initial_types=initial_types
    )
    with open(experiment_model_onnx_path, "wb") as file:
        file.write(onx.SerializeToString())

    # Write metadata
    with open(experiment_model_metadata_path, "w") as file:
        json.dump(metadata, file)

After training, the data folder will contain artifacts:

Screenshot artifacts

Stage 3: Serving model through HTTP by ONNX runtime

(1) Build the route /inferences/wine/quality that handles inference requests

The inference request payload contains the physicochemical properties of the wine.

class WineAttributesPayload(BaseModel):
    """Payload of physicochemical tests

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol
    12 - type of wine (red or white)

    Reference
    ---------
    For more information, read [Cortez et al., 2009].
    """
    color: Literal["red", "white"] = Field(default=..., description="Type of wine (red or white)")
    fixed_acidity: float = Field(default=..., description="Fixed acidity")
    volatile_acidity: float = Field(default=..., description="Volatile acidity")
    citric_acid: float = Field(default=..., description="Citric acid")
    residual_sugar: float = Field(default=..., description="Residual sugar")
    chlorides: float = Field(default=..., description="Chlorides")
    free_sulfur_dioxide: float = Field(default=..., description="Free sulfur dioxide")
    total_sulfur_dioxide: float = Field(default=..., description="Total sulfur dioxide")
    density: float = Field(default=..., description="Density")
    ph: float = Field(default=..., description="pH")
    sulphates: float = Field(default=..., description="Sulphates")
    alcohol: float = Field(default=..., description="Alcohol")
    # model_config = ConfigDict(extra="allow")

This payload is then passed to the ONNX Runtime session for inference, with the response shaped by the following models:

class ResponseBaseModel(BaseModel):
    status: str = Field(default=..., description="Status of the request")
    id: str = Field(default_factory=lambda: uuid.uuid4().hex, description="Unique identifier of the request")


class PredictionWineQualityModel(BaseModel):
    type: str = Field(default=..., description="Type of task")
    prediction: str = Field(default=..., description="Quality of wine (Best possible prediction)")
    probabilities: dict[str, float] = Field(default=..., description="Probabilities of prediction per class")

    @field_validator("probabilities", mode="after")
    @classmethod
    def handle_output(cls, value: dict[str, float]):
        ele = {k: round(v, 4) for k, v in value.items()}
        return ele


class ResponsePredictionWineQualityModel(ResponseBaseModel):
    result: PredictionWineQualityModel = Field(default=..., description="Result of the request")
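The rounding validator can be exercised in isolation. This sketch re-declares a minimal version of `PredictionWineQualityModel` (pydantic v2, matching the snippet above) to show how probabilities are rounded to four decimals:

```python
from pydantic import BaseModel, Field, field_validator


class PredictionWineQualityModel(BaseModel):
    type: str
    prediction: str
    probabilities: dict[str, float]

    @field_validator("probabilities", mode="after")
    @classmethod
    def handle_output(cls, value: dict[str, float]) -> dict[str, float]:
        # Round every class probability to 4 decimals before serialization
        return {k: round(v, 4) for k, v in value.items()}


pred = PredictionWineQualityModel(
    type="classification",
    prediction="MEDIUM",
    probabilities={
        "HIGH": 0.13235294818878174,
        "LOW": 0.22794117033481598,
        "MEDIUM": 0.6397058963775635,
    },
)
```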

For the route engine:

endpoint/inferences/route.py
#!/bin/python3

# Global

# External
import structlog
from fastapi import (
    APIRouter,
    Depends,
    status,
)
import onnxruntime
import numpy as np

# Internal
import dependencies

# Context
from endpoint.inferences.model import WineAttributesPayload, ResponsePredictionWineQualityModel


router = APIRouter(
    prefix="/inferences/wine",
    tags=["Inference"],
)

# Construct
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


@router.post(
    path="/quality",
    summary="Inference quality of wine (Classification)",
    description="Inference quality of wine based on the physicochemical properties",
    status_code=status.HTTP_200_OK,
    response_model=ResponsePredictionWineQualityModel,
)
def inferenceClassificationWineQuality(
    payload: WineAttributesPayload,
    model: onnxruntime.InferenceSession = Depends(dependencies.yield_model_classification_wine_quality_latest)
):

    # Echo
    LOG.debug(f"The model payload: {payload.model_dump(mode='python')}")

    # Build
    data = {
        "color": np.array([[payload.color]], dtype=object),
        "fixed_acidity": np.array([[payload.fixed_acidity]], dtype=np.float32),
        "volatile_acidity": np.array([[payload.volatile_acidity]], dtype=np.float32),
        "citric_acid": np.array([[payload.citric_acid]], dtype=np.float32),
        "residual_sugar": np.array([[payload.residual_sugar]], dtype=np.float32),
        "chlorides": np.array([[payload.chlorides]], dtype=np.float32),
        "free_sulfur_dioxide": np.array([[payload.free_sulfur_dioxide]], dtype=np.float32),
        "total_sulfur_dioxide": np.array([[payload.total_sulfur_dioxide]], dtype=np.float32),
        "density": np.array([[payload.density]], dtype=np.float32),
        "ph": np.array([[payload.ph]], dtype=np.float32),
        "sulphates": np.array([[payload.sulphates]], dtype=np.float32),
        "alcohol": np.array([[payload.alcohol]], dtype=np.float32),
    }
    LOG.debug(f"Data: {data}")

    # Predict
    # The output is a list of (label, probabilities)
    # For example:
    # [
    #       array(['MEDIUM'], dtype=object),
    #       [
    #           {
    #               'HIGH': 0.13235294818878174,
    #               'LOW': 0.22794117033481598,
    #               'MEDIUM': 0.6397058963775635
    #           }
    #       ]
    # ]
    output = model.run(output_names=['output_label', 'output_probability'], input_feed=data)
    LOG.debug(f"Output: {output}")

    return {
        "status": "ok",
        "result": {
            "type": "classification",
            "prediction": output[0][0],
            "probabilities": output[1][0]
        }
    }

(2) Then try the inference request

Using the curl command:

curl -X 'POST' \
  'http://127.0.0.1:12123/inferences/wine/quality' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "color": "red",
        "fixed_acidity": 7.4,
        "volatile_acidity": 0.7,
        "citric_acid": 0.0,
        "residual_sugar": 1.9,
        "chlorides": 0.076,
        "free_sulfur_dioxide": 11.0,
        "total_sulfur_dioxide": 34.0,
        "density": 0.9978,
        "ph": 3.51,
        "sulphates": 0.56,
        "alcohol": 9.4
    }'
# --- Response body
# {
#   "status": "ok",
#   "id": "0f7e7e34a9564518b447207a81cf4da0",
#   "result": {
#     "type": "classification",
#     "prediction": "LOW",
#     "probabilities": {
#       "HIGH": 0,
#       "LOW": 0.7664,
#       "MEDIUM": 0.2336
#     }
#   }
# }
# --- Response headers
#  access-control-allow-credentials: true
#  access-control-allow-origin: *
#  content-length: 167
#  content-type: application/json
#  x-engine-handle-by: Inference Service
#  x-engine-revision: 20260117
#  x-engine-version: 1.23.41

Citation

Citation for the works of the team that shared the dataset:

@misc{wine_quality_186,
  author       = {Cortez, Paulo and Cerdeira, A. and Almeida, F. and Matos, T. and Reis, J.},
  title        = {{Wine Quality}},
  year         = {2009},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C56S3T}
}

Further Reading

For core concepts related to sklearn, see sklearn Headstart.

Appendix

Appendix 1: Record of Changes

Table: Record of changes

Version Date Author Description
0.4.2 2026/01/18 thuyetbao Metadata, result, artifacts
0.3.15 2026/01/18 thuyetbao Model architecture, EDA and metadata
0.2.9 2026/01/17 thuyetbao Added stages, citation and references
0.1.0 2026/01/17 thuyetbao Initiation documentation