
Classification Wine Quality

Overview

This assignment focuses on implementing an end-to-end classification model for wine quality prediction.

The scope of this work includes both the model development phase and the model deployment phase, covering:

  • Building, training, and evaluating classification models using the standard scikit-learn workflow

  • Understanding the Decision Tree (CART) algorithm, including how splits are selected, how impurity is minimized, and why trees are interpretable

  • Exporting models to ONNX and serving them in a framework-agnostic production environment

Methodology

The assignment is to build a classification model that predicts the quality of wine from the north of Portugal.

Attribute Value
Model name classification-model-2026
Version latest
Purpose Model wine quality based on physicochemical tests
Task Type Classification

Dataset

The dataset, named Wine Quality, is available from the UCI Machine Learning Repository with the following schema:

Variable Name Role Type Description Units Missing Values
quality Target Integer Sensory quality score assigned by wine experts score (0–10) no
fixed_acidity Feature Continuous Amount of non-volatile acids in the wine that do not evaporate easily g/dm³ no
volatile_acidity Feature Continuous Amount of acetic acid in the wine; high values lead to vinegar taste g/dm³ no
citric_acid Feature Continuous Amount of citric acid, which adds freshness and flavor g/dm³ no
residual_sugar Feature Continuous Amount of sugar remaining after fermentation stops g/dm³ no
chlorides Feature Continuous Amount of salt in the wine g/dm³ no
free_sulfur_dioxide Feature Continuous Free form of SO₂; prevents microbial growth and oxidation mg/dm³ no
total_sulfur_dioxide Feature Continuous Total amount of SO₂ (free + bound forms) mg/dm³ no
density Feature Continuous Density of the wine, influenced by alcohol and sugar content g/cm³ no
pH Feature Continuous Measure of acidity or basicity of the wine unitless no
sulphates Feature Continuous Amount of potassium sulphate, contributing to wine stability g/dm³ no
alcohol Feature Continuous Alcohol content of the wine % vol no
color Other Categorical Wine color category red / white no

At first look at the Wine Quality dataset, it is evident that the data form a small but complete dataset, consisting of approximately 6,500 observations, 12 physicochemical features, and one target variable (quality). No missing values are present in any variable.

The data were collected from expert sensory evaluations of both red and white wines, where each wine sample was assigned a quality score ranging from 0 to 10. The features are entirely numeric and describe measurable chemical properties, making the dataset well-suited for classical machine learning models on structured tabular data.

Table 1: Descriptive statistics of the Wine Quality dataset

Dataset Descriptive statistics

In more detail, the following EDA results promote a better understanding of the dataset:

Density by component

Correlation

Pairs

Model Architecture

A Decision Tree (CART) architecture was selected as the baseline model due to its strong interpretability and natural fit for small-scale, structured tabular datasets. The dataset contains fewer than 10,000 samples and approximately 10 numeric physicochemical features, where tree-based splits can directly model non-linear thresholds without requiring feature scaling or complex preprocessing.

The model produces explicit decision rules that enable transparent reasoning about quality predictions and facilitate rapid error analysis. Although single decision trees do not reach state-of-the-art accuracy compared to ensemble methods, they provide fast training, low-latency inference, and clear failure-mode visibility, making them well-suited for baseline modeling and explainability-driven evaluation prior to more complex architectures.

The Decision Tree Model

Let the training data be

\[ \{(x_i, y_i)\}_{i=1}^n, \quad x_i \in \mathbb{R}^p, \quad y_i \in \{1,\dots,K\} \]

The objective is to partition the predictor space into \(M\) disjoint regions

\[ R_1, R_2, \dots, R_M \]

and to assign a class label and corresponding class probabilities to each region.

For an observation \(x \in R_m\), the estimated class probabilities are defined as

\[ \hat{p}_{mk} = \frac{1}{|R_m|} \sum_{x_i \in R_m} \mathbf{1}(y_i = k), \quad k = 1,\dots,K \]

The predicted class is given by

\[ \hat{y}(x) = \arg\max_k \hat{p}_{mk} \]

At each internal node, CART selects the split that optimizes a node impurity criterion. In practice, three criteria are commonly used: Gini impurity, Entropy, and Log Loss. Each criterion measures class heterogeneity within a node and leads to slightly different tree behaviors.

Criterion Mathematical Definition Optimization Objective Characteristics
Gini \(G(R_m) = 1 - \sum_{k=1}^K \hat{p}_{mk}^2\) Minimize weighted impurity Fast, stable, favors dominant classes
Entropy \(H(R_m) = -\sum_{k=1}^K \hat{p}_{mk} \log \hat{p}_{mk}\) Maximize information gain More sensitive to class balance
Log Loss \(L(R_m) = -\frac{1}{|R_m|}\sum_{i \in R_m} \log \hat{p}_{m,y_i}\) Minimize negative log-likelihood Optimizes probabilistic accuracy

For a candidate split into regions \(R_1\) and \(R_2\), the selected criterion \(C(\cdot)\) is minimized according to

\[ \frac{|R_1|}{|R|} C(R_1) + \frac{|R_2|}{|R|} C(R_2) \]
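As a concrete check of these formulas, the criteria can be computed directly with NumPy. This is a minimal sketch; the function names are illustrative and not part of the assignment code:

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity G(R) = 1 - sum_k p_k^2 for the labels in a node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def entropy(labels: np.ndarray) -> float:
    """Entropy H(R) = -sum_k p_k log(p_k), natural log; absent classes are skipped."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))


def weighted_impurity(left: np.ndarray, right: np.ndarray, criterion=gini) -> float:
    """|R1|/|R| * C(R1) + |R2|/|R| * C(R2) for a candidate binary split."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)
```

A pure node scores 0 under both criteria; a balanced two-class node scores 0.5 under Gini and \(\log 2\) under entropy.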

The tree is constructed using a greedy, recursive binary splitting procedure. For a given node \(R\), all possible splits are considered. For each predictor \(x_j\) and split point \(s\), define the regions

\[ R_1(j,s) = \{x : x_j < s\}, \quad R_2(j,s) = \{x : x_j \ge s\} \]

The optimal split \((j^\*, s^\*)\) is chosen to minimize the weighted impurity

\[ (j^\*, s^\*) = \arg\min_{j,s} \left[ \frac{|R_1|}{|R|} G(R_1) + \frac{|R_2|}{|R|} G(R_2) \right] \]

Equivalently, this choice maximizes the impurity reduction

\[ \Delta G = G(R) - \left[ \frac{|R_1|}{|R|} G(R_1) + \frac{|R_2|}{|R|} G(R_2) \right] \]

Once the optimal split is selected, the node \(R\) is partitioned into the two child regions \(R_1(j^\*, s^\*)\) and \(R_2(j^\*, s^\*)\). This procedure is applied recursively to each resulting node until a stopping criterion is met, such as reaching a maximum tree depth, insufficient samples in a node, no further impurity reduction, or a node containing observations from only a single class.
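The greedy search over \((j, s)\) can be sketched as a brute-force scan of candidate thresholds on a single feature. This toy example is illustrative only; the assignment itself relies on scikit-learn's optimized implementation:

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))


def best_split(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Exhaustively scan thresholds s and return the one minimizing the
    weighted Gini impurity of the regions {x < s} and {x >= s}."""
    best_s, best_score = np.nan, np.inf
    xs = np.unique(x)
    for s in (xs[:-1] + xs[1:]) / 2:  # midpoints between consecutive sorted values
        left, right = y[x < s], y[x >= s]
        score = len(left) / len(y) * gini(left) + len(right) / len(y) * gini(right)
        if score < best_score:
            best_s, best_score = s, score
    return best_s, best_score


# Toy, perfectly separable feature: the best threshold falls between the classes
x = np.array([9.0, 9.5, 10.5, 11.0])
y = np.array(["LOW", "LOW", "HIGH", "HIGH"])
s, score = best_split(x, y)
```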

For a terminal region \(R_m\), class probabilities are estimated as

\[ \hat{P}(Y = k \mid X \in R_m) = \frac{1}{|R_m|} \sum_{x_i \in R_m} \mathbf{1}(y_i = k) \]

and the predicted class assigned to the region is

\[ \hat{y}_{R_m} = \arg\max_k \hat{p}_{mk} \]

To control model complexity, cost–complexity pruning may be applied. Let \(T\) denote a subtree with \(|T|\) terminal nodes. The penalized empirical risk is defined as

\[ R_\alpha(T) = \sum_{m=1}^{|T|} \sum_{x_i \in R_m} \mathbf{1}(y_i \ne \hat{y}_{R_m}) + \alpha |T| \]

For a given \(\alpha \ge 0\), the optimal subtree is chosen as

\[ T_\alpha = \arg\min_T R_\alpha(T) \]
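In scikit-learn, cost–complexity pruning is exposed through the `ccp_alpha` parameter and `cost_complexity_pruning_path`. A minimal sketch on synthetic data (not the wine dataset):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data: an unconstrained tree memorizes it with many leaves
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
# Effective alphas at which subtrees are pruned away, in increasing order
path = full.cost_complexity_pruning_path(X, y)

# Refitting with the largest effective alpha prunes the tree back to its root
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-1]).fit(X, y)
```

Sweeping `ccp_alpha` over `path.ccp_alphas` and picking the value with the best validation accuracy is the usual way to choose \(\alpha\).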

Model selection

The CART model is selected based on the following criteria:

Field Value
Model Type Decision Tree (CART)
Role Baseline
Task Multi-class classification
Input Data Small tabular dataset, numeric features
Target Wine quality score

based on the following rationale:

  • Dataset size is small, favoring low-variance, interpretable models

  • Features are fully numeric, requiring minimal preprocessing

  • Decision trees provide transparent decision rules and fast iteration

  • Serves as a strong interpretability-focused baseline before ensembles

with the target learning objective

Aspect Description
Split Criterion Gini impurity
Optimization Goal Minimize node impurity

and pre-chosen hyperparameters as follows:

Parameter Value
criterion gini
splitter best
max_depth 30
min_samples_split 5
min_samples_leaf 3
min_weight_fraction_leaf 0.1
max_features 5
max_leaf_nodes 20
min_impurity_decrease 0
class_weight None
random_state 999

The original wine quality score (0–10) is transformed into a three-class classification target—LOW, MEDIUM, and HIGH—to reframe the problem as classification, reduce noise from minor expert score variations, align with common quality thresholds, and improve interpretability and stability for a small dataset.

Quality Score Range Category
≤ 6.0 LOW
(6.0, 8.5] MEDIUM
> 8.5 HIGH

Then the model is trained on the training set, and the performance is evaluated on the validation set. The model is evaluated exclusively using classification accuracy, defined as the proportion of correctly classified samples. Accuracy is used as the sole criterion for model selection, validation, and final evaluation.
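For reference, accuracy as used here is simply the fraction of matching labels (toy labels, not the actual predictions):

```python
from sklearn.metrics import accuracy_score

# Accuracy = number of correct predictions / total predictions
y_true = ["LOW", "LOW", "MEDIUM", "HIGH", "MEDIUM"]
y_pred = ["LOW", "MEDIUM", "MEDIUM", "HIGH", "MEDIUM"]
acc = accuracy_score(y_true, y_pred)  # 4 of 5 correct -> 0.8
```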

Dataset Splitting Strategy

The splitting process is performed in two stages. First, 10% of the full dataset is reserved as a test set to provide an unbiased estimate of generalization performance. The remaining 90% (temporary dataset) is then split into training (80%) and validation (20%) subsets, corresponding to 72% and 18% of the original data, respectively.

# Split
# -----
# Full dataset                   % in total     Description
# ├── Test (10%)                 10%            → final unbiased evaluation (touch once)
# └── Temp (90%)                                → temporary dataset
#       ├── Train (80%)          72%            → model fitting
#       └── Validation (20%)     18%            → tuning / early stopping

Stratified sampling is applied based on the target variable to preserve the original class distribution across all splits.
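A minimal sketch of the two-stage stratified split with scikit-learn, using mock labels (the real pipeline operates on the wine dataframe):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Mock imbalanced labels standing in for the wine quality classes
y = np.array(["LOW"] * 700 + ["MEDIUM"] * 250 + ["HIGH"] * 50)
X = np.arange(len(y)).reshape(-1, 1)

# Stage 1: hold out 10% as the untouched test set, stratified on the target
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.10, random_state=42, stratify=y
)

# Stage 2: split the remaining 90% into train/validation (80%/20%),
# i.e. 72% and 18% of the full dataset
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.20, random_state=42, stratify=y_temp
)
```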

End-to-End Modeling Flow (Conceptual)

(1) Raw wine samples (red and white) are combined into a single dataset.

(2) The dataset is split into train, validation, and test subsets following the above proportions.

(3) Feature preprocessing is applied consistently across all splits:

  • Categorical features (wine color) are encoded.

  • Numeric physicochemical features are passed through without transformation.

(4) A Decision Tree (CART) classifier is trained using the training set.

(5) Model hyperparameters are evaluated using the validation set.

(6) The final selected model is evaluated once on the held-out test set.

(7) The final evaluation results are reported below.

Results

Model Metadata:

Field Value
Model Name classification-model-2026
Model Type Classification
Algorithm Decision Tree (CART)
Owner Thuyet Bao
Alias Baseline
Evaluation Metric Accuracy

Model Performance:

The final model is selected based on validation accuracy and evaluated once on a held-out test set to estimate generalization performance.

Dataset Split Metric Score
Validation Accuracy 79.2%
Test Accuracy 80%

Serving Model

After training, the model is exported to the ONNX format. ONNX (Open Neural Network Exchange) is a standardized, framework-agnostic representation for machine learning models, designed to enable consistent deployment across different environments.

An ONNX file has the .onnx extension and uses the Protobuf serialization format. It encapsulates:

  • The computation graph (nodes and edges)

  • Operators (e.g., MatMul, Relu, Softmax, TreeEnsembleClassifier)

  • Model parameters (weights and tensors)

  • Input and output schemas (names, shapes, data types)

This standardized representation allows models trained in one framework to be executed reliably in different runtimes without retraining or code changes.

Once exported, the ONNX model follows a unified production serving pipeline:

flowchart LR
  train["Training Model"] --> export[Export to ONNX] --> runtime[ONNX Runtime] --> Serving[API / Batch / Edge / Mobile]

  1. Training: The model is developed and trained using a supported framework such as PyTorch, TensorFlow, or scikit-learn.

  2. Export to ONNX: The trained model is serialized into a .onnx file, preserving its structure, parameters, and input/output definitions.

  3. ONNX Runtime: The model is loaded and executed using ONNX Runtime, which provides optimized inference across CPUs, GPUs, and hardware accelerators.

  4. Serving: The model is deployed for inference in multiple scenarios, including:

     • Online APIs for real-time predictions
     • Batch processing pipelines
     • Edge or mobile environments

This workflow decouples model training from deployment, enabling scalable and portable production inference.

Implementation

There are 3 separate stages in this implementation:

Ind Stage Description
1 Download datasets Download datasets from the UCI data registry
2 Train, export model Train the model, then export it into the ONNX format
3 Serving model Serve the model over HTTP with ONNX Runtime

Stages

Stage 1: Download datasets

The dataset can be downloaded by the following flow:

(1) Declare the metadata, including: href, folder.

(2) Download the dataset by its zip content. If the .zip file exists, then skip.

(3) Unzip the dataset into the folder.

For the declaration variables:

constant.py
#!/bin/python3

# Global
import os
import sys

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# The experiment name
EXPERIMENT_NAME = "classification-model-2026"

# The container data directory path
DATA_DIR = "/mnt/usr/inference-service/"

# The prefix of project folder
PROJECT_PREFIX_FOLDER = "wine-quality"

# The prefix of project data folder
PROJECT_DATA_FOLDER_PATH = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, "dataset")

# The dictionary of datasets
DATASETS = {
    "wine-quality": {
        "folder": "wine-quality/dataset",
        "href": "https://archive.ics.uci.edu/static/public/186/wine+quality.zip"
    }
}

# The path of dataset
DATASET_COMPONENT = {
    "red": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-red.csv"),
    "white": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-white.csv"),
}

# The experiment folder
EXPERIMENT_FOLDER = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, EXPERIMENT_NAME)

The following script will be executed:

download.py
#!/bin/python3

# Global
import sys
import os
import zipfile
import argparse
import textwrap

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# External
import httpx
import structlog

# Internal
import workshop.classification_wine_quality.constant as constant

# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/download.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Download wine quality datasets

        Usage
        -----

        Normal download
        >>> python workshop/classification_wine_quality/download.py

        Help
        >>> python workshop/classification_wine_quality/download.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()

    # The datasets to download
    TARGET_DATASETS = ["wine-quality"]

    # Load
    for dataset in TARGET_DATASETS:

        # Get
        structlog.contextvars.bind_contextvars(dataset=dataset)
        element = constant.DATASETS[dataset]

        # Build
        dataset_path = os.path.join(constant.DATA_DIR, element["folder"])
        os.makedirs(dataset_path, exist_ok=True)
        LOG.info(f"Prepare the dataset at path={dataset_path}")

        # Download
        LOG.info("Download dataset by zip and unzip content")
        zip_url = element["href"]
        local_zip_path = os.path.join(dataset_path, os.path.basename(zip_url))

        # Validate
        if os.path.exists(local_zip_path):
            LOG.info(f"[CACHED] Zip already exists at path={local_zip_path}")

        else:

            # Download
            resp = httpx.get(zip_url, follow_redirects=True)
            resp.raise_for_status()
            with open(local_zip_path, "wb") as _file:
                _file.write(resp.content)

        # Extract (overrides if exist)
        with zipfile.ZipFile(local_zip_path, "r") as zip_ref:
            zip_ref.extractall(path=dataset_path)

        LOG.info(f"Successfully downloaded dataset={dataset}")

Stage 2: Training model and export into ONNX format

train.py
#!/bin/python3

# Global
import sys
import os
import argparse
import textwrap
import json
from datetime import datetime
import zoneinfo

# Path Append
sys.path.append(os.path.abspath(os.curdir))

# External
import polars as pl
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
from skl2onnx import to_onnx
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
import structlog

# Internal
import workshop.classification_wine_quality.constant as constant

# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


def build_dataset(resource: "Iterable[tuple[str, str]]") -> pl.DataFrame:
    """Read each (color, csv_path) pair and concatenate them into a single frame."""
    result: pl.DataFrame | None = None
    for key, path in resource:

        if not os.path.exists(path):
            raise FileNotFoundError(f"File not found: {path}")

        element = pl.read_csv(
            source=path,
            has_header=True,
            separator=";",
            infer_schema_length=10_000,
            schema={
                "fixed acidity": pl.Float64,
                "volatile acidity": pl.Float64,
                "citric acid": pl.Float64,
                "residual sugar": pl.Float64,
                "chlorides": pl.Float64,
                "free sulfur dioxide": pl.Float64,
                "total sulfur dioxide": pl.Float64,
                "density": pl.Float64,
                "pH": pl.Float64,
                "sulphates": pl.Float64,
                "alcohol": pl.Float64,
                "quality": pl.Float64,
            }
        ).select(pl.all().name.map(lambda col: col.lower().replace(" ", "_")))
        element = element.with_columns(pl.lit(key).cast(pl.String).alias("color"))

        if result is None:
            result = element
        else:
            result = pl.concat([result, element], how="diagonal")

    return result


if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/train.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Train a classification model to predict the quality of wine

        Usage
        -----

        Normal train
        >>> python workshop/classification_wine_quality/train.py

        Help
        >>> python workshop/classification_wine_quality/train.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()

    # -------------------------------------------------------------------------
    # Training models ---------------------------------------------------------
    # -------------------------------------------------------------------------

    # Prepare
    LOG.info(f"Experiment name: {constant.EXPERIMENT_NAME}")
    os.makedirs(name=constant.EXPERIMENT_FOLDER, exist_ok=True)

    # Set
    metadata = {
        "experiment": constant.EXPERIMENT_NAME,
        "owner": "thuyetbao",
        "version": "1.2.8",
        "revision": datetime.now(tz=zoneinfo.ZoneInfo("Asia/Ho_Chi_Minh")).strftime("%Y%m%d"),
    }
    experiment_model_onnx_path = os.path.join(constant.EXPERIMENT_FOLDER, "model-latest.onnx")
    experiment_model_metadata_path = os.path.join(constant.EXPERIMENT_FOLDER, "metadata.json")

    # Build
    dataset = build_dataset(resource=constant.DATASET_COMPONENT.items())
    dataset = dataset.with_columns(
        pl.col("color").cast(pl.Categorical).name.keep(),
        pl.col("quality").cut(
            breaks=[6, 8.5],
            labels=["LOW", "MEDIUM", "HIGH"],
            left_closed=False,
            include_breaks=False
        ).alias("category"),
    ).drop(["quality"])

    # Set
    col_factor = "category"
    col_features_numeric = [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "free_sulfur_dioxide",
        "total_sulfur_dioxide",
        "density",
        "ph",
        "sulphates",
        "alcohol",
    ]
    col_features_category = [
        "color",
    ]
    col_features = col_features_numeric + col_features_category

    # The frame
    frame = dataset.to_pandas()

    # Split
    # -----
    # Full dataset                   % in total     Description
    # ├── Test (10%)                 10%            → final unbiased evaluation (touch once)
    # └── Temp (90%)                                → temporary dataset
    #       ├── Train (80%)          72%            → model fitting
    #       └── Validation (20%)     18%            → tuning / early stopping
    x_temp, x_test, y_temp, y_test = train_test_split(
        frame.drop("category", axis=1),
        frame["category"],
        test_size=0.1,
        random_state=42,
        stratify=frame["category"]
    )
    x_train, x_valid, y_train, y_valid = train_test_split(
        x_temp,
        y_temp,
        test_size=0.2,
        random_state=42,
        stratify=y_temp
    )

    preprocess = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore", drop="if_binary"), ["color"]),
            ("num", "passthrough", col_features_numeric),
        ]
    )

    # Build
    model = tree.DecisionTreeClassifier(
        criterion="gini",
        splitter="best",
        max_depth=30,
        min_samples_split=5,
        min_samples_leaf=3,
        min_weight_fraction_leaf=0.1,
        max_features=5,
        random_state=999,
        max_leaf_nodes=20,
        min_impurity_decrease=0,
        class_weight=None,
    )
    pipe = Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("model", model)
        ],
        verbose=True
    )
    metadata = metadata | {"algorithm": "Decision Tree (CART)", "alias": "baseline"}

    # Train
    pipe.fit(x_train, y_train)
    # tree.plot_tree(pipe)

    # Evaluate
    accuracy_valid = accuracy_score(y_valid, pipe.predict(x_valid))
    accuracy_test = accuracy_score(y_test, pipe.predict(x_test))
    LOG.info(f"[Performance] Accuracy: (+) Validation set: {accuracy_valid:.4f} (+) Test set: {accuracy_test:.4f}")

    # -------------------------------------------------------------------------
    # Export model into ONNX format -------------------------------------------
    # -------------------------------------------------------------------------

    # Build
    initial_types = []

    # Numeric features → float tensor
    for col in col_features_numeric:
        initial_types.append((col, FloatTensorType([None, 1])))

    # Categorical features → string tensor
    for col in col_features_category:
        initial_types.append((col, StringTensorType([None, 1])))

    # Convert into ONNX format.
    onx = to_onnx(
        model=pipe,
        name="classification-wine-quality-2026",
        initial_types=initial_types
    )
    with open(experiment_model_onnx_path, "wb") as file:
        file.write(onx.SerializeToString())

    # Write metadata
    with open(experiment_model_metadata_path, "w") as file:
        json.dump(metadata, file)

After training, the data folder will contain artifacts:

Screenshot artifacts

Stage 3: Serving model through HTTP by ONNX runtime

(1) Build the route /inferences/wine/quality that handles inference requests

The inference request payload contains the physicochemical properties of the wine.

class WineAttributesPayload(BaseModel):
    """Payload of physicochemical tests

    Input variables (based on physicochemical tests):
    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol
    12 - type of wine (red or white)

    Reference
    ---------
    For more information, read [Cortez et al., 2009].
    """
    color: Literal["red", "white"] = Field(default=..., description="Type of wine (red or white)")
    fixed_acidity: float = Field(default=..., description="Fixed acidity")
    volatile_acidity: float = Field(default=..., description="Volatile acidity")
    citric_acid: float = Field(default=..., description="Citric acid")
    residual_sugar: float = Field(default=..., description="Residual sugar")
    chlorides: float = Field(default=..., description="Chlorides")
    free_sulfur_dioxide: float = Field(default=..., description="Free sulfur dioxide")
    total_sulfur_dioxide: float = Field(default=..., description="Total sulfur dioxide")
    density: float = Field(default=..., description="Density")
    ph: float = Field(default=..., description="pH")
    sulphates: float = Field(default=..., description="Sulphates")
    alcohol: float = Field(default=..., description="Alcohol")
    # model_config = ConfigDict(extra="allow")

This payload is then passed to the ONNX Runtime session for inference, with the response shaped by the following models:

class ResponseBaseModel(BaseModel):
    status: str = Field(default=..., description="Status of the request")
    id: str = Field(default_factory=lambda: uuid.uuid4().hex, description="Unique identifier of the request")


class PredictionWineQualityModel(BaseModel):
    type: str = Field(default=..., description="Type of task")
    prediction: str = Field(default=..., description="Quality of wine (Best possible prediction)")
    probabilities: dict[str, float] = Field(default=..., description="Probabilities of prediction per class")

    @field_validator("probabilities", mode="after")
    @classmethod
    def handle_output(cls, value: dict[str, float]):
        ele = {k: round(v, 4) for k, v in value.items()}
        return ele


class ResponsePredictionWineQualityModel(ResponseBaseModel):
    result: PredictionWineQualityModel = Field(default=..., description="Result of the request")
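The rounding validator can be exercised in isolation. This sketch re-declares a minimal version of `PredictionWineQualityModel` (pydantic v2, matching the snippet above) to show how probabilities are rounded to four decimals:

```python
from pydantic import BaseModel, Field, field_validator


class PredictionWineQualityModel(BaseModel):
    type: str
    prediction: str
    probabilities: dict[str, float]

    @field_validator("probabilities", mode="after")
    @classmethod
    def handle_output(cls, value: dict[str, float]) -> dict[str, float]:
        # Round every class probability to 4 decimals before serialization
        return {k: round(v, 4) for k, v in value.items()}


pred = PredictionWineQualityModel(
    type="classification",
    prediction="MEDIUM",
    probabilities={
        "HIGH": 0.13235294818878174,
        "LOW": 0.22794117033481598,
        "MEDIUM": 0.6397058963775635,
    },
)
```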

For the route engine:

endpoint/inferences/route.py
#!/bin/python3

# Global

# External
import structlog
from fastapi import (
    APIRouter,
    Depends,
    status,
)
import onnxruntime
import numpy as np

# Internal
import dependencies

# Context
from endpoint.inferences.model import WineAttributesPayload, ResponsePredictionWineQualityModel


router = APIRouter(
    prefix="/inferences/wine",
    tags=["Inference"],
)

# Construct
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()


@router.post(
    path="/quality",
    summary="Inference quality of wine (Classification)",
    description="Inference quality of wine based on the physicochemical properties",
    status_code=status.HTTP_200_OK,
    response_model=ResponsePredictionWineQualityModel,
)
def inferenceClassificationWineQuality(
    payload: WineAttributesPayload,
    model: onnxruntime.InferenceSession = Depends(dependencies.yield_model_classification_wine_quality_latest)
):

    # Echo
    LOG.debug(f"The model payload: {payload.model_dump(mode='python')}")

    # Build
    data = {
        "color": np.array([[payload.color]], dtype=object),
        "fixed_acidity": np.array([[payload.fixed_acidity]], dtype=np.float32),
        "volatile_acidity": np.array([[payload.volatile_acidity]], dtype=np.float32),
        "citric_acid": np.array([[payload.citric_acid]], dtype=np.float32),
        "residual_sugar": np.array([[payload.residual_sugar]], dtype=np.float32),
        "chlorides": np.array([[payload.chlorides]], dtype=np.float32),
        "free_sulfur_dioxide": np.array([[payload.free_sulfur_dioxide]], dtype=np.float32),
        "total_sulfur_dioxide": np.array([[payload.total_sulfur_dioxide]], dtype=np.float32),
        "density": np.array([[payload.density]], dtype=np.float32),
        "ph": np.array([[payload.ph]], dtype=np.float32),
        "sulphates": np.array([[payload.sulphates]], dtype=np.float32),
        "alcohol": np.array([[payload.alcohol]], dtype=np.float32),
    }
    LOG.debug(f"Data: {data}")

    # Predict
    # The output is a list of (label, probabilities)
    # For example:
    # [
    #       array(['MEDIUM'], dtype=object),
    #       [
    #           {
    #               'HIGH': 0.13235294818878174,
    #               'LOW': 0.22794117033481598,
    #               'MEDIUM': 0.6397058963775635
    #           }
    #       ]
    # ]
    output = model.run(output_names=['output_label', 'output_probability'], input_feed=data)
    LOG.debug(f"Output: {output}")

    return {
        "status": "ok",
        "result": {
            "type": "classification",
            "prediction": output[0][0],
            "probabilities": output[1][0]
        }
    }

(2) Then try the inference request

Using the curl command:

curl -X 'POST' \
  'http://127.0.0.1:12123/inferences/wine/quality' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
        "color": "red",
        "fixed_acidity": 7.4,
        "volatile_acidity": 0.7,
        "citric_acid": 0.0,
        "residual_sugar": 1.9,
        "chlorides": 0.076,
        "free_sulfur_dioxide": 11.0,
        "total_sulfur_dioxide": 34.0,
        "density": 0.9978,
        "ph": 3.51,
        "sulphates": 0.56,
        "alcohol": 9.4
    }'
# --- Response body
# {
#   "status": "ok",
#   "id": "0f7e7e34a9564518b447207a81cf4da0",
#   "result": {
#     "type": "classification",
#     "prediction": "LOW",
#     "probabilities": {
#       "HIGH": 0,
#       "LOW": 0.7664,
#       "MEDIUM": 0.2336
#     }
#   }
# }
# --- Response headers
#  access-control-allow-credentials: true
#  access-control-allow-origin: *
#  content-length: 167
#  content-type: application/json
#  x-engine-handle-by: Inference Service
#  x-engine-revision: 20260117
#  x-engine-version: 1.23.41

Citation

Citation for the works of the team that shared the dataset:

@misc{wine_quality_186,
  author       = {Cortez, Paulo and Cerdeira, A. and Almeida, F. and Matos, T. and Reis, J.},
  title        = {{Wine Quality}},
  year         = {2009},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C56S3T}
}

Further Reading

For core concepts related to sklearn, see sklearn Headstart.

Appendix

Appendix 1: Record of Changes

Table: Record of changes

Version Date Author Description
0.4.2 2026/01/18 thuyetbao Metadata, result, artifacts
0.3.15 2026/01/18 thuyetbao Model architecture, EDA and metadata
0.2.9 2026/01/17 thuyetbao Added stages, citation and references
0.1.0 2026/01/17 thuyetbao Initiation documentation