Classification Wine Quality¶
Overview¶
This assignment focuses on implementing an end-to-end classification model for wine quality prediction.
The scope of this work includes both the model development phase and the model deployment perspective, covering:
- Building, training, and evaluating classification models using the standard scikit-learn workflow
- Understanding the Decision Tree (CART) algorithm, including how splits are selected, how impurity is minimized, and why trees are interpretable
- Exporting models to ONNX and serving them in a framework-agnostic production environment
Methodology¶
The assignment is to build a classification model that predicts the quality of wine from the north of Portugal.
| Attribute | Value |
|---|---|
| Model name | classification-model-2026 |
| Version | latest |
| Purpose | Model wine quality based on physicochemical tests |
| Task Type | Classification |
Dataset¶
The Wine Quality dataset is available from the UCI Machine Learning Repository with the following schema:
| Variable Name | Role | Type | Description | Units | Missing Values |
|---|---|---|---|---|---|
| quality | Target | Integer | Sensory quality score assigned by wine experts | score (0–10) | no |
| fixed_acidity | Feature | Continuous | Amount of non-volatile acids in the wine that do not evaporate easily | g/dm³ | no |
| volatile_acidity | Feature | Continuous | Amount of acetic acid in the wine; high values lead to vinegar taste | g/dm³ | no |
| citric_acid | Feature | Continuous | Amount of citric acid, which adds freshness and flavor | g/dm³ | no |
| residual_sugar | Feature | Continuous | Amount of sugar remaining after fermentation stops | g/dm³ | no |
| chlorides | Feature | Continuous | Amount of salt in the wine | g/dm³ | no |
| free_sulfur_dioxide | Feature | Continuous | Free form of SO₂; prevents microbial growth and oxidation | mg/dm³ | no |
| total_sulfur_dioxide | Feature | Continuous | Total amount of SO₂ (free + bound forms) | mg/dm³ | no |
| density | Feature | Continuous | Density of the wine, influenced by alcohol and sugar content | g/cm³ | no |
| pH | Feature | Continuous | Measure of acidity or basicity of the wine | unitless | no |
| sulphates | Feature | Continuous | Amount of potassium sulphate, contributing to wine stability | g/dm³ | no |
| alcohol | Feature | Continuous | Alcohol content of the wine | % vol | no |
| color | Other | Categorical | Wine color category | red / white | no |
A first look at the Wine Quality dataset shows a small but complete dataset, consisting of approximately 6,500 observations, 12 physicochemical features, and one target variable (quality). No missing values are present in any variable.
The data were collected from expert sensory evaluations of both red and white wines, where each wine sample was assigned a quality score ranging from 0 to 10. The features are entirely numeric and describe measurable chemical properties, making the dataset well-suited for classical machine learning models on structured tabular data.
Table 1: Descriptive statistics of the Wine Quality dataset
In more detail, the following EDA results provide a better understanding of the dataset:
Model Architecture¶
A Decision Tree (CART) architecture was selected as the baseline model due to its strong interpretability and natural fit for small-scale, structured tabular datasets. The dataset contains fewer than 10,000 samples and approximately 10 numeric physicochemical features, for which tree-based splits can directly model non-linear thresholds without requiring feature scaling or complex preprocessing.
The model produces explicit decision rules that enable transparent reasoning about quality predictions and facilitate rapid error analysis. Although single decision trees do not reach state-of-the-art accuracy compared to ensemble methods, they provide fast training, low-latency inference, and clear failure-mode visibility, making them well-suited for baseline modeling and explainability-driven evaluation prior to more complex architectures.
The Decision Tree Model¶
Let the training data be

\[ \{(x_i, y_i)\}_{i=1}^{N}, \qquad x_i \in \mathbb{R}^p, \quad y_i \in \{1, \dots, K\}. \]

The objective is to partition the predictor space into \(M\) disjoint regions

\[ R_1, R_2, \dots, R_M, \]

and to assign a class label and corresponding class probabilities to each region.

For an observation \(x \in R_m\), the estimated class probabilities are defined as

\[ \hat{p}_{mk} = \frac{1}{N_m} \sum_{i:\, x_i \in R_m} \mathbb{1}(y_i = k), \qquad k = 1, \dots, K, \]

where \(N_m\) denotes the number of training observations falling in \(R_m\). The predicted class is given by

\[ \hat{y}(x) = \arg\max_{k \in \{1, \dots, K\}} \hat{p}_{mk}. \]
At each internal node, CART selects the split that optimizes a node impurity criterion. In practice, three criteria are commonly used: Gini impurity, Entropy, and Log Loss. Each criterion measures class heterogeneity within a node and leads to slightly different tree behaviors.
| Criterion | Mathematical Definition | Optimization Objective | Characteristics |
|---|---|---|---|
| Gini | \(G(R_m) = 1 - \sum_{k=1}^K \hat{p}_{mk}^2\) | Minimize weighted impurity | Fast, stable, favors dominant classes |
| Entropy | \(H(R_m) = -\sum_{k=1}^K \hat{p}_{mk} \log \hat{p}_{mk}\) | Maximize information gain | More sensitive to class balance |
| Log Loss | \(L(R_m) = -\frac{1}{N_m}\sum_{i \in R_m} \log \hat{p}_{m,y_i}\) | Minimize negative log-likelihood | Optimizes probabilistic accuracy |
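As a quick numerical illustration of the criteria above, the following sketch computes Gini impurity and entropy for a vector of class proportions (the function names are illustrative helpers, not part of the assignment code):

```python
import numpy as np

def gini(p: np.ndarray) -> float:
    """Gini impurity: G = 1 - sum_k p_k^2."""
    return float(1.0 - np.sum(np.square(p)))

def entropy(p: np.ndarray) -> float:
    """Entropy (bits): H = -sum_k p_k log2(p_k), skipping empty classes."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A pure node has zero impurity under both criteria,
# while a perfectly mixed binary node maximizes both:
pure = np.array([1.0, 0.0])
mixed = np.array([0.5, 0.5])
print(gini(pure), gini(mixed))  # 0.0 0.5
```

Both measures are zero for a pure node and maximal for a uniform class distribution, which is why either can serve as the split criterion.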
For a candidate split into regions \(R_1\) and \(R_2\), the selected criterion \(C(\cdot)\) is minimized according to

\[ \min \left[ \frac{N_1}{N} C(R_1) + \frac{N_2}{N} C(R_2) \right], \]

where \(N_1\) and \(N_2\) are the numbers of observations in \(R_1\) and \(R_2\), and \(N = N_1 + N_2\).
The tree is constructed using a greedy, recursive binary splitting procedure. For a given node \(R\), all possible splits are considered. For each predictor \(x_j\) and split point \(s\), define the regions

\[ R_1(j, s) = \{x \in R \mid x_j \le s\}, \qquad R_2(j, s) = \{x \in R \mid x_j > s\}. \]
The optimal split \((j^\*, s^\*)\) is chosen to minimize the weighted impurity

\[ (j^*, s^*) = \arg\min_{j,\, s} \left[ \frac{N_1}{N} C\big(R_1(j, s)\big) + \frac{N_2}{N} C\big(R_2(j, s)\big) \right]. \]
Equivalently, this choice maximizes the impurity reduction

\[ \Delta C(j, s) = C(R) - \frac{N_1}{N} C\big(R_1(j, s)\big) - \frac{N_2}{N} C\big(R_2(j, s)\big). \]
Once the optimal split is selected, the node \(R\) is partitioned into the two child regions \(R_1(j^\*, s^\*)\) and \(R_2(j^\*, s^\*)\). This procedure is applied recursively to each resulting node until a stopping criterion is met, such as reaching a maximum tree depth, insufficient samples in a node, no further impurity reduction, or a node containing observations from only a single class.
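The greedy search over all predictors and thresholds can be sketched in a few lines of Python. This is a didactic illustration of the split-selection step, not the assignment's implementation (which relies on scikit-learn); `gini` and `best_split` are illustrative names:

```python
import numpy as np

def gini(y: np.ndarray) -> float:
    """Gini impurity of the labels in a node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def best_split(X: np.ndarray, y: np.ndarray) -> tuple[int, float, float]:
    """Exhaustively search (feature j, threshold s) minimizing weighted Gini."""
    n, d = X.shape
    best_j, best_s, best_impurity = -1, np.nan, np.inf
    for j in range(d):
        # Candidate thresholds: every observed value except the maximum
        for s in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= s
            right = ~left
            weighted = (left.sum() * gini(y[left]) + right.sum() * gini(y[right])) / n
            if weighted < best_impurity:
                best_j, best_s, best_impurity = j, float(s), weighted
    return best_j, best_s, best_impurity

# Labels separate cleanly at x <= 2, so the weighted impurity drops to zero
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
print(best_split(X, y))  # (0, 2.0, 0.0)
```

CART applies exactly this kind of search recursively to each child node until a stopping criterion is met.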
For a terminal region \(R_m\), class probabilities are estimated as

\[ \hat{p}_{mk} = \frac{1}{N_m} \sum_{i:\, x_i \in R_m} \mathbb{1}(y_i = k), \]

and the predicted class assigned to the region is

\[ \hat{y}_m = \arg\max_{k \in \{1, \dots, K\}} \hat{p}_{mk}. \]
To control model complexity, cost–complexity pruning may be applied. Let \(T\) denote a subtree of the fully grown tree \(T_0\) with \(|T|\) terminal nodes. The penalized empirical risk is defined as

\[ R_\alpha(T) = \sum_{m=1}^{|T|} N_m \, C(R_m) + \alpha \, |T|. \]

For a given \(\alpha \ge 0\), the optimal subtree is chosen as

\[ T_\alpha = \arg\min_{T \subseteq T_0} R_\alpha(T). \]
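In scikit-learn, the \(\alpha\) values at which the optimal subtree changes can be obtained with `cost_complexity_pruning_path`, and pruning is applied through the `ccp_alpha` parameter. The sketch below uses the Iris dataset purely for illustration, not the wine data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Grow the tree fully (no pruning)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Effective alphas along the pruning path, in increasing order
path = full.cost_complexity_pruning_path(X, y)

# A larger ccp_alpha selects a smaller subtree T_alpha
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print(pruned.get_n_leaves() < full.get_n_leaves())  # True
```

Sweeping `ccp_alpha` over `path.ccp_alphas` and scoring on a validation set is the standard way to pick the pruning strength.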
Model selection¶
The CART model is selected based on the following criteria:
| Field | Value |
|---|---|
| Model Type | Decision Tree (CART) |
| Role | Baseline |
| Task | Multi-class classification |
| Input Data | Small tabular dataset, numeric features |
| Target | Wine quality score |
based on the following rationales:

- Dataset size is small, favoring low-variance, interpretable models
- Features are fully numeric, requiring minimal preprocessing
- Decision trees provide transparent decision rules and fast iteration
- Serves as a strong interpretability-focused baseline before ensembles
with the target learning objective
| Aspect | Description |
|---|---|
| Split Criterion | Gini impurity |
| Optimization Goal | Minimize node impurity |
and pre-chosen hyperparameters as follows:
| Parameter | Value |
|---|---|
| criterion | gini |
| splitter | best |
| max_depth | 30 |
| min_samples_split | 5 |
| min_samples_leaf | 3 |
| min_weight_fraction_leaf | 0.1 |
| max_features | 5 |
| max_leaf_nodes | 20 |
| min_impurity_decrease | 0 |
| class_weight | None |
| random_state | 999 |
The original wine quality score (0–10) is transformed into a three-class classification target—LOW, MEDIUM, and HIGH—to reframe the problem as classification, reduce noise from minor expert score variations, align with common quality thresholds, and improve interpretability and stability for a small dataset.
| Quality Score Range | Category |
|---|---|
| ≤ 6.0 | LOW |
| (6.0, 8.5] | MEDIUM |
| > 8.5 | HIGH |
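A minimal sketch of the score-to-category mapping in the table above (the helper name `quality_to_category` is illustrative; the actual pipeline performs this binning with `pl.col("quality").cut(...)` in Stage 2):

```python
def quality_to_category(score: float) -> str:
    """Map a raw quality score (0-10) into the three-class target."""
    if score <= 6.0:
        return "LOW"
    if score <= 8.5:
        return "MEDIUM"
    return "HIGH"

print(quality_to_category(5.0), quality_to_category(7.0), quality_to_category(9.0))
# LOW MEDIUM HIGH
```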
Then the model is trained on the training set, and the performance is evaluated on the validation set. The model is evaluated exclusively using classification accuracy, defined as the proportion of correctly classified samples. Accuracy is used as the sole criterion for model selection, validation, and final evaluation.
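Accuracy reduces to a single comparison over predicted and true labels, for example (synthetic labels, for illustration only):

```python
import numpy as np

y_true = np.array(["LOW", "MEDIUM", "HIGH", "LOW"])
y_pred = np.array(["LOW", "MEDIUM", "LOW", "LOW"])

# Accuracy = proportion of correctly classified samples (3 of 4 here)
accuracy = float(np.mean(y_true == y_pred))
print(accuracy)  # 0.75
```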
Dataset Splitting Strategy¶
The splitting process is performed in two stages. First, 10% of the full dataset is reserved as a test set to provide an unbiased estimate of generalization performance. The remaining 90% (temporary dataset) is then split into training (80%) and validation (20%) subsets, corresponding to 72% and 18% of the original data, respectively.
# Split
# -----
# Full dataset               % in total    Description
# ├── Test (10%)             10%           → final unbiased evaluation (touch once)
# └── Temp (90%)                           → temporary dataset
#     ├── Train (80%)        72%           → model fitting
#     └── Validation (20%)   18%           → tuning / early stopping
Stratified sampling is applied based on the target variable to preserve the original class distribution across all splits.
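The two-stage stratified split can be sketched with scikit-learn's `train_test_split` (synthetic data below; the real script operates on the wine DataFrame):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1000).reshape(-1, 1)
y = np.array([0] * 500 + [1] * 500)  # balanced synthetic labels

# Stage 1: reserve 10% as the held-out test set
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=42
)

# Stage 2: split the remaining 90% into 80% train / 20% validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X_temp, y_temp, test_size=0.20, stratify=y_temp, random_state=42
)

print(len(X_train), len(X_valid), len(X_test))  # 720 180 100
```

Passing `stratify=` at both stages preserves the class proportions in every subset.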
End-to-End Modeling Flow (Conceptual)¶
(1) Raw wine samples (red and white) are combined into a single dataset.
(2) The dataset is split into train, validation, and test subsets following the above proportions.
(3) Feature preprocessing is applied consistently across all splits:
- Categorical features (wine color) are encoded.
- Numeric physicochemical features are passed through without transformation.
(4) A Decision Tree (CART) classifier is trained using the training set.
(5) Model hyperparameters are evaluated using the validation set.
(6) The final selected model is evaluated once on the held-out test set.
(7) Evaluation results are reported in the Results section below.
Results¶
Model Metadata:
| Field | Value |
|---|---|
| Model Name | classification-model-2026 |
| Model Type | Classification |
| Algorithm | Decision Tree (CART) |
| Owner | Thuyet Bao |
| Alias | Baseline |
| Evaluation Metric | Accuracy |
Model Performance:
The final model is selected based on validation accuracy and evaluated once on a held-out test set to estimate generalization performance.
| Dataset Split | Metric | Score |
|---|---|---|
| Validation | Accuracy | 79.2% |
| Test | Accuracy | 80% |
Serving Model¶
After training, the model is exported to the ONNX format. ONNX (Open Neural Network Exchange) is a standardized, framework-agnostic representation for machine learning models, designed to enable consistent deployment across different environments.
An ONNX file has the `.onnx` extension and uses the Protobuf serialization format. It encapsulates:

- The computation graph (nodes and edges)
- Operators (e.g., `MatMul`, `Relu`, `Softmax`, `TreeEnsembleClassifier`)
- Model parameters (weights and tensors)
- Input and output schemas (names, shapes, data types)
This standardized representation allows models trained in one framework to be executed reliably in different runtimes without retraining or code changes.
Once exported, the ONNX model follows a unified production serving pipeline:
```mermaid
flowchart LR
    train["Training Model"] --> export["Export to ONNX"] --> runtime["ONNX Runtime"] --> serving["API / Batch / Edge / Mobile"]
```

- **Training**: The model is developed and trained using a supported framework such as PyTorch, TensorFlow, or scikit-learn.
- **Export to ONNX**: The trained model is serialized into a `.onnx` file, preserving its structure, parameters, and input/output definitions.
- **ONNX Runtime**: The model is loaded and executed using ONNX Runtime, which provides optimized inference across CPUs, GPUs, and hardware accelerators.
- **Serving**: The model is deployed for inference in multiple scenarios, including:
  - Online APIs for real-time predictions
  - Batch processing pipelines
  - Edge or mobile environments
This workflow decouples model training from deployment, enabling scalable and portable production inference.
Implementation¶
The implementation consists of three separate stages:
| Ind | Stage | Description |
|---|---|---|
| 1 | Download datasets | Download the datasets from the UCI data registry |
| 2 | Train, export model | Train the model, then export it into ONNX format |
| 3 | Serving model | Serve the model over HTTP via the ONNX runtime |
Stages¶
Stage 1: Download datasets¶
The dataset can be downloaded by the following flow:
(1) Declare the metadata, including `href` and `folder`.
(2) Download the dataset as a zip archive. If the `.zip` file already exists, skip the download.
(3) Unzip the dataset into the folder.
For the declaration variables:
#!/bin/python3
# Global
import os
import sys
# Path Append
sys.path.append(os.path.abspath(os.curdir))
# The experiment name
EXPERIMENT_NAME = "classification-model-2026"
# The container data directory path
DATA_DIR = "/mnt/usr/inference-service/"
# The prefix of project folder
PROJECT_PREFIX_FOLDER = "wine-quality"
# The prefix of project data folder
PROJECT_DATA_FOLDER_PATH = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, "dataset")
# The dictionary of datasets
DATASETS = {
    "wine-quality": {
        "folder": "wine-quality/dataset",
        "href": "https://archive.ics.uci.edu/static/public/186/wine+quality.zip"
    }
}
# The path of dataset
DATASET_COMPONENT = {
    "red": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-red.csv"),
    "white": os.path.join(PROJECT_DATA_FOLDER_PATH, "winequality-white.csv"),
}
# The experiment folder
EXPERIMENT_FOLDER = os.path.join(DATA_DIR, PROJECT_PREFIX_FOLDER, EXPERIMENT_NAME)
The following script will be executed:
#!/bin/python3
# Global
import sys
import os
import zipfile
import argparse
import textwrap
# Path Append
sys.path.append(os.path.abspath(os.curdir))
# External
import httpx
import structlog
# Internal
import workshop.classification_wine_quality.constant as constant
# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()
if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/download.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Download wine quality datasets

        Usage
        -----
        Normal download
        >>> python workshop/classification_wine_quality/download.py

        Help
        >>> python workshop/classification_wine_quality/download.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()

    # The datasets to download
    TARGET_DATASETS = ["wine-quality"]

    # Load
    for dataset in TARGET_DATASETS:

        # Get
        structlog.contextvars.bind_contextvars(dataset=dataset)
        element = constant.DATASETS[dataset]

        # Build
        dataset_path = os.path.join(constant.DATA_DIR, element["folder"])
        os.makedirs(dataset_path, exist_ok=True)
        LOG.info(f"Prepare the dataset at path={dataset_path}")

        # Download
        LOG.info("Download dataset by zip and unzip content")
        zip_url = element["href"]
        local_zip_path = os.path.join(dataset_path, os.path.basename(zip_url))

        # Validate: skip the download when the zip is already cached
        if os.path.exists(local_zip_path):
            LOG.info(f"[CACHED] Zip already exists at path={local_zip_path}")
        else:
            # Download
            resp = httpx.get(zip_url, follow_redirects=True)
            resp.raise_for_status()
            with open(local_zip_path, "wb") as _file:
                _file.write(resp.content)

        # Extract (overwrites if files exist)
        with zipfile.ZipFile(local_zip_path, "r") as zip_ref:
            zip_ref.extractall(path=dataset_path)

        LOG.info(f"Successfully downloaded dataset={dataset}")
Stage 2: Training model and export into ONNX format¶
#!/bin/python3
# Global
import sys
import os
import argparse
import textwrap
import json
from datetime import datetime
import zoneinfo
# Path Append
sys.path.append(os.path.abspath(os.curdir))
# External
import polars as pl
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import tree
from skl2onnx import to_onnx
from skl2onnx.common.data_types import FloatTensorType, StringTensorType
import structlog
# Internal
import workshop.classification_wine_quality.constant as constant
# Set
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()
def build_dataset(resource: dict[str, str]) -> pl.DataFrame:
    result: pl.DataFrame | None = None
    for key, path in resource:
        if not os.path.exists(path):
            raise FileNotFoundError(f"File not found: {path}")
        element = pl.read_csv(
            source=path,
            has_header=True,
            separator=";",
            infer_schema_length=10_000,
            schema={
                "fixed acidity": pl.Float64,
                "volatile acidity": pl.Float64,
                "citric acid": pl.Float64,
                "residual sugar": pl.Float64,
                "chlorides": pl.Float64,
                "free sulfur dioxide": pl.Float64,
                "total sulfur dioxide": pl.Float64,
                "density": pl.Float64,
                "pH": pl.Float64,
                "sulphates": pl.Float64,
                "alcohol": pl.Float64,
                "quality": pl.Float64,
            }
        ).select(pl.all().name.map(lambda col: col.lower().replace(" ", "_")))
        element = element.with_columns(pl.lit(key).cast(pl.String).alias("color"))
        if result is None:
            result = element
        else:
            result = pl.concat([result, element], how="diagonal")
    return result
if __name__ == "__main__":

    parser = argparse.ArgumentParser(
        prog="python workshop/classification_wine_quality/train.py",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent("""
        [Workshop] Train a classification model for wine quality

        Usage
        -----
        Normal train
        >>> python workshop/classification_wine_quality/train.py

        Help
        >>> python workshop/classification_wine_quality/train.py --help
        """),
        epilog="Copyright (c) of Thuyet Bao",
    )
    parameters = parser.parse_args()
    # -------------------------------------------------------------------------
    # Training models ---------------------------------------------------------
    # -------------------------------------------------------------------------

    # Prepare
    LOG.info(f"Experiment name: {constant.EXPERIMENT_NAME}")
    os.makedirs(name=constant.EXPERIMENT_FOLDER, exist_ok=True)

    # Set
    metadata = {
        "experiment": constant.EXPERIMENT_NAME,
        "owner": "thuyetbao",
        "version": "1.2.8",
        "revision": datetime.now(tz=zoneinfo.ZoneInfo("Asia/Ho_Chi_Minh")).strftime("%Y%m%d"),
    }
    experiment_model_onnx_path = os.path.join(constant.EXPERIMENT_FOLDER, "model-latest.onnx")
    experiment_model_metadata_path = os.path.join(constant.EXPERIMENT_FOLDER, "metadata.json")
    # Build
    dataset = build_dataset(resource=constant.DATASET_COMPONENT.items())
    dataset = dataset.with_columns(
        pl.col("color").cast(pl.Categorical).name.keep(),
        pl.col("quality").cut(
            breaks=[6, 8.5],
            labels=["LOW", "MEDIUM", "HIGH"],
            left_closed=False,
            include_breaks=False
        ).alias("category"),
    ).drop(["quality"])

    # Set
    col_factor = "category"
    col_features_numeric = [
        "fixed_acidity",
        "volatile_acidity",
        "citric_acid",
        "residual_sugar",
        "chlorides",
        "free_sulfur_dioxide",
        "total_sulfur_dioxide",
        "density",
        "ph",
        "sulphates",
        "alcohol",
    ]
    col_features_category = [
        "color",
    ]
    col_features = col_features_numeric + col_features_category

    # The frame
    frame = dataset.to_pandas()
    # Split
    # -----
    # Full dataset               % in total    Description
    # ├── Test (10%)             10%           → final unbiased evaluation (touch once)
    # └── Temp (90%)                           → temporary dataset
    #     ├── Train (80%)        72%           → model fitting
    #     └── Validation (20%)   18%           → tuning / early stopping
    x_temp, x_test, y_temp, y_test = train_test_split(
        frame.drop("category", axis=1),
        frame["category"],
        test_size=0.1,
        random_state=42,
        stratify=frame["category"]
    )
    x_train, x_valid, y_train, y_valid = train_test_split(
        x_temp,
        y_temp,
        test_size=0.2,
        random_state=42,
        stratify=y_temp
    )
    preprocess = ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore", drop="if_binary"), ["color"]),
            ("num", "passthrough", col_features_numeric),
        ]
    )

    # Build
    model = tree.DecisionTreeClassifier(
        criterion="gini",
        splitter="best",
        max_depth=30,
        min_samples_split=5,
        min_samples_leaf=3,
        min_weight_fraction_leaf=0.1,
        max_features=5,
        random_state=999,
        max_leaf_nodes=20,
        min_impurity_decrease=0,
        class_weight=None,
    )
    pipe = Pipeline(
        steps=[
            ("preprocess", preprocess),
            ("model", model)
        ],
        verbose=True
    )
    metadata = metadata | {"algorithm": "Decision Tree (CART)", "alias": "baseline"}

    # Train
    pipe.fit(x_train, y_train)
    # tree.plot_tree(pipe.named_steps["model"])

    # Evaluate
    accuracy_valid = accuracy_score(y_valid, pipe.predict(x_valid))
    accuracy_test = accuracy_score(y_test, pipe.predict(x_test))
    LOG.info(f"[Performance] Accuracy: (+) Validation set: {accuracy_valid:.4f} (+) Test set: {accuracy_test:.4f}")
    # -------------------------------------------------------------------------
    # Export model into ONNX format -------------------------------------------
    # -------------------------------------------------------------------------

    # Build the ONNX input signature
    initial_types = []

    # Numeric features → float tensor
    for col in col_features_numeric:
        initial_types.append((col, FloatTensorType([None, 1])))

    # Categorical features → string tensor
    for col in col_features_category:
        initial_types.append((col, StringTensorType([None, 1])))

    # Convert into ONNX format
    onx = to_onnx(
        model=pipe,
        name="classification-wine-quality-2026",
        initial_types=initial_types
    )
    with open(experiment_model_onnx_path, "wb") as file:
        file.write(onx.SerializeToString())

    # Write metadata
    with open(experiment_model_metadata_path, "w") as file:
        json.dump(metadata, file)
After training, the data folder will contain artifacts:
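Based on the paths defined in the constants and training script (`DATA_DIR`, `PROJECT_DATA_FOLDER_PATH`, `EXPERIMENT_FOLDER`, `model-latest.onnx`, `metadata.json`), the expected layout is roughly:

```
/mnt/usr/inference-service/wine-quality/
├── dataset/
│   ├── wine+quality.zip
│   ├── winequality-red.csv
│   └── winequality-white.csv
└── classification-model-2026/
    ├── model-latest.onnx
    └── metadata.json
```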
Stage 3: Serving model through HTTP by ONNX runtime¶
(1) Build the route /inferences/wine/quality that handles inference requests.
The inference request payload contains the properties of the wine.
class WineAttributesPayload(BaseModel):
    """Payload of physicochemical tests

    Input variables (based on physicochemical tests):

    1 - fixed acidity
    2 - volatile acidity
    3 - citric acid
    4 - residual sugar
    5 - chlorides
    6 - free sulfur dioxide
    7 - total sulfur dioxide
    8 - density
    9 - pH
    10 - sulphates
    11 - alcohol
    12 - type of wine (red or white)

    Reference
    ---------
    For more information, read [Cortez et al., 2009].
    """
    color: Literal["red", "white"] = Field(default=..., description="Type of wine (red or white)")
    fixed_acidity: float = Field(default=..., description="Fixed acidity")
    volatile_acidity: float = Field(default=..., description="Volatile acidity")
    citric_acid: float = Field(default=..., description="Citric acid")
    residual_sugar: float = Field(default=..., description="Residual sugar")
    chlorides: float = Field(default=..., description="Chlorides")
    free_sulfur_dioxide: float = Field(default=..., description="Free sulfur dioxide")
    total_sulfur_dioxide: float = Field(default=..., description="Total sulfur dioxide")
    density: float = Field(default=..., description="Density")
    ph: float = Field(default=..., description="pH")
    sulphates: float = Field(default=..., description="Sulphates")
    alcohol: float = Field(default=..., description="Alcohol")

    # model_config = ConfigDict(extra="allow")
This payload is then fed into the ONNX runtime for inference, and the response is shaped by the following models:
class ResponseBaseModel(BaseModel):
    status: str = Field(default=..., description="Status of the request")
    id: str = Field(default_factory=lambda: uuid.uuid4().hex, description="Unique identifier of the request")


class PredictionWineQualityModel(BaseModel):
    type: str = Field(default=..., description="Type of task")
    prediction: str = Field(default=..., description="Quality of wine (Best possible prediction)")
    probabilities: dict[str, float] = Field(default=..., description="Probabilities of prediction per class")

    @field_validator("probabilities", mode="after")
    @classmethod
    def handle_output(cls, value: dict[str, float]):
        # Round each probability to 4 decimal places for readability
        return {k: round(v, 4) for k, v in value.items()}


class ResponsePredictionWineQualityModel(ResponseBaseModel):
    result: PredictionWineQualityModel = Field(default=..., description="Result of the request")
For the route engine:
#!/bin/python3
# Global
# External
import structlog
from fastapi import (
    APIRouter,
    Depends,
    status,
)
import onnxruntime
import numpy as np
# Internal
import dependencies
# Context
from endpoint.inferences.model import WineAttributesPayload, ResponsePredictionWineQualityModel

router = APIRouter(
    prefix="/inferences/wine",
    tags=["Inference"],
)
# Construct
LOG: structlog.stdlib.BoundLogger = structlog.get_logger()
@router.post(
    path="/quality",
    summary="Inference quality of wine (Classification)",
    description="Inference quality of wine based on the physicochemical properties",
    status_code=status.HTTP_200_OK,
    response_model=ResponsePredictionWineQualityModel,
)
def inferenceClassificationWineQuality(
    payload: WineAttributesPayload,
    model: onnxruntime.InferenceSession = Depends(dependencies.yield_model_classification_wine_quality_latest)
):
    # Echo
    LOG.debug(f"The model payload: {payload.model_dump(mode='python')}")

    # Build the input feed: each feature is a [1, 1] tensor matching the ONNX input schema
    data = {
        "color": np.array([[payload.color]], dtype=object),
        "fixed_acidity": np.array([[payload.fixed_acidity]], dtype=np.float32),
        "volatile_acidity": np.array([[payload.volatile_acidity]], dtype=np.float32),
        "citric_acid": np.array([[payload.citric_acid]], dtype=np.float32),
        "residual_sugar": np.array([[payload.residual_sugar]], dtype=np.float32),
        "chlorides": np.array([[payload.chlorides]], dtype=np.float32),
        "free_sulfur_dioxide": np.array([[payload.free_sulfur_dioxide]], dtype=np.float32),
        "total_sulfur_dioxide": np.array([[payload.total_sulfur_dioxide]], dtype=np.float32),
        "density": np.array([[payload.density]], dtype=np.float32),
        "ph": np.array([[payload.ph]], dtype=np.float32),
        "sulphates": np.array([[payload.sulphates]], dtype=np.float32),
        "alcohol": np.array([[payload.alcohol]], dtype=np.float32),
    }
    LOG.debug(f"Data: {data}")

    # Predict
    # The output is a list of (label, probabilities)
    # For example:
    # [
    #     array(['MEDIUM'], dtype=object),
    #     [
    #         {
    #             'HIGH': 0.13235294818878174,
    #             'LOW': 0.22794117033481598,
    #             'MEDIUM': 0.6397058963775635
    #         }
    #     ]
    # ]
    output = model.run(output_names=['output_label', 'output_probability'], input_feed=data)
    LOG.debug(f"Output: {output}")
    return {
        "status": "ok",
        "result": {
            "type": "classification",
            "prediction": output[0][0],
            "probabilities": output[1][0]
        }
    }
(2) Then send a test inference request
Using the curl command:
curl -X 'POST' \
  'http://127.0.0.1:12123/inferences/wine/quality' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "color": "red",
  "fixed_acidity": 7.4,
  "volatile_acidity": 0.7,
  "citric_acid": 0.0,
  "residual_sugar": 1.9,
  "chlorides": 0.076,
  "free_sulfur_dioxide": 11.0,
  "total_sulfur_dioxide": 34.0,
  "density": 0.9978,
  "ph": 3.51,
  "sulphates": 0.56,
  "alcohol": 9.4
}'
# --- Response body
# {
# "status": "ok",
# "id": "0f7e7e34a9564518b447207a81cf4da0",
# "result": {
# "type": "classification",
# "prediction": "LOW",
# "probabilities": {
# "HIGH": 0,
# "LOW": 0.7664,
# "MEDIUM": 0.2336
# }
# }
# }
# --- Response headers
# access-control-allow-credentials: true
# access-control-allow-origin: *
# content-length: 167
# content-type: application/json
# x-engine-handle-by: Inference Service
# x-engine-revision: 20260117
# x-engine-version: 1.23.41
Citation¶
Citation for the work of the team that shared the dataset:
@misc{wine_quality_186,
  author       = {Cortez, Paulo and Cerdeira, A. and Almeida, F. and Matos, T. and Reis, J.},
  title        = {{Wine Quality}},
  year         = {2009},
  howpublished = {UCI Machine Learning Repository},
  note         = {{DOI}: https://doi.org/10.24432/C56S3T}
}
Further Reading¶
For some core concepts related to sklearn, look at sklearn Headstart
Reference¶
- UCI Dataset of Wine Quality. Reference URL
- https://www.kaggle.com/code/sametkrcan/what-makes-good-wine-correlation-analysis-eda
- https://www.geeksforgeeks.org/machine-learning/wine-quality-prediction-machine-learning/
- https://www.kaggle.com/code/harrykesh/statistical-analysis-of-wines-in-progress
Appendix¶
Appendix 1: Record of Changes¶
Table: Record of changes
| Version | Date | Author | Description |
|---|---|---|---|
| 0.4.2 | 2026/01/18 | thuyetbao | Metadata, result, artifacts |
| 0.3.15 | 2026/01/18 | thuyetbao | Model architecture, EDA and metadata |
| 0.2.9 | 2026/01/17 | thuyetbao | Added stages, citation and references |
| 0.1.0 | 2026/01/17 | thuyetbao | Initiation documentation |




