Skip to content

FAQ

Overview

The FAQ section for the database

Question & Explain & Answer

How to write complex and logical SQL statements?

Writing complex and logical SQL statements can be daunting, but with a clear plan, you can ensure that your SQL is both efficient and accurate. The steps below outline the process to write a complex SQL statement.

(a) Set the goal: What is the desired output?

Before you start writing your SQL statement, take a moment to think about what you want the output to look like. What are the columns you want to return? What data do you want to include or exclude? What aggregations do you want to perform?

(b) Check data: Can you collect the required data?

Once you know what you want to return, take a look at the data you have available to you. Do you have all the necessary columns? Are there any additional tables you need to join? Are there any calculations you need to perform?

© Image the output: What does the output look like?

Take a moment to imagine what the output of your SQL statement will look like. Think about the columns, the data, and the order. This will help you to think about the logic of your SQL statement.

(d) Test for accuracy:

Once you have written your SQL statement, make sure to test it for accuracy. Run the statement and check the output against your expected results. If the output is not what you expected, debug your statement and try again.

Trade Off: (a) Time (b) Accuracy:

When writing complex and logical SQL statements, there is a trade-off between time and accuracy. It may take longer to write a more accurate SQL statement, but the payoff will be worth it in the end. On the other hand, if you rush through the process, you may end up with an inaccurate SQL statement that takes less time but will lead to incorrect results.

Question: How to know if the query is correct or not?

Check: (1) Does it meet the goal? (Sense) 34 million (2) Generic Test: (a) Schema of output: (columns, data type)

(b) Query: Check if month is PK or not? (i) For OLTP: Table INFORMATION_SCHEMA (ii) For OLAP: SELECT COUNT(*) = COUNT(DISTINCT month) FROM

(3) Business Test: (a) Logic test based on business (b) Multiple tables, complex © Data domain

Backend yếu quá → We can’t → How to transition

(a) Request to upgrade resources. → Based on Policy, Permissions, Capacibility. (b) Solutions: (i) Sampling: Cluster. https://viblo.asia/p/ky-thuat-chon-mau-trong-audit-jvElaap4lkw (ii) Optimization: (1) Absolute -> Approx: COUNT(*) -> APPROX_COUNT_DISTINCT (15s) (2) Indexing -> Define (3) EXPLAIN -> Optimize query (iii) Database separation: OLTP -> OLAP Engine (1) OLTP -> Output CSV, EXCEL (2) Over RAM datasets tool: duckdb, polars a) Using compressed file format: parquet (OLAP format) b) OLTP (Row format) -> Column Format (PARQUET) c) Polars, duckdb (3) https://duckdb.org/docs/installation/index?version=stable&environment=cli&platform=win&download_method=direct&architecture=x86_64 (4) https://docs.pola.rs/ (iv) Request features from the data platform team.

Question: How to distinguish between OLAP and OLTP

Feature OLAP OLTP
Purpose Data analysis and reporting Transactional operations
Data Size Large, historical data Small, current data
Operations Complex queries Simple queries
Speed Slower due to complex processing Faster for quick transaction
Usage Business Intelligence Day-to-day operations

Question: How to query the INFORMATION_SCHEMA

(a) Defined: System, storage system metadata (b) Structured schema: (1) Backend of the database (Postgres, BigQuery, Snowflake, DuckDB) (2) Version of the backend (i) Based on the documentation © Permission: DBA (Database Administrator) (ii) Smaller with target (Custom)