Master SQL for Data Analytics
In today’s data-driven world, SQL (Structured Query Language) is one of the most essential tools for anyone working with data. Whether you're a business analyst, data scientist, or aspiring data professional, mastering SQL allows you to extract, analyze, and make sense of information hidden in vast databases. But learning SQL effectively requires more than just theory you need real-world practice.
That’s where the Classic Models sample database comes in.
In this blog post, you'll learn how to master SQL for data analytics by working with a realistic, business-focused dataset that mimics the operations of a company selling classic cars. The Classic Models database includes customers, products, orders, payments, and employees making it perfect for practicing complex queries and gaining experience with relational databases.
We'll walk through practical examples that show how to:
- Join tables to combine useful data
- Write analytical queries to calculate revenue, customer value, and product trends
- Group, filter, and sort results to generate business insights
- Answer real questions a company might face using only SQL
By the end of this tutorial, you’ll not only understand how SQL works but you’ll know how to apply it to solve real analytical problems.
So let’s start transforming raw data into meaningful insights with SQL.
Table of Contents
Getting Started with SQL: A Quick Refresher
Before we dive into complex queries and real-world data analysis, it’s important to understand the foundational elements of SQL. Whether you’re brushing up on your skills or just getting started, this quick refresher will help you follow along as we work through the Classic Models sample database.
What is SQL?
SQL (Structured Query Language) is a standardized language used to interact with relational databases. It allows you to retrieve, insert, update, and delete data stored across multiple related tables. SQL is especially powerful in data analytics, where it’s used to:
- Query large datasets efficiently
- Combine information from multiple sources
- Filter and group data for reporting
- Generate insights for business decision-making
In short, if you want to make sense of your data, SQL is your best friend.
Core SQL Commands You Should Know
Here are the most essential SQL commands we’ll be using throughout this tutorial:
- SELECT
Used to retrieve data from one or more tables.
SELECT firstName, lastName
FROM employees;
- WHERE
Filters rows based on a condition.
SELECT *
FROM customers
WHERE country = 'USA';
- ORDER BY
Sorts the result set in ascending or descending order.
SELECT productName, buyPrice
FROM products
ORDER BY buyPrice DESC;
- GROUP BY
Groups rows that have the same values, often used with aggregate functions like SUM()
or COUNT()
.
SELECT customerNumber, COUNT(*) AS totalOrders
FROM orders
GROUP BY customerNumber;
- JOIN
In relational databases, data is often spread across multiple tables. JOINS allow you to combine rows from two or more tables based on a related column usually a key. This is crucial for any kind of real-world analysis. There are several types of JOINS in SQL. Let's break down the most commonly used ones:
INNER JOIN:
- An INNER JOIN returns only the rows that have matching values in both tables.
SELECT orders.orderNumber, customers.customerName
FROM orders
INNER JOIN customers
ON orders.customerNumber = customers.customerNumber;
This will only return orders that have a valid matching customer.
LEFT JOIN
- A LEFT JOIN returns all rows from the left table, and the matched rows from the right table. If there is no match, NULLs will appear for columns from the right table.
SELECT customers.customerName, payments.amount
FROM customers
LEFT JOIN payments
ON customers.customerNumber = payments.customerNumber;
This query includes all customers, even those who haven’t made any payments yet.
RIGHT JOIN
- A RIGHT JOIN is the opposite of a LEFT JOIN. It returns all rows from the right table, and the matched rows from the left table.
SELECT products.productName, orderdetails.quantityOrdered
FROM orderdetails
RIGHT JOIN products
ON orderdetails.productCode = products.productCode;
This ensures that all products are listed, even if they haven’t been ordered.
- LIMIT
Restricts the number of rows returned.
SELECT *
FROM products
LIMIT 10;
Understanding the Classic Models Database: ERD and Table Relationships
An Entity Relationship Diagram (ERD) is a visual representation of a database’s structure. It shows:
- Tables (entities) in the database
- Relationships between them (e.g., one-to-many)
- Primary and foreign keys that connect those tables
This ERD shows how different tables in the Classic Models database are related. For example, the orders
table is connected to the customers
table through the customerNumber
field, meaning each order is placed by a customer. The orderdetails
table links orders
and products
, capturing the specific items in each order. The employees
table is tied to customers
via the salesRepEmployeeNumber
, showing which sales representative handles each account.These connections are made using primary keys (marked with asterisks) and foreign keys, enabling powerful multi-table queries through SQL JOINs.
Setting Up Your SQL Environment with MySQL Workbench
To follow along with the examples in this blog, we'll be using MySQL Workbench, a free and powerful GUI tool for managing MySQL databases. It provides an intuitive interface to run queries, design databases, and visualize table relationships all without needing to use the command line.
If you haven’t installed it yet, you can download it from the official MySQL Workbench page.
Once installed, you’ll need to:
- Set up a new local connection to your MySQL server.
- Import the Classic Models sample database (usually provided as a
.sql
dump). - Use the SQL Editor to start writing and executing queries.
Using SQL for Data Analytics
Calculate the Average Order Amount for Each Country
Understanding the average order amount per country helps businesses identify which regions generate higher-value sales. This can guide decisions in marketing, logistics, and resource allocation. For example, countries with consistently high order values may warrant more personalized service, strategic promotions, or faster shipping options. It’s a foundational query in sales analysis and customer segmentation for data-driven decision-making.
SELECT c.country ,AVG(od.priceEach * od.quantityOrdered) as avg_order_amount
FROM customers as c
INNER JOIN orders as o
ON c.customerNumber = o.customerNumber
INNER JOIN orderdetails as od
ON o.orderNumber = od.orderNumber
GROUP BY c.country;
What This Query Does:
- Joins
customers
,orders
, andorderdetails
to link customers to their orders and the products within those orders. - Calculates the average value of an order by multiplying unit price by quantity.
- Groups the results by
country
to show the average order amount per country. - Orders the results by average order amount in descending order
The query will show you which countries have the highest average order amounts.
country | avg_order_amount |
USA | 3260.239094 |
Japan | 3229.037500 |
Spain | 3214.587982 |
France | 3208.197516 |
Finland | 3208.145109 |
Based on these results, we can see the top 5 countries by average order amount:
- The USA has the highest average order amount at ~$3,260.
- The difference between France and Finland is very small (~$0.05).
- All top 5 countries have average order amounts between 3,200–3,200–3,260.
Calculate the Total Sales for Each Product Line
Knowing the total sales per product line helps businesses identify which categories are generating the most revenue. This kind of analysis supports decisions around inventory planning, product development, and marketing strategies. For example, high-performing product lines may deserve additional promotion, while underperforming ones may need review or discontinuation.
-- Calculate the total sales for each product Line
SELECT p.productLine ,SUM(od.priceEach * od.quantityOrdered) AS total_sales
FROM orderdetails as od
INNER JOIN products as p
ON od.productCode = p.productCode
GROUP BY p.productLine
ORDER BY total_sales DESC ;
What This Query Does:
- Joins Two Tables:
orderdetails
(contains individual items sold in each order)products
(contains product information, including categorization byproductLine
)
- Calculates Revenue per Item:
- For each item in
orderdetails
, it calculatespriceEach * quantityOrdered
(revenue for that line item)
- For each item in
- Groups Results by Product Line:
- Sums up the revenue for all items belonging to the same
productLine
(e.g., “Classic Cars”, “Motorcycles”)
- Sums up the revenue for all items belonging to the same
- Sorts by Highest-Selling:
- Orders the results by
total_sales
in descending order (highest revenue first)
- Orders the results by
The query will show you which product Line have the highest total sales amounts.
productLine | total_sales |
Classic Cars | 3853922.49 |
Vintage Cars | 1797559.63 |
Motorcycles | 1121426.12 |
Trucks and Buses | 1024113.57 |
Planes | 954637.54 |
Classic Cars account for $3.85M more than double the next category. This suggests:
- Your customer base strongly prefers classic automobiles
- Potential to expand this line with complementary products (parts, memorabilia)
- Consider premium pricing strategies for this high-demand category
Vintage Cars (1.8M) and Motorcycles (1.12M) form a healthy middle tier:
- Vintage Cars likely appeal to collectors (higher value, lower volume)
- Motorcycles represent a distinct but substantial customer segment
Trucks/Buses (commercial) generate comparable revenue to Planes (hobbyist):
- Trucks may have steady B2B demand (fleet purchases)
- Planes' lower sales could indicate:
- Niche audience
- Higher price barriers
- Less effective marketing to aviation enthusiasts
Top 10 Best-Selling Products by Quantity Sold
Identifying the top 10 best-selling products based on quantity sold is crucial for understanding which items are driving the most volume. This metric is often used to shape inventory decisions, promotional strategies, and product forecasting.
High-volume products may need larger stock reserves, better supplier coordination, or featured placement in marketing campaigns. Analyzing sales by quantity not just revenue also helps uncover products that are popular with customers, even if their unit price is low.
-- top 10 best selling products based on total quantity sold
select p.productName, SUM(od.quantityOrdered) as quantity_ordred
FROM orderdetails AS od
INNER JOIN products p
ON od.productCode = p.productCode
GROUP BY p.productName
order by quantity_ordred DESC
LIMIT 10;
What This Query Does:
This SQL query identifies the top 10 best-selling products based on the total quantity sold across all orders.
- Joins Two Tables:
orderdetails
(contains records of products ordered, including quantities)products
(contains product names and details)
- Calculates Total Quantity Sold:
- For each product, it sums up all quantities ordered (
SUM(od.quantityOrdered)
).
- For each product, it sums up all quantities ordered (
- Groups by Product Name:
- Aggregates results by
productName
to calculate total sales volume per product.
- Aggregates results by
- Sorts and Limits Results:
- Orders products by total quantity sold in descending order (highest to lowest).
- Returns only the top 10 products using
LIMIT 10
.
The query will show you which product have the highest total quantity sold.
productName | quantity_ordred |
1992 Ferrari 360 Spider red | 1808 |
1937 Lincoln Berline | 1111 |
American Airlines: MD-11S | 1085 |
1941 Chevrolet Special Deluxe Cabriolet | 1076 |
1930 Buick Marquette Phaeton | 1074 |
1940s Ford truck | 1061 |
1969 Harley Davidson Ultimate Chopper | 1057 |
1957 Chevy Pickup | 1056 |
1964 Mercedes Tour Bus | 1053 |
1956 Porsche 356A Coupe | 1052 |
The 1992 Ferrari 360 Spider (red) is the clear leader with 1,808 units sold, significantly outperforming others which suggests strong customer preference possibly due to popularity, pricing, or marketing while the next 9 products have sales between 1,052–1,111 units, indicating a tight race.
Evaluate Sales Performance of Each Sales Representative
Tracking the sales performance of individual sales representatives helps organizations measure productivity, set realistic targets, and reward top performers. By evaluating total sales attributed to each rep, managers can spot high achievers, identify who may need additional support, and make informed decisions on sales training, incentives, or territory assignments.
-- Evaluate the sales performance of each sale representative
SELECT emp.employeeNumber, emp.lastName, emp.firstName, COUNT(o.orderNumber) AS num_orders,
SUM(od.quantityOrdered * od.priceEach) AS sales_amount, emp.jobTitle
FROM classicmodels.employees emp
INNER JOIN classicmodels.customers c
ON emp.employeeNumber = c.salesRepEmployeeNumber
INNER JOIN orders o
ON c.customerNumber = o.customerNumber
INNER JOIN orderdetails od
ON o.orderNumber = od.orderNumber
GROUP BY emp.employeeNumber
ORDER BY sales_amount DESC;
What This Query Does:
This query evaluates the sales performance of each sales representative by:
- Joining employees → customers → orders → orderdetails to track sales reps to their orders.
- Grouping the results by employeeNumber to aggregate their performance.
- Orders results by sales_amount (highest to lowest) to rank top performers.
employeeNumber | lastName | firstName | num_orders | sales_amount | jobTitle |
1370 | Hernandez | Gerard | 396 | 1258577.81 | Sales Rep |
1165 | Jennings | Leslie | 331 | 1081530.54 | Sales Rep |
1401 | Castillo | Pamela | 272 | 868220.55 | Sales Rep |
The query results shows that we have:
Top Performer: Gerard Hernandez (Employee #1370)
- Total Orders Handled: 396 (highest volume)
- Revenue Generated: $1.26M (highest sales)
- Implication:
- Most successful rep in both order volume and revenue.
- Likely excels in client retention, upselling, or high-value deals.
Strong Contender: Leslie Jennings (Employee #1165)
- Total Orders Handled: 331 (20% fewer than Gerard)
- Revenue Generated: $1.08M (14% less than Gerard)
- Implication:
- Consistent performance but slightly lower efficiency.
- May benefit from Gerard’s strategies to close more high-value deals.
Pamela Castillo (Employee #1401)
- Total Orders Handled: 272 (lowest in this group)
- Revenue Generated: $868K (31% less than Gerard)
- Implication:
- Handles fewer orders but still significant revenue.
- Might focus on higher-margin products or larger clients.
Calculate the Percentage of Orders Shipped On Time
Timely delivery is a critical factor in customer satisfaction and operational efficiency. By calculating the percentage of orders shipped on or before the required date, businesses can evaluate how well their logistics and fulfillment processes are performing.
This metric helps identify delays, assess supplier reliability, and improve internal workflows. It’s especially valuable for companies aiming to reduce churn and increase repeat purchases by meeting customer expectations consistently.
-- Calculate the percentage of orders that were shipped on time
SELECT 100*(SUM(CASE WHEN requiredDate >= shippedDate THEN 1
ELSE 0
END )/COUNT(orderNumber)) AS shipped_on_time_percent
FROM orders;
--Output: 95.3988
What This Query Does:
This SQL query calculates the percentage of orders that were shipped on time by comparing the actual shipment date (shippedDate
) with the promised delivery deadline (requiredDate
). If an order's shipment date is on or before its required date, it counts as an on-time shipment (assigned a value of 1), otherwise it's considered late (assigned a value of 0).
The query then sums all the on-time shipments, divides this number by the total count of orders, and multiplies by 100 to convert the result into a percentage.
The query result shows that 95.4% of all orders were shipped on or before their required date. This indicates strong logistics/supply chain performance, as the vast majority of orders meet deadlines.
Calculate Net Profit per Product
Understanding the net profit per product is essential for making informed decisions about pricing, inventory, and product strategy. While total sales show revenue, net profit reveals true profitability by accounting for the cost of goods sold (COGS).
This analysis helps identify high-margin products worth promoting and low-profit items that may need re-evaluation or cost reduction. It’s a core metric for optimizing product portfolios and maximizing business profitability.
-- Net profit per product
SELECT productName, SUM(Net_profit) AS Net_profit
FROM
(SELECT p.productName, ((od.quantityOrdered * od.priceEach) - (od.quantityOrdered * p.buyPrice)) AS Net_profit
FROM products p
LEFT JOIN orderdetails od
ON p.productCode = od.productCode
) t1
GROUP BY productName
ORDER BY Net_profit DESC;
What This Query Does:
This SQL query calculates the net profit generated by each product in the database, sorted from highest to lowest profit. Here's how it works in detail:
The query first creates a derived table (subquery aliased as t1) that computes the profit for every individual order line item. For each product sold, it calculates:
- Gross Revenue: (quantityOrdered × priceEach) – what customers paid
- Cost of Goods Sold: (quantityOrdered × buyPrice) – what the product cost the company
- Net Profit: Revenue minus cost (gross profit per item)
The outer query then:
- Groups these calculations by product name
- Sums up all profit amounts for each product across all orders
- Sorts the results to show the most profitable products first
The LEFT JOIN ensures all products are included in the results, even those that haven't been ordered yet (which would show NULL or zero profit). The descending sort immediately highlights your most valuable products at the top of the list.
productName | Net_profit |
1992 Ferrari 360 Spider red | 135996.78 |
1952 Alpine Renault 1300 | 95282.58 |
2001 Ferrari Enzo | 93349.65 |
The output shows the three most profitable products in your database, revealing important insights about your product performance.
Identify Frequently Co-Purchased Products
Identifying frequently co-purchased products helps businesses understand customer buying behavior and discover valuable product pairings. This insight is critical for creating effective cross-sell and upsell strategies, optimizing product placements, and designing bundle deals.
By analyzing orders that contain multiple items and tracking which products are commonly bought together, businesses can boost average order value and improve the overall shopping experience.
-- Identify frequently co-purchased products
select p.productCode, p.productName, p1.productCode, p1.productName, COUNT(*) AS pair_purchase
from orderdetails od
INNER JOIN orderdetails od1
ON od.orderNumber = od1.orderNumber AND od.productCode <> od1.productCode
INNER JOIN products p
ON od1.productCode = p.productCode
INNER JOIN products p1
ON od.productCode = p1.productCode
GROUP BY od.productCode, p.productName, od1.productCode, p1.productName
ORDER BY pair_purchase DESC
What This Query Does:
- Self-Join on
orderdetails
Table:- This is the key logic: it matches each product in an order (
od
) with other products in the same order (od1
), excluding cases where it's the same product (od.productCode <> od1.productCode
). - This way, you're pairing products that were bought together in the same order.
- This is the key logic: it matches each product in an order (
- Join with the
products
Table Twice- This adds product names (
productName
) for both products in the pair, so the output is human-readable.
- This adds product names (
- Group and CountGroup and Count
- Groups the results by each unique product pair.
- This prevents duplicates and allows you to count how often each pair appears.
- Count Co-Purchases and Sort
- Counts how many times each pair of products was purchased together.
- Sorts the result in descending order, showing the most frequently co-purchased pairs at the top.
productCode | productName | productCode | productName | pair_purchase |
S50_1341 | 1930 Buick Marquette Phaeton | S700_1691 | American Airlines: B767-300 | 28 |
S72_1253 | HMS Bounty | S700_2047 | Boeing X-32A JSF | 27 |
S24_3949 | American Airlines: MD-11S | S700_4002 | Corsair F4U ( Bird Cage) | 27 |
Conclusion:
In this guide, we took a hands-on approach to mastering SQL for data analytics using the Classic Models sample database. We began by understanding the structure of the database through its ERD, then gradually built practical queries that reflect real-world business questions.
From calculating average order values by country to evaluating sales performance and identifying co-purchased products, we demonstrated how SQL can unlock valuable insights hidden in relational data. Along the way, you practiced essential SQL skills like joins, aggregations, grouping, and filtering while applying them in meaningful analytics scenarios.
By combining database knowledge with analytical thinking, you now have a foundation for solving real business problems using SQL.
While SQL is great for querying and extracting insights from databases, Python is an excellent next step for advanced data processing and visualization. Tools like Pandas, Matplotlib, and Seaborn can help you clean, transform, and visually explore your SQL data in powerful ways.
Whether you’re just getting started or leveling up your data career, this type of applied practice is key to becoming a confident and effective data analyst.
This next section may contain affiliate links. If you click one of these links and make a purchase, I may earn a small commission at no extra cost to you. Thank you for supporting the blog!
References
Learning SQL: Generate, Manipulate, and Retrieve Data
Python Programming and SQL: 5 books in 1
SQL for Data Analysis: Advanced Techniques for Transforming Data into Insights
Frequently Asked Questions (FAQs)
What is the Classic Models sample database?
The Classic Models database is a sample dataset provided by MySQL that represents a fictional company selling classic cars. It includes tables for customers, orders, products, employees, and payments making it perfect for learning relational database concepts and SQL queries.
Can I use these SQL queries on another database?
Most of the queries will work on other relational databases like PostgreSQL or SQL Server, though you might need to adjust syntax slightly depending on the system.
Why is SQL important for data analytics?
SQL is the standard language for querying and manipulating structured data. It allows analysts to extract insights, clean data, and answer business questions directly from relational databases.
What’s the difference between INNER JOIN, LEFT JOIN, and RIGHT JOIN?
–INNER JOIN
returns only matching records from both tables.
–LEFT JOIN
returns all records from the left table, and matching ones from the right.
–RIGHT JOIN
does the reverse: all from the right, and matching ones from the left.
How can I avoid duplicate results in SQL queries?
You can use the DISTINCT
keyword to eliminate duplicate rows, and make sure your JOIN
conditions are correct. Poorly defined joins are a common cause of unexpected duplicates.
Do I need MySQL Workbench to follow along?
Yes, this guide uses MySQL Workbench to run and test SQL queries. However, you can also use other tools like DBeaver, phpMyAdmin, or even command-line MySQL if you're comfortable with them.
No Comment! Be the first one.