MySQL Practice Scenarios: Subquery Questions

PROGRAMMING

3/23/2024 · 13 min read


As an opening example, consider a table called "employees" that contains information about employees in a company. The table has columns for employee_id, employee_name, department_id, and salary. The goal is to write a query that retrieves the names of all employees who earn a salary higher than the average salary in their department.

To solve this, we can use a correlated subquery: for each employee row, the subquery calculates the average salary of that employee's department, and the outer query compares the employee's salary with it. If the salary is higher than the average, the employee's name is included in the result. The query looks like this:

```sql
SELECT e.employee_name
FROM employees AS e
WHERE e.salary > (
    SELECT AVG(d.salary)
    FROM employees AS d
    WHERE d.department_id = e.department_id
);
```

The subquery is evaluated for the current outer row: the condition `d.department_id = e.department_id` restricts the average to the department of the employee being checked, so it always returns a single value. The outer query then keeps the employees whose salary is greater than that value and retrieves their names. Note that an uncorrelated subquery such as `(SELECT AVG(salary) FROM employees GROUP BY department_id)` does not work here: it returns one row per department, and MySQL rejects the comparison with the error "Subquery returns more than 1 row" as soon as there is more than one department.
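To try the query, a small table is needed. The following is a minimal sketch, assuming a scratch database; the column types and the five sample rows are hypothetical and not part of the original scenario. It also shows an equivalent formulation that joins against a derived table of department averages.

```sql
-- Run in a scratch database: the DROP statement removes any existing table of this name.
DROP TABLE IF EXISTS employees;

-- Column types are assumptions; the scenario only names the columns.
CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,
    employee_name VARCHAR(50),
    department_id INT,
    salary        DECIMAL(10, 2)
);

-- Hypothetical sample rows.
INSERT INTO employees (employee_id, employee_name, department_id, salary) VALUES
    (1, 'Ann',  10, 5000),
    (2, 'Bob',  10, 3000),
    (3, 'Cara', 20, 4000),
    (4, 'Dan',  20, 4000),
    (5, 'Eve',  20, 7000);

-- Correlated subquery: the inner average is computed per outer row's department.
SELECT e.employee_name
FROM employees AS e
WHERE e.salary > (
    SELECT AVG(d.salary)
    FROM employees AS d
    WHERE d.department_id = e.department_id
);
-- Department 10 averages 4000 and department 20 averages 5000,
-- so the rows returned are Ann and Eve.

-- Equivalent formulation: compute every department average once, then join.
SELECT e.employee_name
FROM employees AS e
JOIN (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_avg
  ON dept_avg.department_id = e.department_id
WHERE e.salary > dept_avg.avg_salary;
```

Both forms return the same rows. The derived-table version can be easier to read when the department average is also needed in the output.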

By using subqueries, we can easily perform complex calculations and comparisons within a single query. This allows us to retrieve specific information from our dataset based on various conditions and criteria. Subqueries are a powerful tool in MySQL, and mastering them can greatly enhance your SQL skills.

With this opening example solved, let's move on to the numbered scenarios.

Scenario 1: Highest Sales

Situation: You have a table called "sales" that contains information about sales made by different salespersons. Each row represents a sale and includes the salesperson's name, the product name, and the sale amount. You want to find the salesperson with the highest total sales.

Sample Dataset:

| Name | Product | Amount |
| --- | --- | --- |
| John | Product A | 100 |
| John | Product B | 200 |
| Emily | Product A | 150 |
| Emily | Product C | 300 |
| Michael | Product B | 250 |

Your Query:

SELECT Name, SUM(Amount) AS TotalSales
FROM sales
GROUP BY Name
ORDER BY TotalSales DESC
LIMIT 1;

Solution: The query above selects the name and the sum of the amount for each salesperson from the "sales" table. It groups the results by name and orders them in descending order based on the total sales. Finally, it limits the result to only the first row, which represents the salesperson with the highest total sales.

In this scenario, we are trying to identify the salesperson who has achieved the highest total sales. By using the provided query, we can obtain the desired result. The query starts by selecting the "Name" column and the sum of the "Amount" column from the "sales" table. The "SUM(Amount) AS TotalSales" part of the query calculates the total sales for each salesperson and assigns it the alias "TotalSales".

Next, the query groups the results by the salesperson's name using the "GROUP BY Name" clause. This ensures that the total sales are calculated for each individual salesperson separately.

After the grouping, the query orders the results in descending order based on the total sales using the "ORDER BY TotalSales DESC" clause. This means that the salesperson with the highest total sales will appear first in the result set.

Finally, the query limits the result to only the first row using the "LIMIT 1" clause. This ensures that we only get the salesperson with the highest total sales in the final result.

By executing this query on the provided sample dataset, we can determine that the salesperson with the highest total sales is Emily, with a total sales amount of 450.
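One limitation of `LIMIT 1` is that it returns a single row even when two salespersons are tied for the top total. Since this collection focuses on subqueries, here is a hedged sketch of a subquery-based variant that returns every salesperson whose total equals the maximum. The column types and the alias `PersonTotal` are assumptions made for illustration.

```sql
-- Run in a scratch database: the DROP statement removes any existing table of this name.
DROP TABLE IF EXISTS sales;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE sales (
    Name    VARCHAR(50),
    Product VARCHAR(50),
    Amount  INT
);

INSERT INTO sales (Name, Product, Amount) VALUES
    ('John',    'Product A', 100),
    ('John',    'Product B', 200),
    ('Emily',   'Product A', 150),
    ('Emily',   'Product C', 300),
    ('Michael', 'Product B', 250);

-- Inner derived table: one total per salesperson.
-- Scalar subquery: the largest of those totals.
-- Outer query: every salesperson whose total equals that maximum.
SELECT Name, SUM(Amount) AS TotalSales
FROM sales
GROUP BY Name
HAVING SUM(Amount) = (
    SELECT MAX(PersonTotal)
    FROM (
        SELECT SUM(Amount) AS PersonTotal
        FROM sales
        GROUP BY Name
    ) AS totals
);
-- With the sample rows this returns Emily with a TotalSales of 450.
```

If another salesperson also reached 450, both rows would be returned, whereas the `LIMIT 1` version would return only one of them.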

Scenario 2: Customers with No Orders

Situation: You have two tables, "customers" and "orders". The "customers" table contains information about customers, while the "orders" table contains information about orders placed by those customers. You want to find the customers who have not placed any orders.

Sample Dataset:

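customers:

| CustomerID | Name |
| --- | --- |
| 1 | John |
| 2 | Emily |
| 3 | Michael |

orders:

| OrderID | CustomerID | Product |
| --- | --- | --- |
| 1 | 1 | Product A |
| 2 | 1 | Product B |
| 3 | 3 | Product C |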

Your Query:

SELECT Name
FROM customers
WHERE CustomerID NOT IN (SELECT CustomerID FROM orders);

Solution: The query above selects the names from the "customers" table where the customer ID is not present in the subquery, which selects the customer IDs from the "orders" table. This gives us the customers who have not placed any orders.

By using this query, you can easily identify the customers who have not placed any orders. In the given sample dataset, only "Emily" (CustomerID 2) has not placed any orders, because the "orders" table contains orders for CustomerID 1 and CustomerID 3 only. This query can be useful in various scenarios, such as identifying inactive customers or targeting marketing campaigns at customers who have not made any purchases.

In addition to finding customers with no orders, you can further enhance this query by including additional conditions or joining other tables to gather more information about these customers. For example, you can join the "customers" table with a "locations" table to find out the geographical distribution of customers who have not placed any orders.

Furthermore, you can use this query as a basis for further analysis. For instance, you can modify the query to count the number of orders placed by each customer. Customers with many orders are potential loyal customers, while customers with few orders might need additional incentives to make more purchases.

Overall, the query provided is a powerful tool for analyzing customer data and identifying customers who have not placed any orders. By leveraging SQL and database management systems, businesses can gain valuable insights into customer behavior and make data-driven decisions to improve their operations and customer satisfaction.
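One caveat is worth knowing when using `NOT IN`: if the subquery returns a NULL, the condition is never true and the query returns no rows. The sketch below, which assumes a scratch database and guessed column types, reproduces the sample dataset, shows a `NOT EXISTS` alternative, and then demonstrates the NULL behavior with one extra hypothetical order row.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE customers (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50)
);

CREATE TABLE orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,            -- nullable on purpose, for the demonstration below
    Product    VARCHAR(50)
);

INSERT INTO customers (CustomerID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael');

INSERT INTO orders (OrderID, CustomerID, Product) VALUES
    (1, 1, 'Product A'),
    (2, 1, 'Product B'),
    (3, 3, 'Product C');

-- NOT EXISTS alternative: keep customers for whom no matching order row exists.
SELECT c.Name
FROM customers AS c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders AS o
    WHERE o.CustomerID = c.CustomerID
);
-- Returns Emily.

-- Hypothetical extra row: an order whose CustomerID is unknown.
INSERT INTO orders (OrderID, CustomerID, Product) VALUES (4, NULL, 'Product D');

-- The NOT IN version now returns no rows, because comparing against NULL yields UNKNOWN.
SELECT Name
FROM customers
WHERE CustomerID NOT IN (SELECT CustomerID FROM orders);

-- Filtering NULLs out of the subquery restores the expected result (Emily).
SELECT Name
FROM customers
WHERE CustomerID NOT IN (
    SELECT CustomerID FROM orders WHERE CustomerID IS NOT NULL
);
```

The `NOT EXISTS` query still returns Emily after the NULL row is added, which is why it is often preferred when the subquery column can contain NULL.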

Scenario 3: Average Order Amount

To analyze the average order amount in more depth, you can combine the basic average query with additional filters or groupings. For example, you can calculate the average order amount for each customer or for a specific time period.

Let's say you want to find the average order amount for each customer. You can modify the query as follows:

SELECT CustomerID, AVG(Amount) AS AverageOrderAmount
FROM orders
GROUP BY CustomerID;

This query adds the "CustomerID" column to the select list and includes a "GROUP BY" clause to group the results by customer. The result contains one row per customer, each with that customer's average order amount.

Alternatively, if you want to calculate the average order amount for a specific time period, the "orders" table needs a date column. Let's assume the table now includes an "OrderDate" column. You can then find the average order amount for each month using the following query:

SELECT YEAR(OrderDate) AS Year, MONTH(OrderDate) AS Month, AVG(Amount) AS AverageOrderAmount
FROM orders
GROUP BY YEAR(OrderDate), MONTH(OrderDate);

In this query, the "YEAR()" and "MONTH()" functions extract the year and month from the "OrderDate" column. Grouping by both values produces one row per month, each with that month's average order amount. By customizing the query in this way, you can analyze trends or patterns in the average order amount over time.
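The following sketch makes these variations concrete. It assumes an "orders" table with "Amount" and "OrderDate" columns; the column types, the three sample rows, and the ungrouped base query are assumptions made for illustration, not taken from the original scenario. It also shows how a `WHERE` filter limits the average to one specific period.

```sql
-- Run in a scratch database: the DROP statement removes any existing table of this name.
DROP TABLE IF EXISTS orders;

-- Column types and sample rows are hypothetical.
CREATE TABLE orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    Amount     INT,
    OrderDate  DATE
);

INSERT INTO orders (OrderID, CustomerID, Amount, OrderDate) VALUES
    (1, 1, 100, '2024-01-10'),
    (2, 1, 200, '2024-02-05'),
    (3, 2, 300, '2024-02-20');

-- Assumed base query: the average over all orders (200 for these rows).
SELECT AVG(Amount) AS AverageOrderAmount
FROM orders;

-- Average per customer: customer 1 averages 150, customer 2 averages 300.
SELECT CustomerID, AVG(Amount) AS AverageOrderAmount
FROM orders
GROUP BY CustomerID;

-- Average for one specific period (February 2024): 250 for these rows.
-- A half-open date range avoids having to know the last day of the month.
SELECT AVG(Amount) AS AverageOrderAmount
FROM orders
WHERE OrderDate >= '2024-02-01'
  AND OrderDate <  '2024-03-01';
```

Filtering with `WHERE` happens before aggregation, so only the rows inside the period contribute to the average.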

Scenario 4: Products Sold by All Salespersons

Situation: You have two tables, "salespersons" and "sales". The "salespersons" table contains information about salespersons, while the "sales" table contains information about sales made by those salespersons. You want to find the products that have been sold by all salespersons.

Sample Dataset:

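salespersons:

| SalespersonID | Name |
| --- | --- |
| 1 | John |
| 2 | Emily |
| 3 | Michael |

sales:

| SalespersonID | Product |
| --- | --- |
| 1 | Product A |
| 1 | Product B |
| 2 | Product A |
| 3 | Product C |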

Your Query:

SELECT Product
FROM sales
GROUP BY Product
HAVING COUNT(DISTINCT SalespersonID) = (SELECT COUNT(*) FROM salespersons);

Solution: The query above selects the products from the "sales" table, groups them by product, and filters the result using the HAVING clause. The HAVING clause ensures that the count of distinct salesperson IDs for each product is equal to the count of all salespersons from the "salespersons" table, which means the product has been sold by all salespersons.

Explanation: The query is designed to return only the products that have been sold by all salespersons. To achieve this, it uses the COUNT function to count the number of distinct salesperson IDs for each product. The HAVING clause then compares this count to the total number of salespersons in the "salespersons" table, which is obtained using a subquery. If the two counts are equal, the product has been sold by every salesperson and is included in the result set.

For example, in the given sample dataset, there are three salespersons (John, Emily, and Michael) and three products (Product A, Product B, and Product C). Product A has been sold by two salespersons (John and Emily), Product B only by John, and Product C only by Michael. Since no product has been sold by all three salespersons, the query returns an empty result for this dataset. Product A would be returned only if Michael had also sold it.

By using this query, you can easily identify the products that have been sold by all salespersons in your dataset, allowing you to analyze sales trends and performance across your sales team.
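The behavior of the query on the sample dataset can be checked with a short script. This is a minimal sketch assuming a scratch database and guessed column types; the final inserted row is hypothetical and is added only to show when a product starts to qualify.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS sales;
DROP TABLE IF EXISTS salespersons;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE salespersons (
    SalespersonID INT PRIMARY KEY,
    Name          VARCHAR(50)
);

CREATE TABLE sales (
    SalespersonID INT,
    Product       VARCHAR(50)
);

INSERT INTO salespersons (SalespersonID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael');

INSERT INTO sales (SalespersonID, Product) VALUES
    (1, 'Product A'),
    (1, 'Product B'),
    (2, 'Product A'),
    (3, 'Product C');

-- Per-product distinct seller counts: A = 2, B = 1, C = 1; total salespersons = 3.
-- No product reaches 3, so this returns no rows.
SELECT Product
FROM sales
GROUP BY Product
HAVING COUNT(DISTINCT SalespersonID) = (SELECT COUNT(*) FROM salespersons);

-- Hypothetical extra sale: Michael also sells Product A.
INSERT INTO sales (SalespersonID, Product) VALUES (3, 'Product A');

-- Product A now has 3 distinct sellers, so the same query returns Product A.
SELECT Product
FROM sales
GROUP BY Product
HAVING COUNT(DISTINCT SalespersonID) = (SELECT COUNT(*) FROM salespersons);
```

`COUNT(DISTINCT ...)` matters here: without `DISTINCT`, a salesperson who sold the same product several times would be counted more than once.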

Scenario 5: Customers with Multiple Orders

Situation: You have two tables, "customers" and "orders". The "customers" table contains information about customers, while the "orders" table contains information about orders placed by those customers. You want to find the customers who have placed more than one order.

Sample Dataset:

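customers:

| CustomerID | Name |
| --- | --- |
| 1 | John |
| 2 | Emily |
| 3 | Michael |

orders:

| OrderID | CustomerID | Product |
| --- | --- | --- |
| 1 | 1 | Product A |
| 2 | 1 | Product B |
| 3 | 3 | Product C |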

Your Query:

SELECT Name
FROM customers
WHERE CustomerID IN (SELECT CustomerID FROM orders GROUP BY CustomerID HAVING COUNT(*) > 1);

Solution: The query above selects the names from the "customers" table where the customer ID is present in the subquery. The subquery selects the customer IDs from the "orders" table, groups them by customer ID, and filters the result using the HAVING clause to only include customers with more than one order.

Explanation: By using the subquery, we are able to retrieve the customer IDs from the "orders" table that have more than one entry. The GROUP BY clause groups the orders by customer ID, and the HAVING clause filters the result to only include customer IDs with a count greater than 1. This subquery is then used as a condition in the WHERE clause of the main query to select the corresponding customer names from the "customers" table. This allows us to identify the customers who have placed more than one order.

For example, in the given sample dataset, the customer with ID 1 has placed two orders, while the customer with ID 3 has placed only one order. Therefore, the query will return the name "John" as the result, as John is the only customer who has placed more than one order.
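The same question can also be answered with a join instead of `IN`, which makes it easy to show the order count next to the name. The sketch below assumes a scratch database and guessed column types; the alias `OrderCount` is introduced here for illustration.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE customers (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50)
);

CREATE TABLE orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    Product    VARCHAR(50)
);

INSERT INTO customers (CustomerID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael');

INSERT INTO orders (OrderID, CustomerID, Product) VALUES
    (1, 1, 'Product A'),
    (2, 1, 'Product B'),
    (3, 3, 'Product C');

-- Subquery version from the scenario: returns John.
SELECT Name
FROM customers
WHERE CustomerID IN (
    SELECT CustomerID
    FROM orders
    GROUP BY CustomerID
    HAVING COUNT(*) > 1
);

-- Join version: also returns John, together with his order count of 2.
-- Grouping by CustomerID keeps two customers who share a name apart.
SELECT c.Name, COUNT(*) AS OrderCount
FROM customers AS c
JOIN orders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.Name
HAVING COUNT(*) > 1;
```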

Scenario 6: Total Sales by Month

Calculating total sales by month is a common requirement in data analysis. It allows businesses to track their sales performance over time and identify trends or patterns. In this scenario, we have a table called "sales" that contains information about sales made by different salespersons. Each row represents a sale and includes the salesperson's name, the product name, the sale amount, and the sale date.

To calculate the total sales amount for each month, the query selects the formatted date as "Month" and the sum of the amount for each month from the "sales" table. It uses the DATE_FORMAT() function with the format string '%Y-%m' to reduce each date to a year-month value such as 2021-01, so that all sales from the same month fall into the same group and the groups sort chronologically. Next, the GROUP BY clause groups the results by the formatted date, which aggregates the sales data for each month separately. Finally, the SUM() function calculates the total sales amount for each month, aliased as "TotalSales".

The result is a table that shows the total sales amount for each month. For example, using the scenario's sample dataset, the query returns the following result:

| Month | TotalSales |
| --- | --- |
| 2021-01 | 300 |
| 2021-02 | 450 |
| 2021-03 | 250 |

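A query matching this description could look like the following minimal sketch. The `SaleDate` column name, the column types, and the dates themselves are assumptions, chosen so that five sample sales reproduce the monthly totals in the table above.

```sql
-- Run in a scratch database: the DROP statement removes any existing table of this name.
DROP TABLE IF EXISTS sales;

-- SaleDate is an assumed column name; the dates are hypothetical.
CREATE TABLE sales (
    Name     VARCHAR(50),
    Product  VARCHAR(50),
    Amount   INT,
    SaleDate DATE
);

INSERT INTO sales (Name, Product, Amount, SaleDate) VALUES
    ('John',    'Product A', 100, '2021-01-05'),
    ('John',    'Product B', 200, '2021-01-20'),
    ('Emily',   'Product A', 150, '2021-02-03'),
    ('Emily',   'Product C', 300, '2021-02-17'),
    ('Michael', 'Product B', 250, '2021-03-09');

-- '%Y-%m' formats each date as a four-digit year and two-digit month.
SELECT DATE_FORMAT(SaleDate, '%Y-%m') AS Month,
       SUM(Amount)                    AS TotalSales
FROM sales
GROUP BY DATE_FORMAT(SaleDate, '%Y-%m')
ORDER BY Month;
-- Returns 2021-01 with 300, 2021-02 with 450, and 2021-03 with 250.
```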
This result indicates that in January 2021, the total sales amount was 300. In February 2021, it increased to 450, and in March 2021, it decreased to 250. By analyzing this data, businesses can identify which months had the highest or lowest sales and make informed decisions based on these insights.

In short, combining the DATE_FORMAT() function with the GROUP BY clause is an effective way to calculate the total sales amount for each month from a sales table.

Scenario 7: Customers with the Highest Order Amounts

Imagine you are working for an e-commerce company that sells various products online. As part of your job, you need to analyze customer data to identify the customers who have made the highest order amounts. This information allows the company to understand its top-spending customers and tailor its marketing strategies accordingly.

In this scenario, you have two tables: "customers" and "orders". The "customers" table contains information about the customers, such as their unique customer IDs and names. The "orders" table holds details about the orders placed by the customers, including the order IDs, customer IDs, and the corresponding order amounts.

To find each customer's highest order amount, the query joins the "customers" and "orders" tables using the customer ID as the common field, groups the rows by customer, and retrieves the customer's name together with the maximum order amount they have made. The "MAX(Amount)" function identifies the highest order amount, while "AS HighestOrderAmount" assigns a label to this value for better readability. Once you execute the query, you will obtain the desired result.
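A minimal sketch of such a query follows. The column types and the order amounts are hypothetical sample data, and grouping by customer is an assumption made so that `MAX(Amount)` is computed per customer. A second query uses a scalar subquery to return only the customer or customers holding the single largest order.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

-- Column types and order amounts are hypothetical.
CREATE TABLE customers (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50)
);

CREATE TABLE orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    Amount     INT
);

INSERT INTO customers (CustomerID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael');

INSERT INTO orders (OrderID, CustomerID, Amount) VALUES
    (1, 1, 100),
    (2, 1, 250),
    (3, 2, 200),
    (4, 3, 300),
    (5, 3, 50);

-- Highest order amount per customer, largest first.
SELECT c.Name, MAX(o.Amount) AS HighestOrderAmount
FROM customers AS c
JOIN orders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.Name
ORDER BY HighestOrderAmount DESC;
-- Returns Michael 300, John 250, Emily 200.

-- Subquery variant: only the customer(s) who placed the largest order overall.
SELECT c.Name, o.Amount AS HighestOrderAmount
FROM customers AS c
JOIN orders AS o ON o.CustomerID = c.CustomerID
WHERE o.Amount = (SELECT MAX(Amount) FROM orders);
-- Returns Michael 300.
```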
The output consists of each customer's name and their highest order amount. This information can then be used for further analysis or to create reports that highlight the top-spending customers. By understanding the preferences and buying behavior of these customers, the company can develop personalized marketing campaigns, loyalty programs, or exclusive offers.

In short, by combining data from the "customers" and "orders" tables, you can identify the customers who have made the highest order amounts and focus on retaining the most valuable ones.

Scenario 8: Salespersons with No Sales

Now consider a situation where the company needs to identify the salespersons who have not made any sales, for example in the past quarter. This information helps the management team assess the performance of the sales force and take appropriate action.

The "salespersons" table contains information about each salesperson, including their unique SalespersonID and Name. The "sales" table holds data about the sales made by these salespersons, including the SalespersonID and the specific Product sold.

To identify the salespersons with no sales, the query uses a subquery to find the distinct SalespersonIDs in the "sales" table. With the NOT IN operator, the main query retrieves the names from the "salespersons" table that do not have a corresponding SalespersonID in the subquery. This filters out the salespersons who have made at least one sale and leaves a list of those who have not generated any sales. Because the "sales" table as described has no date column, the check covers all recorded sales; restricting it to a specific period such as the past quarter requires a sale date column and a date filter inside the subquery.
The management team can then focus on these individuals to understand the underlying reasons for their lack of sales and provide appropriate support or training to improve their performance. By regularly running this query and tracking the salespersons who consistently have no sales, the company can identify patterns that may indicate larger issues within the sales team or the overall sales strategy.

In short, a subquery combined with the NOT IN operator offers a simple way to identify salespersons with no sales. This information can then be used to drive improvements and enhance overall sales productivity.
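The query described for salespersons with no sales can be sketched as follows. The column types, the fourth salesperson "Sara", the `SaleDate` column, and all dates are hypothetical additions for illustration; the second query shows one possible way to restrict the check to a single quarter.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS sales;
DROP TABLE IF EXISTS salespersons;

CREATE TABLE salespersons (
    SalespersonID INT PRIMARY KEY,
    Name          VARCHAR(50)
);

-- SaleDate is an assumed column, needed only for the period-restricted variant.
CREATE TABLE sales (
    SalespersonID INT NOT NULL,
    Product       VARCHAR(50),
    SaleDate      DATE
);

-- Sara is a hypothetical salesperson with no sales at all.
INSERT INTO salespersons (SalespersonID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael'), (4, 'Sara');

INSERT INTO sales (SalespersonID, Product, SaleDate) VALUES
    (1, 'Product A', '2024-01-15'),
    (1, 'Product B', '2024-02-10'),
    (2, 'Product A', '2023-11-05'),
    (3, 'Product C', '2024-03-20');

-- Salespersons with no recorded sales at all: returns Sara.
SELECT Name
FROM salespersons
WHERE SalespersonID NOT IN (SELECT DISTINCT SalespersonID FROM sales);

-- Salespersons with no sales in the first quarter of 2024: returns Emily and Sara.
-- Emily's only sale is dated November 2023, outside the period.
SELECT Name
FROM salespersons
WHERE SalespersonID NOT IN (
    SELECT SalespersonID
    FROM sales
    WHERE SaleDate >= '2024-01-01'
      AND SaleDate <  '2024-04-01'
);
```

Declaring `SalespersonID` as `NOT NULL` in the "sales" table keeps `NOT IN` safe here, since a NULL in the subquery result would make the condition fail for every row.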

Scenario 9: Customers with Same Order Amount

Situation: You have two tables, "customers" and "orders". The "customers" table contains information about customers, while the "orders" table contains information about orders placed by those customers. You want to find the customers who have placed more than one order with the same order amount.

Sample Dataset:

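customers:

| CustomerID | Name |
| --- | --- |
| 1 | John |
| 2 | Emily |
| 3 | Michael |

orders:

| OrderID | CustomerID | Amount |
| --- | --- | --- |
| 1 | 1 | 100 |
| 2 | 2 | 200 |
| 3 | 1 | 100 |
| 4 | 3 | 300 |
| 5 | 2 | 200 |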

Your Query:

SELECT Name, Amount
FROM customers
JOIN orders ON customers.CustomerID = orders.CustomerID
GROUP BY Name, Amount
HAVING COUNT(*) > 1;

Solution: The query above joins the "customers" and "orders" tables using the customer ID and selects the customer's name and the order amount. It groups the results by name and amount, and uses the HAVING clause to keep only the groups that contain more than one row, that is, customers who have placed more than one order with the same amount.

For example, looking at the sample dataset, John has placed two orders with an amount of 100 (orders 1 and 3), and Emily has placed two orders with an amount of 200 (orders 2 and 5). Both are therefore included in the result, as (John, 100) and (Emily, 200). Michael has placed only one order, with an amount of 300, so he is not included.

This query can be useful in scenarios where you want to identify customers who have made multiple orders with the same order amount. It can provide insights into customer behavior and preferences, allowing you to tailor your marketing strategies or product offerings accordingly. It can also help in spotting data problems, since repeated identical amounts from the same customer may indicate a duplicate entry in the system or data entry process.
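The sketch below runs the idea on the sample dataset, assuming a scratch database and guessed column types. It groups by `CustomerID` as well as name, so that two different customers who happen to share a name are not merged. A second, hypothetical variant uses a subquery to answer a related question: which amounts are shared by orders from different customers.

```sql
-- Run in a scratch database: the DROP statements remove any existing tables of these names.
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE customers (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50)
);

CREATE TABLE orders (
    OrderID    INT PRIMARY KEY,
    CustomerID INT,
    Amount     INT
);

INSERT INTO customers (CustomerID, Name) VALUES
    (1, 'John'), (2, 'Emily'), (3, 'Michael');

INSERT INTO orders (OrderID, CustomerID, Amount) VALUES
    (1, 1, 100),
    (2, 2, 200),
    (3, 1, 100),
    (4, 3, 300),
    (5, 2, 200);

-- Customers who repeated the same amount: returns (John, 100) and (Emily, 200).
SELECT c.Name, o.Amount
FROM customers AS c
JOIN orders AS o ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.Name, o.Amount
HAVING COUNT(*) > 1;

-- Variant: amounts that appear in orders from more than one customer.
-- In the sample data every amount belongs to a single customer, so this returns no rows.
SELECT DISTINCT c.Name, o.Amount
FROM customers AS c
JOIN orders AS o ON o.CustomerID = c.CustomerID
WHERE o.Amount IN (
    SELECT Amount
    FROM orders
    GROUP BY Amount
    HAVING COUNT(DISTINCT CustomerID) > 1
);
```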

By leveraging the power of SQL and utilizing queries like the one above, businesses can gain valuable insights from their data and make data-driven decisions to drive growth and improve customer satisfaction.

Scenario 10: Total Sales by Product

Situation: You have a table called "sales" that contains information about sales made by different salespersons. Each row represents a sale and includes the salesperson's name, the product name, and the sale amount. You want to find the total sales amount for each product.

Sample Dataset:

| Name | Product | Amount |
| --- | --- | --- |
| John | Product A | 100 |
| John | Product B | 200 |
| Emily | Product A | 150 |
| Emily | Product C | 300 |
| Michael | Product B | 250 |

Your Query:

SELECT Product, SUM(Amount) AS TotalSales
FROM sales
GROUP BY Product;

Solution: The query above selects the product and the sum of the amount for each product from the "sales" table. It groups the results by product, giving us the total sales amount for each product.

For example, if we execute the query on the provided sample dataset, the result would be:

| Product | TotalSales |
| --- | --- |
| Product A | 250 |
| Product B | 450 |
| Product C | 300 |

This result shows that the total sales for "Product A" is 250, for "Product B" is 450, and for "Product C" is 300.

By using the GROUP BY clause in the query, we can group the sales by product and calculate the sum of the amount for each product. This allows us to analyze the sales performance for different products and make informed business decisions based on the total sales amount.
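Since the grouped query itself contains no subquery, one possible extension is to add a scalar subquery that computes the overall total, so each product's share of all sales appears next to its total. This is a sketch under assumptions: the column types are guessed and the alias `PercentOfAllSales` is introduced here for illustration.

```sql
-- Run in a scratch database: the DROP statement removes any existing table of this name.
DROP TABLE IF EXISTS sales;

-- Column types are assumptions; the rows are the scenario's sample dataset.
CREATE TABLE sales (
    Name    VARCHAR(50),
    Product VARCHAR(50),
    Amount  INT
);

INSERT INTO sales (Name, Product, Amount) VALUES
    ('John',    'Product A', 100),
    ('John',    'Product B', 200),
    ('Emily',   'Product A', 150),
    ('Emily',   'Product C', 300),
    ('Michael', 'Product B', 250);

-- The scalar subquery returns the grand total (1000 for the sample rows)
-- and is used as the denominator for every product group.
SELECT Product,
       SUM(Amount) AS TotalSales,
       SUM(Amount) * 100 / (SELECT SUM(Amount) FROM sales) AS PercentOfAllSales
FROM sales
GROUP BY Product
ORDER BY Product;
-- Product A: 250 (25 percent), Product B: 450 (45 percent), Product C: 300 (30 percent).
```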

That concludes the 10 MySQL practice scenarios based on subquery questions. I hope you found them helpful in improving your MySQL skills. Remember, practice makes perfect, so keep practicing and exploring different scenarios to enhance your understanding of subqueries in MySQL.