Let’s assume we have daily weather records with the maximum temperature:
| day | max_temp_celsius |
|---|---|
| 2025-07-01 | 25 |
| 2025-07-02 | 27 |
| 2025-07-03 | 30 |
| 2025-07-04 | 32 |
| 2025-07-05 | 35 |
| 2025-07-06 | 29 |
| 2025-07-07 | 31 |
| 2025-07-08 | 31 |
| 2025-07-09 | 28 |
| 2025-07-10 | 22 |
Now, we want to find the longest heatwaves (the longest sequences of days with a maximum temperature >= 30 degrees) using SQL only. We can achieve this using multiple subqueries with window functions - here’s the complete SQL statement:
WITH numbered_records AS (
-- Assign row numbers to all records and to records meeting the condition
SELECT
day,
max_temp_celsius,
ROW_NUMBER() OVER (ORDER BY day) AS overall_row,
ROW_NUMBER() OVER (
PARTITION BY CASE WHEN max_temp_celsius >= 30 THEN 1 ELSE 0 END
ORDER BY day
) AS hot_or_cold_row
FROM weather
),
numbered_heatwaves AS (
-- Create group identifiers for consecutive sequences
SELECT
day,
max_temp_celsius,
overall_row - hot_or_cold_row AS heatwave_id
FROM numbered_records
WHERE max_temp_celsius >= 30
),
longest_heatwaves AS (
-- Count the length of each sequence
SELECT
heatwave_id,
MIN(day) AS start_date,
MAX(day) AS end_date,
COUNT(*) AS heatwave_length
FROM numbered_heatwaves
GROUP BY heatwave_id
)
-- Find the longest sequence
SELECT
start_date,
end_date,
heatwave_length
FROM longest_heatwaves
ORDER BY heatwave_length DESC;
I’ll explain the subqueries step by step now.
Subquery 1 defines (using window functions) a row number for all days, and a row number that counts records above and below 30 degrees separately:
WITH numbered_records AS (
SELECT
day,
max_temp_celsius,
ROW_NUMBER() OVER (ORDER BY day) AS overall_row,
ROW_NUMBER() OVER (
PARTITION BY CASE WHEN max_temp_celsius >= 30 THEN 1 ELSE 0 END
ORDER BY day
) AS hot_or_cold_row
FROM weather
)
Result:
| day | max_temp_celsius | overall_row | hot_or_cold_row |
|---|---|---|---|
| 2025-07-01 | 25 | 1 | 1 |
| 2025-07-02 | 27 | 2 | 2 |
| 2025-07-03 | 30 | 3 | 1 |
| 2025-07-04 | 32 | 4 | 2 |
| 2025-07-05 | 35 | 5 | 3 |
| 2025-07-06 | 29 | 6 | 3 |
| 2025-07-07 | 31 | 7 | 4 |
| 2025-07-08 | 31 | 8 | 5 |
| 2025-07-09 | 28 | 9 | 4 |
| 2025-07-10 | 22 | 10 | 5 |
Subquery 2 creates a unique ID for each sequence of hot days simply by subtracting hot_or_cold_row from overall_row:
numbered_heatwaves AS (
SELECT
day,
max_temp_celsius,
overall_row - hot_or_cold_row AS heatwave_id
FROM numbered_records
WHERE max_temp_celsius >= 30
)
| day | max_temp_celsius | heatwave_id |
|---|---|---|
| 2025-07-03 | 30 | 2 |
| 2025-07-04 | 32 | 2 |
| 2025-07-05 | 35 | 2 |
| 2025-07-07 | 31 | 3 |
| 2025-07-08 | 31 | 3 |
Subquery 3 is transforming the result of the second subquery into a form that makes it easy to retrieve the longest sequences:
longest_heatwaves AS (
SELECT
heatwave_id,
MIN(day) AS start_date,
MAX(day) AS end_date,
COUNT(*) AS heatwave_length
FROM numbered_heatwaves
GROUP BY heatwave_id
)
| heatwave_id | start_date | end_date | heatwave_length |
|---|---|---|---|
| 2 | 2025-07-03 | 2025-07-05 | 3 |
| 3 | 2025-07-07 | 2025-07-08 | 2 |
And here we are with the result of the complete query:
| start_date | end_date | heatwave_length |
|---|---|---|
| 2025-07-03 | 2025-07-05 | 3 |
| 2025-07-07 | 2025-07-08 | 2 |
Note that window functions require at least MySQL 8.0
We might explore more of this topic in the future and try to achieve the same with Oracle, Postgres and older MySQL versions…