FARS DRUNK_DR Returns 0 in 2021+ - Why and How to Fix It

If you query SUM(DRUNK_DR) on the ACCIDENT table for 2021, 2022, or 2023, you'll get zero. Not because there were no drunk drivers - because NHTSA quietly removed the column.

The symptom

Your dashboard shows drunk-driver fatal crashes dropping from ~10,000/year to zero overnight in 2021. You assume MADD finally won. Then you check the news, and 2021 was actually one of the worst years on record for impaired driving fatalities. Something is wrong with your pipeline.

The query that broke:

SELECT year, SUM(DRUNK_DR) AS drunk_drivers
FROM accident
GROUP BY year;

Results:

YearSUM(DRUNK_DR)Reality (derived from VEHICLE.DR_DRINK)
20179,5819,581
20188,9938,993
20198,9048,904
20209,8739,873
2021010,785
2022010,654
202309,975

The 2021+ counts above are real numbers we cross-checked against NHTSA's own VEHICLE.csv source files (verified row-for-row in our ingestion pipeline). They're the highest impaired-driver years on record, not zero.

The root cause

Starting with the 2021 data year, NHTSA removed the DRUNK_DR column from ACCIDENT.csv entirely. We verified this by checking the column headers across years:

YearDRUNK_DR in ACCIDENT.csv?
2017-2020Present (sum is real)
2021-2023Removed entirely

This is not documented as a breaking change anywhere obvious in NHTSA's release notes. The 2023 FARS Analytical User's Manual simply doesn't list DRUNK_DR on the ACCIDENT table. If you're not actively diffing column headers between years (and almost nobody is), your pipeline will read it as null/zero and silently produce wrong results.

The same fate befell HIT_RUN, SP_LIMIT, and SUR_COND on the ACCIDENT table - all removed in the 2020/2021 transition. See our general FARS data guide for the full list.

The fix: derive from VEHICLE.DR_DRINK

The data still exists - it just lives on the VEHICLE table now, as a per-vehicle flag. The "drunk-driver crashes" count is the number of crashes where at least one vehicle has DR_DRINK = 1:

SELECT
    a.year,
    COUNT(DISTINCT a.st_case) AS drunk_driver_crashes
FROM accident a
JOIN vehicle v ON v.st_case = a.st_case AND v.year = a.year
WHERE v.dr_drink = 1
GROUP BY a.year;

Or, if you want the per-crash count of drunk drivers (not just whether any were involved):

SELECT
    a.year, a.st_case,
    SUM(CASE WHEN v.dr_drink = 1 THEN 1 ELSE 0 END) AS drunk_drivers
FROM accident a
JOIN vehicle v ON v.st_case = a.st_case AND v.year = a.year
GROUP BY a.year, a.st_case;

This works for every year from 2017 through 2023. We verified the derived counts match the published DRUNK_DR exactly for 2017-2020 (when both fields are present), confirming the derivation is the right reconstruction of NHTSA's logic.

If you need cross-year data with drunk-driver counts

Build your pipeline to compute drunk_drivers at ingest time from VEHICLE.DR_DRINK for every year, including the years where DRUNK_DR is still present. Don't conditionally use one source for some years and another source for others - that's how you get hard-to-debug discrepancies.

If you'd rather not deal with this at all: FARS API normalizes drunk_drivers as a crash-level field across all 7 years (2017-2023). Query /v1/crashes?drunk_driver=true&year_from=2021 and you'll get the actual ~30,000 drunk-driver fatal crashes from 2021-2023, not zero.

Related issues you'll hit on the same pipeline

Verification

The numbers in this guide come from cross-checking our production database against NHTSA's source CSV files for every year 2017-2023. Specifically: for each year, we sum DR_DRINK = 1 rows in VEHICLE.csv grouped by ST_CASE, and verify the count matches what's in our crashes.drunk_drivers column. All 7 years match exactly. Source: NHTSA FARS download portal.

Try the API Full Data Guide