The single factor that most consistently predicts whether an AI-based carrier matching or route optimization system will work well in a mid-market freight brokerage is not the algorithm architecture, the model complexity, or the vendor's feature set. It's the quality and completeness of the historical TMS data the model is trained on. Before you evaluate routing AI vendors, you should evaluate your own TMS data. The results will be instructive.
What "Data Quality" Means in a Freight TMS Context
Data quality in a freight TMS isn't a single metric — it's a set of specific field-level completeness and accuracy requirements that determine whether a machine learning model can build valid signals from the historical data. The fields that matter most for carrier matching and route optimization models are the ones that capture the key dimensions of the carrier-load relationship: who moved the freight, where, when, at what cost, and with what service outcome.
The most common data quality failures in mid-market TMS environments fall into three categories: missing values in fields the model depends on, inconsistent identifiers that prevent accurate carrier record linkage, and incorrect or approximate values in fields that are filled in but not verified. Each category has a different impact on model performance and requires a different fix.
It's worth being specific about which fields the model depends on, because "TMS data quality" is abstract until you're looking at actual field-level completeness rates. For a carrier matching model, the high-impact fields are: carrier MC number (for FMCSA data linkage), load pickup timestamp, load delivery timestamp, origin and destination zip codes, loaded miles, commodity type, equipment type, all-in carrier cost (including accessorials), and tender status (accepted/rejected). If any of these fields falls below roughly 85% completeness across your historical load data, the model's ability to build accurate carrier performance signals degrades noticeably.
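A completeness scan is simple to run on a TMS export before talking to any vendor. The sketch below assumes the export is a list of records (one per load) with blank strings or nulls for missing values; the field names are illustrative, not a real TMS schema.

```python
# Field-level completeness scan for a TMS load export.
# Field names below are illustrative stand-ins, not a real TMS schema.
HIGH_IMPACT_FIELDS = [
    "carrier_mc", "pickup_actual", "delivery_actual",
    "origin_zip", "dest_zip", "loaded_miles",
    "commodity_type", "equipment_type", "carrier_cost_total", "tender_status",
]

def _present(value):
    """A value counts as present unless it's None or whitespace-only."""
    return value is not None and str(value).strip() != ""

def completeness_rates(loads):
    """Return {field: fraction of loads with a non-blank value}."""
    if not loads:
        return {f: 0.0 for f in HIGH_IMPACT_FIELDS}
    total = len(loads)
    return {
        f: sum(1 for row in loads if _present(row.get(f))) / total
        for f in HIGH_IMPACT_FIELDS
    }

def below_threshold(rates, threshold=0.85):
    """Fields that fall under the ~85% completeness bar discussed above."""
    return sorted(f for f, rate in rates.items() if rate < threshold)
```

Running this against a full historical export usually produces the first surprise of the project: fields everyone assumed were "always filled in" rarely are.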
The Timestamp Problem
Missing or inaccurate pickup and delivery timestamps are the most impactful data quality issue in freight TMS data for machine learning purposes. On-time performance (OTP) is a primary signal in carrier scoring — but OTP can only be calculated if you have both the scheduled appointment time and the actual pickup/delivery time. If actual timestamps are missing or have been entered inaccurately (backfilled from memory, entered as the appointment time rather than the actual time), OTP metrics are either missing or wrong.
The rate of timestamp problems in mid-market TMS data is often surprising. In many brokerage environments, actual pickup and delivery timestamps are only entered into the TMS when the shipper or consignee reports a problem (late pickup complaint, delivery dispute). Loads that move without issues may have no actual timestamps entered — just appointment times. The result is a dataset where OTP appears perfect (no late events recorded) but is actually incomplete (no actual events recorded). A model trained on this data learns that carriers always perform on time, which is a systematically wrong prior.
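The distortion is easy to see if OTP is computed alongside its coverage rate. The sketch below (field names are assumptions) reports on-time performance only over loads that have an actual timestamp, plus the fraction of loads that have one at all — a high OTP with low coverage is exactly the "looks perfect, actually incomplete" pattern described above.

```python
from datetime import datetime

def otp_with_coverage(loads):
    """Return (on_time_rate over observed loads, fraction of loads with actuals).

    A load counts as on time only when its actual delivery is at or before
    the appointment. Loads with no actual timestamp are excluded from OTP
    but reported via the coverage figure, so the gap is visible.
    Field names ("delivery_appt", "delivery_actual", ISO-8601 strings)
    are illustrative assumptions.
    """
    if not loads:
        return None, 0.0
    observed = [l for l in loads if l.get("delivery_actual")]
    coverage = len(observed) / len(loads)
    if not observed:
        return None, coverage
    on_time = sum(
        1 for l in observed
        if datetime.fromisoformat(l["delivery_actual"])
        <= datetime.fromisoformat(l["delivery_appt"])
    )
    return on_time / len(observed), coverage
```

Reporting the pair, rather than OTP alone, keeps a model (and its evaluators) from mistaking missing data for clean performance.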
The fix requires a process change, not just a technical one. Dispatchers need to enter actual pickup and delivery timestamps for every load, not just problem loads. This is easier when the TMS interface prompts for it and when there's a workflow requirement (load can't be settled without confirmed delivery time, for example). Retroactively cleaning historical timestamps is generally not worth the effort; focus on capturing correct data going forward and establishing the model's training window starting from when data quality improved.
Carrier ID Consistency: The Silent Model Killer
Carrier matching models work by building a performance profile for each carrier. The profile is only valid if all loads from a given carrier are correctly attributed to that carrier's record. When the same physical carrier entity has multiple records in the TMS — entered under different name variations, different MC numbers, or created separately by different dispatchers — the loads are split across multiple records, and each record has too few observations to build a reliable performance signal.
This is the carrier deduplication problem, and it's endemic in mid-market TMS environments that have been operating for more than a few years. A carrier who was onboarded under "Smith Trucking LLC" and then re-entered as "Smith Transport" when a dispatcher couldn't find the original record has two records with half the loads each. The model sees two carriers with moderate confidence, when in fact it should see one carrier with twice as many observations and a more reliable performance signal.
The carrier dedup exercise before deploying any ML model requires: extracting all carrier records from the TMS, running MC/DOT number matching to identify records that share an authority number, reviewing name-similarity matches for records without consistent authority numbers, and merging duplicate records with their load history consolidated. This is a one-time data quality project that typically takes 20–40 hours depending on the size of the carrier panel and the extent of the duplication problem. It's worth doing before training any model, not after.
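The first two steps of that exercise can be sketched in a few lines. This is a simplified illustration with assumed record fields, not a production dedup pipeline: exact grouping on MC number finds the definite duplicates, and a string-similarity ratio flags name-variation candidates for human review.

```python
import difflib
from collections import defaultdict

def mc_duplicate_groups(carriers):
    """Map MC number -> carrier record ids sharing that authority number.

    Records sharing an MC number are near-certain duplicates of one
    physical carrier entity. Field names are illustrative assumptions.
    """
    groups = defaultdict(list)
    for c in carriers:
        mc = str(c.get("mc_number") or "").strip()
        if mc:
            groups[mc].append(c["carrier_id"])
    return {mc: ids for mc, ids in groups.items() if len(ids) > 1}

def likely_same_name(name_a, name_b, cutoff=0.8):
    """Flag name pairs for manual review; similarity alone never auto-merges."""
    ratio = difflib.SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    return ratio >= cutoff
```

Name similarity is only a candidate generator — "Smith Trucking" vs. "Smith Transport" might be one carrier or two unrelated ones, which is why that review step is manual.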
Accessorial Records and Incomplete Cost Data
The total carrier cost for a load includes linehaul, fuel surcharge, and any applicable accessorial charges — detention, layover, liftgate, sort and segregate, residential delivery, fuel stop charges. In many TMS environments, accessorial charges are either not recorded as separate line items (they're folded into the linehaul rate as negotiated adjustments) or are recorded inconsistently (sometimes as separate line items, sometimes as notes, sometimes not at all).
For a route optimization or margin analysis model, incomplete accessorial records mean the cost per load is systematically understated on loads with significant accessorial activity. The model learns that certain lanes or carriers are more profitable than they actually are, because the cost it's seeing doesn't include the detention hours or liftgate charges that made the actual load margin worse.
The most common accessorial problem is detention. Detention costs in mid-market brokerage are frequently absorbed as a relationship cost rather than billed back to the shipper or deducted from the carrier invoice — and even when they are handled, they often don't appear as clean line items in the TMS. A systematic review of detention handling procedures and how they're captured in TMS records is worth doing before deploying a model. The lanes where detention is most common (manufacturing facilities with tight receiving windows, distribution centers with appointment delays) will show the largest distortion in cost data if detention isn't captured.
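The margin distortion is purely arithmetic, which makes it easy to demonstrate. In the sketch below (illustrative numbers and field names), the same load shows a margin twice as large when its detention charge was never recorded as a line item.

```python
def all_in_cost(load):
    """Linehaul + fuel surcharge + recorded accessorial line items.

    Any accessorial that was absorbed or noted elsewhere simply vanishes
    from this total -- the model never sees it. Fields are illustrative.
    """
    return (
        load["linehaul"]
        + load["fuel_surcharge"]
        + sum(a["amount"] for a in load.get("accessorials", []))
    )

def margin(load):
    """Customer rate minus all-in carrier cost, as the TMS records it."""
    return load["customer_rate"] - all_in_cost(load)

# Same physical load, with and without the detention line item recorded:
with_detention = {
    "customer_rate": 2000, "linehaul": 1500, "fuel_surcharge": 200,
    "accessorials": [{"type": "detention", "amount": 150}],
}
without_detention = {
    "customer_rate": 2000, "linehaul": 1500, "fuel_surcharge": 200,
    "accessorials": [],  # detention absorbed, never entered in the TMS
}
```

A model trained on the second record concludes the lane earned a $300 margin when the real figure was $150 — and then recommends more volume on exactly the lanes where detention is worst.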
Commodity and Equipment Type Fields
Route optimization models that account for equipment type (dry van, reefer, flatbed, step deck, LTL) and commodity category require these fields to be populated and consistent. In practice, many TMS environments have these fields partially populated — filled in when the load type is unusual but left blank when it's the "default" equipment type for the brokerage's primary commodity. This creates a dataset where the default equipment type appears to have almost no records, because the loads that actually moved on it were left blank rather than labeled.
Commodity type fields are particularly prone to inconsistency. "General merchandise," "consumer goods," "retail merchandise," and "miscellaneous freight" may all describe the same commodity category entered by different dispatchers using different terms. Without normalization, the model sees four separate commodity categories, each with a fraction of the observations that a single normalized category would have.
Data dictionary standardization — a defined set of valid values for equipment type, commodity category, and service type fields, with validation rules in the TMS that prevent free-text entry — prevents this problem from compounding going forward. Retroactive normalization of existing data requires a mapping exercise that's time-consuming but usually achievable through pattern matching and manual review of the most common free-text values.
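The retroactive mapping exercise is usually a hand-built lookup table covering the most frequent free-text values, with everything unmatched routed to a review bucket. A minimal sketch, with example mapping entries that are illustrative rather than a recommended taxonomy:

```python
# Hand-built normalization map from free-text commodity entries to
# canonical categories. Entries shown are examples only.
COMMODITY_MAP = {
    "general merchandise": "GENERAL_MERCHANDISE",
    "consumer goods": "GENERAL_MERCHANDISE",
    "retail merchandise": "GENERAL_MERCHANDISE",
    "miscellaneous freight": "GENERAL_MERCHANDISE",
}

def normalize_commodity(raw):
    """Lowercase, collapse whitespace, then map; unknowns go to review."""
    key = " ".join(str(raw or "").lower().split())
    return COMMODITY_MAP.get(key, "UNMAPPED_REVIEW")
```

Sorting the unmapped values by frequency and extending the map for the top entries typically covers the bulk of historical loads after a few review passes; the long tail of rare values matters much less to the model.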
What a Data Quality Assessment Actually Looks Like
Before deploying HaulCortex or any carrier matching system into a mid-market brokerage, we run a data quality assessment on the customer's TMS export. The assessment covers: field-level completeness rates for the 12 highest-impact fields, carrier record deduplication rate (what percentage of MC numbers appear in more than one carrier record), timestamp accuracy rate (what percentage of loads have both appointment and actual timestamps), and accessorial completeness rate (what percentage of loads have accessorial charges recorded when the lane type suggests accessorials are likely).
The output is a readiness score that predicts model accuracy at deployment, plus a prioritized list of data quality improvements that would most improve model performance. Some improvements can be made quickly (require dispatchers to enter actual timestamps going forward). Others take time (retroactive carrier dedup). We stage the deployment timeline around the data quality improvement plan, not around the software installation.
As we explored in our article on McLeod TMS integration, the right question before any TMS integration is not just "can we connect?" but "what will the data tell us once we do?" A technically successful integration that imports poor-quality data produces confident-looking output from a model that has learned from bad training data — which is a worse outcome than no model at all.
The Minimum Data Quality Bar for Useful ML Outputs
In our experience, ML carrier matching models become unreliable in production once field-level completeness falls below these thresholds: carrier MC number below 80%, actual pickup timestamp below 70%, actual delivery timestamp below 70%, or all-in carrier cost below 85%. If any of these falls below threshold, the model's carrier performance signals will be too noisy to trust for operational dispatch recommendations.
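Those thresholds can be expressed as a simple gate over the completeness rates from a TMS export scan. A sketch, assuming a rates dict keyed by illustrative field names:

```python
# Minimum completeness bar per field, mirroring the thresholds above.
# Field names are illustrative assumptions, not a real TMS schema.
MIN_THRESHOLDS = {
    "carrier_mc": 0.80,
    "pickup_actual": 0.70,
    "delivery_actual": 0.70,
    "carrier_cost_total": 0.85,
}

def readiness_gaps(rates):
    """Return {field: shortfall} for fields still under the minimum bar.

    An empty dict means the data clears the bar for all gating fields.
    """
    return {
        field: round(bar - rates.get(field, 0.0), 2)
        for field, bar in MIN_THRESHOLDS.items()
        if rates.get(field, 0.0) < bar
    }
```

The shortfall figures double as the prioritized improvement list: a 10-point gap on actual pickup timestamps is a process change for dispatchers, while a 5-point gap on cost data may be an accessorial recording problem.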
This doesn't mean you can't deploy until you reach these thresholds — it means you should deploy with appropriate expectations about model accuracy in the first months, implement data capture improvements immediately, and plan for a model retraining cycle once data quality crosses the thresholds. The model will improve as data quality improves, but only if the data quality improvement is actually happening.
Get a TMS Data Quality Assessment
Before deploying HaulCortex, we assess your TMS data quality and tell you exactly what the model will be able to learn from your load history. Request a data readiness assessment as part of the demo.
Request a Demo