
What data sources do property valuations and AVMs use?


Every property valuation is only as good as the data underneath it. Whether a figure comes from an instant automated valuation model (AVM) or an agent's hand-built comparative market analysis (CMA), it is built from the same raw material: records of what similar homes sold for, what is on the market now, and the attributes that make one property worth more than another. Agents and mortgage advisors who understand where that data comes from — and what it can't capture — read valuations more critically and explain them more convincingly. This guide walks through the data sources behind a property valuation, why the data layer decides accuracy, and a worked example of how the same house can land on two different numbers.

Analytics dashboards and charts on a laptop screen, representing the data sources behind a property valuation
Photo by Luke Chesser on Unsplash.

Sold prices: the foundation of every valuation

The most important input to any valuation is what comparable properties have actually sold for. Recorded transaction prices are the closest thing the market has to ground truth, because they reflect what a real buyer paid a real seller. Where this data lives varies by country: in many markets it sits in a public land registry or deeds office; in others it flows through the multiple listing service (MLS) that agents update when a deal closes. Either way, an AVM ingests these recorded sales automatically, while a human building a CMA selects a handful of them by hand. The freshness and completeness of this feed is the single biggest driver of accuracy — a model fed last month's sales in a busy market is far better informed than one leaning on year-old prices in a thin one. Choosing which of those sales genuinely count is its own discipline, covered in our guide on how to find comparable sales (comps).
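To make "freshness" concrete, here is a minimal Python sketch of the first filter a valuation pipeline might apply to a sold-price feed: keep only sales recorded within a recency window, newest first. The `Sale` record, its field names and the 180-day default are illustrative assumptions, not any registry's or MLS's actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Sale:
    # Hypothetical record shape; real registry or MLS feeds have their own schemas.
    address: str
    sold_on: date
    price_eur: int


def recent_sales(sales: list[Sale], as_of: date, max_age_days: int = 180) -> list[Sale]:
    """Keep sales dated within max_age_days before as_of, newest first."""
    fresh = [s for s in sales if 0 <= (as_of - s.sold_on).days <= max_age_days]
    return sorted(fresh, key=lambda s: s.sold_on, reverse=True)


sales = [
    Sale("12 Elm St", date(2024, 5, 2), 352_000),
    Sale("40 Oak Rd", date(2023, 3, 14), 301_000),  # over a year old: dropped
    Sale("7 Elm St", date(2024, 6, 20), 366_000),
]
for s in recent_sales(sales, as_of=date(2024, 7, 1)):
    print(s.sold_on, s.address, s.price_eur)
```

In a thin market the window may have to stretch, which is the trade-off described above: a longer window buys more comps at the cost of staler prices.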

Active and historic listings

Sold prices tell you where the market was; live and recent listings tell you where it is heading. Valuation data therefore includes what is currently for sale (the competition a seller faces), what has gone under offer, and what was listed but failed to sell or was withdrawn. Asking prices are not sale prices — sellers and their agents can be optimistic — so this layer is directional rather than definitive. But it captures momentum that closed sales miss: if three similar homes just listed below recent sold comps, the market is softening, and a valuation that ignores that signal will lag reality. AVMs that blend listing data with sold data react faster to turning markets than those relying on completed transactions alone.
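The rule of thumb above (similar homes listing below recent sold comps suggest a softening market) can be written as a simple check. This is a hedged sketch: the function name, the use of medians and the 2% tolerance are assumptions chosen for illustration, not an industry standard.

```python
from statistics import median


def softening_signal(asking_prices, recent_sold_prices, tolerance=0.02):
    """Flag a softening market when similar homes are listed below recent sold prices.

    Asking prices tend to run optimistic, so a median asking price more than
    `tolerance` below the median recent sold price is treated as a warning sign.
    The 2% tolerance is an illustrative assumption.
    """
    return median(asking_prices) < median(recent_sold_prices) * (1 - tolerance)


sold = [352_000, 358_000, 366_000]
new_listings = [340_000, 345_000, 349_000]
print(softening_signal(new_listings, sold))  # True
```

Because it compares medians, one outlier listing cannot trip the flag on its own; the output is a prompt to look closer, not a price.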

Property attributes and public records

To compare two homes, a model needs to know what each one is: floor area, number of bedrooms and bathrooms, plot size, year built, property type and sometimes the energy rating. This attribute data usually comes from public records (assessment or tax rolls, cadastral records, building registers), supplemented by what agents enter into listing systems. It is also the layer most prone to being stale or simply wrong: an extension built years ago may never have been recorded, a loft conversion may not show in the official floor area, and a misfiled bedroom count quietly distorts every comparison. A valuation is only as accurate as these attributes, which is why one of the highest-value things an agent does is correct the record from what they see on site.
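Correcting the record starts with spotting where it disagrees with what you see. A minimal sketch, assuming the recorded and observed attributes are plain dictionaries that share the same (hypothetical) field names:

```python
def attribute_mismatches(recorded: dict, observed: dict) -> dict:
    """Return {field: (recorded_value, observed_value)} for every field the agent
    checked on site that disagrees with the public record."""
    return {
        field: (recorded.get(field), value)
        for field, value in observed.items()
        if recorded.get(field) != value
    }


# Hypothetical record vs site visit: a loft conversion never made it onto the record.
recorded = {"floor_area_m2": 95, "bedrooms": 3, "year_built": 1930}
observed = {"floor_area_m2": 112, "bedrooms": 4, "year_built": 1930}
print(attribute_mismatches(recorded, observed))
# {'floor_area_m2': (95, 112), 'bedrooms': (3, 4)}
```

Each mismatch is a field to fix before the figure is trusted; a floor-area gap like this one shifts every per-square-metre comparison the valuation makes.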

Geospatial and neighbourhood data

Location is famously the dominant driver of value, and valuations encode it through geospatial data: precise coordinates, distance to schools, transport and amenities, neighbourhood boundaries, and increasingly risk layers such as flood zones or noise contours. This is what lets a model understand that two physically identical houses a mile apart can differ sharply in value. The sophistication here separates a crude model from a strong one — a basic AVM might treat a whole postcode as uniform, while a better one recognises that one side of a street backs onto a park and the other onto a ring road. For agents, this layer is also where local knowledge beats any dataset, because the lived reality of a micro-location is rarely fully captured in coordinates.
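A comp radius is, at its simplest, a distance filter on coordinates. The sketch below uses the standard haversine great-circle formula; the function names, the 1 km radius and the coordinates are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def comps_within(subject, comps, radius_km):
    """Keep (label, lat, lon) comps within radius_km of the subject's (lat, lon)."""
    lat0, lon0 = subject
    return [c for c in comps if haversine_km(lat0, lon0, c[1], c[2]) <= radius_km]


subject = (52.0900, 5.1200)  # illustrative coordinates
comps = [("A", 52.0950, 5.1250), ("B", 52.1500, 5.2000)]
print(comps_within(subject, comps, radius_km=1.0))  # keeps only comp A
```

Note what this cannot do: two houses 200 metres apart pass the same filter even if one backs onto a park and the other onto a ring road. That distinction needs richer neighbourhood layers, or an agent's local knowledge.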

Condition and renovation data: the layer AVMs miss

Here is the fault line between automated and human valuation. Interior condition, the quality of a renovation, layout flow, light, and kerb appeal are rarely recorded in any structured dataset, so an AVM is effectively blind to them. It knows the house has three bedrooms and 95 square metres, not that the kitchen was gutted last year or that the place needs rewiring. A human valuation captures exactly this, from a site visit or from photos, and adjusts for it. This single gap often accounts for much of the divergence between an instant estimate and an agent's number, and it is the heart of the trade-off explored in AVM vs CMA: which to use. The broader question of how far to trust any algorithmic figure is covered in how accurate online home value estimates really are.
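One common way a valuer folds condition into the number is the sales-comparison adjustment: each comp's price is nudged up or down for the ways it differs from the subject. A minimal sketch; the function name, the categories and the euro amounts are hypothetical, and real adjustment grids are more detailed.

```python
def adjusted_comp_price(sale_price: int, adjustments: dict[str, int]) -> int:
    """Sales-comparison adjustment: move a comp's price toward the subject.

    Positive amounts mean the subject is better than the comp in that respect;
    negative amounts mean the comp is better. The amounts are the valuer's
    judgement, not something the data supplies.
    """
    return sale_price + sum(adjustments.values())


# Hypothetical comp: unrenovated (the subject was renovated) but with a bigger garden.
adjustments = {
    "subject renovated, comp not": 15_000,
    "comp has larger garden": -5_000,
}
print(adjusted_comp_price(340_000, adjustments))  # 350000
```

The arithmetic is trivial on purpose: the hard part is choosing the amounts, which is exactly the condition judgement an AVM cannot make.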

A worked example: same house, two data sets

Suppose two tools value the same three-bedroom terraced house, renovated eighteen months ago, including a loft conversion. All figures are illustrative: they show how data drives the number, not any real market.

  • Tool A leans on public records and the last recorded sale (pre-renovation), with comps drawn from a wide radius to fill gaps in a thin local market. It returns €336,000, anchored to the home's older profile and some loosely similar sales.
  • Tool B uses a fresher sold feed, a tighter comp radius, and updated attributes that include the converted loft. It starts from three close, recent sales at €352,000, €358,000 and €366,000 and returns €359,000.
  • The gap — about €23,000, or 7% — is not random error. It is the difference between two data sets: stale attributes and distant comps versus current ones, plus the renovation only the second tool's updated record reflected.
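The arithmetic in the example is easy to check. This sketch recomputes the centre of Tool B's three comps and the size of the gap; it does not reproduce either tool's model, which the example leaves unspecified.

```python
from statistics import mean, median

tool_a = 336_000
tool_b = 359_000
tool_b_comps = [352_000, 358_000, 366_000]

gap = tool_b - tool_a
print(f"Tool B comps: median €{median(tool_b_comps):,.0f}, mean €{mean(tool_b_comps):,.0f}")
print(f"Gap: €{gap:,} ({gap / tool_a:.1%} of Tool A, {gap / tool_b:.1%} of Tool B)")
# Tool B comps: median €358,000, mean €358,667
# Gap: €23,000 (6.8% of Tool A, 6.4% of Tool B)
```

Tool B's €359,000 sits right on top of its comps, which is what a tight, fresh comp set should produce; the "about 7%" in the text is the gap measured against Tool A's lower figure.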

The lesson is that the number you trust is the one whose data you can see. If you can inspect the comps, the recency and the attributes behind a figure, you can judge it; if you can't, you are trusting a black box. Turning a defensible valuation into an asking-price decision is the next step — see how to price a listing.

Why the data layer decides accuracy — and how to work with it

Step back and a pattern emerges: model sophistication matters, but data quality matters more. A brilliant algorithm fed stale or incomplete sold prices will usually lose to a simpler one fed fresh, complete records. That is why the same AVM can be reliable in a dense, well-documented city and shaky in a rural market where sales are sparse and privately reported. For agents and advisors, the practical takeaways are concrete: prefer recent, close, genuinely similar comps; verify the attributes a valuation rests on, especially floor area and any unrecorded works; treat asking prices as signals, not facts; and always add the condition layer no dataset contains. Do those four things and you are valuing on better data than most tools start with.
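The first takeaway, prefer recent and close comps, is often applied as a weighting rather than a hard cut-off. A minimal sketch, assuming each comp has already been adjusted for differences from the subject; the exponential half-life, the distance scale and the function names are illustrative assumptions, not a standard formula.

```python
def comp_weight(age_days: float, distance_km: float,
                half_life_days: float = 180.0, scale_km: float = 1.0) -> float:
    """Illustrative weight: halves every half_life_days and falls off with distance."""
    recency = 0.5 ** (age_days / half_life_days)
    proximity = 1.0 / (1.0 + distance_km / scale_km)
    return recency * proximity


def weighted_value(comps: list[tuple[float, float, float]]) -> float:
    """comps: (adjusted_price, age_days, distance_km). Returns the weighted average price."""
    weights = [comp_weight(age, dist) for _, age, dist in comps]
    total = sum(w * price for w, (price, _, _) in zip(weights, comps))
    return total / sum(weights)


# A fresh comp on the same street counts four times as much as a six-month-old one 1 km away.
comps = [(350_000, 0, 0.0), (380_000, 180, 1.0)]
print(weighted_value(comps))  # 356000.0
```

Weighting keeps older or more distant sales in play when the market is thin, while letting the freshest, closest evidence dominate when it exists.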

Pulling the data layers together

The friction in all of this is assembly. Sold prices live in one place, listings in another, attributes in a public register, geospatial context somewhere else — and stitching them into a coherent picture by hand is the hour that makes a thorough valuation feel expensive. This is the work Biedradar automates: you enter an address and it gathers the comparable sales, recent listings and market signals into one place, then generates a clean, branded valuation report in minutes. You still supply the judgement the data can't — which comps are truly comparable, how to adjust for the renovation the record never captured, and what the micro-location really means. The tool does the data assembly at machine speed; you bring the condition and local read that turns assembled data into a number a client will trust. For an agent or advisor who values several properties a week, having the data layers pre-assembled — with the comps visible so you can defend the figure — is what makes a fast valuation a credible one.
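The assembly step is, at heart, a join across sources that rarely share a clean key. The sketch below is a generic illustration under stated assumptions, not a description of Biedradar's implementation: each layer is assumed to be a dictionary keyed by a free-text address, and the names and the crude normaliser are hypothetical.

```python
import re


def normalise_address(raw: str) -> str:
    """Crude normaliser so '12 Elm St.' and '12  elm st' match; real address matching is harder."""
    no_punct = re.sub(r"[^\w\s]", "", raw.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def assemble(address: str, layers: dict[str, dict[str, dict]]) -> dict:
    """Collect each layer's record for one address and list any layer with no match."""
    key = normalise_address(address)
    record = {"address": key, "missing": []}
    for name, rows in layers.items():
        by_key = {normalise_address(k): v for k, v in rows.items()}
        if key in by_key:
            record[name] = by_key[key]
        else:
            record["missing"].append(name)
    return record


layers = {
    "sold": {"12 Elm St.": {"price_eur": 301_000, "year": 2019}},
    "attributes": {"12 elm st": {"bedrooms": 3, "floor_area_m2": 95}},
    "listings": {},
}
print(assemble("12 ELM ST", layers))
```

Reporting the missing layers matters as much as joining the present ones: a valuation built without, say, listing data should say so.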

Frequently asked questions

What data sources do property valuations and AVMs use?

Most valuations and AVMs draw on the same core layers: recorded sold prices (from a land registry, deeds office or MLS), current and historic listings, property attributes (bedrooms, floor area, plot, year built) from public records or assessment rolls, geospatial data (location, schools, transport, flood risk), and — for a human valuation — condition and renovation detail gathered on site or from photos. An AVM uses the recorded layers automatically; a CMA adds the condition layer a human can see.

Where do AVMs get their sold price data?

From whatever authoritative record of completed transactions exists in that market: a public land registry or deeds office in many countries, or the MLS where agents log sales. The completeness and freshness of that feed is the single biggest driver of AVM accuracy. In markets with prompt, public sold data, models are well fed; where sales are private or reported with a long lag, the model is guessing from older or thinner evidence.

Why do two valuation tools give different values for the same house?

Usually because they are standing on different data. One tool may have a fresher sold feed, a wider comp radius, more property attributes, or a different way of handling missing fields. None of them can see the new kitchen, or judge how much the railway line at the bottom of the garden really intrudes, unless a human records it. Different inputs plus different model assumptions produce different numbers, which is why the data behind a figure matters as much as the figure itself.

What data can an AVM not see?

Anything that was never recorded: interior condition, quality of a renovation, layout flow, natural light, day-to-day noise that no map layer captures, smells, a difficult neighbour, or works done without permits. AVMs also struggle where sold data is sparse, such as unique, rural or thinly traded homes. These blind spots are exactly where a human CMA earns its keep, because an agent on site captures the condition and micro-location data the model never had.

How can agents check the data behind a valuation?

Look at the comps. Ask how recent the sold prices are, how close and how similar the comparable properties are, and whether the attributes (floor area, bedrooms, plot) match the subject. A defensible valuation shows its evidence: the specific sales it leaned on and the adjustments made. If a tool gives a number but won't show the comps, treat it as a starting point to verify, not a conclusion.