Why a Dutch postcode regex is not enough
You search for "regex postcode Nederland" because you need the small answer.
A Dutch postcode is four digits and two letters. Maybe there is a space in the middle, maybe there isn't. You do not want to pull in a library just to check 1012 AB.
So you write the obvious pattern:
```ruby
DUTCH_POSTCODE = /\A[1-9][0-9]{3}\s?[A-Z]{2}\z/i

DUTCH_POSTCODE.match?("1012 AB") # => true
DUTCH_POSTCODE.match?("1012AB")  # => true
DUTCH_POSTCODE.match?("0123 AB") # => false
```
For that one narrow question, this is fine.
A regular expression can tell you whether a string looks like a Dutch postcode. The trouble starts when that tiny check slowly turns into "address validation."
A postcode regex validates syntax
The common Dutch postcode pattern is not wrong. If you search for a Dutch postcode regex, you will usually find something close to this:
```ruby
\A[1-9][0-9]{3}\s?[A-Z]{2}\z
```
It checks a shape:
- the first digit is not zero
- then three more digits
- then an optional space
- then two letters
Useful. Also very limited.
9999 ZZ has the right shape. That does not mean a parcel can be delivered there. It does not tell you whether the postcode belongs to the city the user entered, or whether the street and house number make sense together.
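A quick sketch makes the gap concrete. It reuses the pattern from the first example; the sample inputs are only illustrative, and nothing here checks whether a postcode is actually assigned to an address.

```ruby
# Shape check only: the same pattern as in the first example.
postcode_shape = /\A[1-9][0-9]{3}\s?[A-Z]{2}\z/i

samples = [
  "1012 AB",  # plausible postcode
  "9999 ZZ",  # right shape, says nothing about deliverability
  "1012\tAB", # \s also matches a tab, not just a space
  "1012  AB"  # two spaces: \s? allows at most one
]

# Map each input to whether it matches the shape.
results = samples.to_h { |input| [input, postcode_shape.match?(input)] }

results.each { |input, ok| puts "#{input.inspect}: #{ok}" }
```

The first three inputs match and only the last one is rejected. The pattern has no way to distinguish a deliverable postcode from a well-formed string.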
It says even less about the rest of the address: which fields are required, what the postal code is called, whether the country uses postal codes at all, and how the address should be printed later.
The address is bigger than the postcode
The first version usually starts with one country.
For the Netherlands, you add a postcode regex. Then you add a street-and-house-number regex. Then a customer enters 1e Jacob van Campenstraat 10, Burg. van der Werffstraat 1, or an address line with a suffix you did not think about.
So the regex grows.
Then the app opens up to Belgium. Germany. The United Kingdom. Ireland, where "county" might be the label. Japan, where the fields appear in a different order. Countries without postal codes at all.
There is no global postal code regex hiding somewhere that makes this go away. Countries use different lengths, characters, separators, and rules. Even country-specific regexes only answer the pattern question. They do not give you the address format around it.
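As a rough illustration, here is what the "one regex per country" approach tends to look like. The table and the `postal_code_shape_ok?` helper are hypothetical, and the patterns are deliberately simplified rather than authoritative.

```ruby
# Hypothetical lookup table; simplified patterns for illustration only.
POSTAL_CODE_SHAPES = {
  "NL" => /\A[1-9][0-9]{3}\s?[A-Z]{2}\z/i,
  "US" => /\A\d{5}(-\d{4})?\z/,
  "HK" => nil # Hong Kong does not use postal codes
}.freeze

# Returns true when the value has an acceptable shape for the country.
# Raises KeyError for a country that is missing from the table.
def postal_code_shape_ok?(country_code, value)
  pattern = POSTAL_CODE_SHAPES.fetch(country_code)
  return true if pattern.nil? # nothing to check for this country

  pattern.match?(value.to_s)
end

puts postal_code_shape_ok?("NL", "1012 AB")
puts postal_code_shape_ok?("US", "9021")
puts postal_code_shape_ok?("HK", "")
```

Even with every country filled in, this table still answers only the pattern question. It does not say which fields are required, what to call them, or how to print the address.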
Use address data instead
This is why I built Addressing, a Ruby gem for country-aware address formats, labels, formatting, and validation.
I wrote about the bigger picture in Handle international addresses in Ruby. This post is the smaller version of the argument: if your validation starts with a postcode regex, the next step is probably not a bigger regex. It is address data.
Addressing knows the Dutch postal code pattern, but it keeps that pattern inside the address format for NL:
```ruby
format = Addressing::AddressFormat.get("NL")

format.required_fields
# => [:address_line1, :postal_code, :locality]

format.label_for(:postal_code)
# => "Postal code"

format.postal_code_pattern
# => "\\d{4}\\s?[A-Z]{2}"
```
Now the postcode rule lives next to the rest of the country-specific rules. Your application asks one object which fields matter, which labels to show, and which postal code pattern to apply.
That boundary has saved me from scattering regex constants through forms, models, and import scripts. The pattern still exists. It just no longer pretends to be the whole address system.
Validation in Rails
If you store addresses on an Active Record model, the validator is deliberately boring:
```ruby
class CustomerAddress < ApplicationRecord
  validates_address_format
end
```
The validator uses the country code on the address and applies the matching format rules. A Dutch address gets Dutch required fields and the Dutch postal code pattern. A US address gets a ZIP code pattern.
And a country without postal codes should not be forced through a postcode field because your form happened to start in the Netherlands.
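To show the idea without Rails or the gem, here is a minimal plain-Ruby sketch of country-driven validation. `CountryFormat`, `FORMAT_RULES`, and `address_errors` are hypothetical names, the Hong Kong entry is illustrative rather than real address data, and none of this is how Addressing is implemented.

```ruby
# Hypothetical stand-in for per-country format data (not the gem's classes).
CountryFormat = Struct.new(:required_fields, :postal_code_pattern, keyword_init: true)

FORMAT_RULES = {
  "NL" => CountryFormat.new(
    required_fields: %i[address_line1 postal_code locality],
    postal_code_pattern: /\A\d{4}\s?[A-Z]{2}\z/i
  ),
  # Illustrative entry for a country without postal codes.
  "HK" => CountryFormat.new(
    required_fields: %i[address_line1],
    postal_code_pattern: nil
  )
}.freeze

# Returns a list of [field, problem] pairs; an empty list means the address
# satisfies the (simplified) rules for its country.
def address_errors(address)
  rules = FORMAT_RULES.fetch(address[:country_code])

  errors = rules.required_fields
                .select { |field| address[field].to_s.strip.empty? }
                .map { |field| [field, :blank] }

  code = address[:postal_code].to_s
  pattern = rules.postal_code_pattern
  # Only check the pattern when the country has one and a value was given.
  errors << [:postal_code, :invalid] if pattern && !code.empty? && !pattern.match?(code)

  errors
end

dutch = { country_code: "NL", address_line1: "Voorbeeldstraat 1",
          postal_code: "12 AB", locality: "Amsterdam" }
hong_kong = { country_code: "HK", address_line1: "1 Example Road" }

p address_errors(dutch)
p address_errors(hong_kong)
```

The country code selects the rules, so the Hong Kong address passes without a postal code while the Dutch one is rejected for a malformed postcode.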
That is the useful shift. You are no longer asking, "Does this string match the pattern I remembered to copy?" You are asking, "Does this address match the rules for its country?"
What it does not solve
Addressing is not a postal authority and it is not an address verification service.
It can validate the format. It can tell you which fields a country uses. It can format the address for display or postal labels. It keeps the country-specific details close to one API.
It cannot promise that a postcode exists today, that a house number is deliverable, or that the resident still lives there. For that you need a postal database, carrier API, geocoding service, or country-specific lookup.
I like keeping those jobs separate:
- use a regex for fast feedback on one field
- use Addressing for the country-aware rules around the address
- use a real verification source when failed delivery costs money
Do not make one regex pretend to do all three jobs.
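One way to keep the three jobs apart is to give each its own step. This is a sketch under assumptions: `check_dutch_address` is a hypothetical helper, the required-field list stands in for the country-aware layer, and the `verifier` callable stands in for whatever postal database or carrier API you use.

```ruby
# Job 1: fast shape feedback on a single field.
NL_POSTCODE_SHAPE = /\A[1-9][0-9]{3}\s?[A-Z]{2}\z/i

# Job 2: stand-in for country-aware format rules (hypothetical; in a real app
# this is where a format library would sit).
NL_REQUIRED_FIELDS = %i[address_line1 postal_code locality].freeze

# Job 3 is the verifier: any callable that takes the address and returns
# true or false. It is supplied by the caller, not baked into the regex.
def check_dutch_address(address, verifier:)
  return :bad_postcode_shape unless NL_POSTCODE_SHAPE.match?(address[:postal_code].to_s)

  missing = NL_REQUIRED_FIELDS.select { |field| address[field].to_s.strip.empty? }
  return :missing_fields unless missing.empty?

  return :not_deliverable unless verifier.call(address)

  :ok
end

# Fake verifier for the example: pretends only one postcode is known.
known_postcodes = ["1012 AB"]
fake_verifier = ->(address) { known_postcodes.include?(address[:postal_code]) }

address = { address_line1: "Voorbeeldstraat 1", postal_code: "9999 ZZ", locality: "Amsterdam" }
puts check_dutch_address(address, verifier: fake_verifier)
```

Here 9999 ZZ passes the shape check and the field check, and is only caught by the verification step, which is the one layer that can know about real delivery points.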
A better default
The Dutch postcode regex is not the problem. The problem is using it as the place where address knowledge accumulates.
That is how you end up with a pattern nobody wants to touch, copied across projects, rejecting valid customers and accepting fake addresses with the confidence of a green test suite.
Start with the country instead. Let the country pick the address format. Let the postcode rule be one small part of that format.
If you are building this in Ruby, try Addressing on GitHub and add it to your Gemfile:
```ruby
gem "addressing"
```
A regex can tell you whether 1012 AB looks like a Dutch postcode.
For address validation, I want the rest of the rules nearby too.