Robin van der Vleuten

How Erlang's parser tools saved my DSMR library

I've always been the type of developer who starts with the simplest solution.

"I'll just use a few regex patterns here" is a sentence that ages badly. Before long I had a tangled mess of string operations that technically worked, until it didn't. That is exactly what happened with my DSMR library.

Since I started working with smart meter data, I've been fascinated by the DSMR protocol (Dutch Smart Meter Requirements) used across the Netherlands, Belgium, and Luxembourg. These meters broadcast "telegrams" every few seconds: structured data packets that look something like this:

```
/KFM5KAIFA-METER
1-3:0.2.8(42)
0-0:1.0.0(161113205757W)
1-0:1.8.1(001581.123*kWh)
1-0:2.8.1(000000.000*kWh)
1-0:21.7.0(00.170*kW)
0-1:24.2.1(161129200000W)(00981.443*m3)
!6796
```
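The last line, `!6796`, is a checksum. In DSMR 4 and later, the P1 companion standard defines it as a CRC16 over every character from the leading `/` up to and including the `!`, written as four hexadecimal digits. It uses the polynomial x^16 + x^15 + x^2 + 1, is computed least significant bit first, and has no initial or final XOR; this variant is usually catalogued as CRC-16/ARC. The sketch below shows one way to check it. `DSMRChecksumSketch` and `valid?/1` are hypothetical names, not part of the library's API.

```elixir
defmodule DSMRChecksumSketch do
  import Bitwise

  # CRC16 with polynomial x^16 + x^15 + x^2 + 1, processed least significant
  # bit first (reflected constant 0xA001), initial value 0, no final XOR.
  def crc16(data) when is_binary(data) do
    for <<byte <- data>>, reduce: 0 do
      crc -> shift_bits(bxor(crc, byte), 8)
    end
  end

  # Runs the eight shift/XOR steps for one byte.
  defp shift_bits(crc, 0), do: crc

  defp shift_bits(crc, n) do
    next = if band(crc, 1) == 1, do: bxor(crc >>> 1, 0xA001), else: crc >>> 1
    shift_bits(next, n - 1)
  end

  # Hypothetical helper: recomputes the CRC over "/" .. "!" and compares it
  # with the four hex digits that follow the "!".
  def valid?(telegram) when is_binary(telegram) do
    case :binary.split(telegram, "!") do
      [body, rest] ->
        expected = rest |> String.trim() |> String.upcase()

        actual =
          crc16(body <> "!")
          |> Integer.to_string(16)
          |> String.pad_leading(4, "0")

        actual == expected

      [_no_bang] ->
        false
    end
  end
end
```

`DSMRChecksumSketch.crc16("123456789")` returns `0xBB3D` (47933), which is the standard check value for this CRC variant. It is a quick way to confirm that the bit order is right.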

My first approach was a chaotic collection of regex patterns and string splits that I somehow convinced myself was maintainable. It worked until edge cases started showing up.

After battling mysterious parsing failures, I remembered something obvious: people have been solving parsing problems for decades. That's when I found Erlang's lexical analyzer generator, leex, and its parser generator, yecc. Both ship with Erlang/OTP in the parsetools application, hiding in plain sight.

Parser generators

Instead of writing imperative code that says "read character by character and if you see this pattern do that," these tools let you describe what valid input looks like. From that description, they generate the code that does the scanning and parsing.

The lexer's job is to break the raw text into meaningful tokens. In my src/dsmr_lexer.xrl file, I defined the token patterns with regular expressions:

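```erlang
Definitions.
DIGIT = [0-9]

Rules.
% OBIS code, e.g. 1-0:21.7.0 (a group can have more than one digit)
{DIGIT}+-{DIGIT}+:{DIGIT}+\.{DIGIT}+\.{DIGIT}+ :
    {token, {obis, TokenLine, extract_obis_code(TokenChars)}}.
% Timestamp: YYMMDDhhmmss followed by S (summer time) or W (winter time)
{DIGIT}+[SW] :
    {token, {timestamp, TokenLine, extract_timestamp(TokenChars)}}.
```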

What I like about this approach is the clarity. No nested conditionals checking characters one by one. I describe the shape of valid tokens, and the lexer handles the scanning. The syntax took some getting used to, but the trade-off was worth it.
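The self-contained Elixir script below runs leex end to end. It is a sketch, not the library's lexer. The module name `dsmr_sketch_lexer`, the token names, and the `obis/1` helper are made up for illustration. It also assumes the parsetools application is available, as it is in a standard Erlang/OTP install. The script writes a tiny `.xrl` spec to a temporary directory, runs `:leex.file/2`, and then compiles and loads the generated module.

```elixir
defmodule LeexSketch do
  # Hypothetical, trimmed-down lexer spec covering a few DSMR line shapes.
  # ~S keeps backslashes intact so leex sees \( \. \* and \r\n as written.
  @xrl ~S"""
  Definitions.

  D = [0-9]

  Rules.

  /[^\r\n]+ : {token, {header, TokenLine, tl(TokenChars)}}.
  {D}+-{D}+:{D}+\.{D}+\.{D}+ : {token, {obis, TokenLine, obis(TokenChars)}}.
  {D}+[SW] : {token, {timestamp, TokenLine, TokenChars}}.
  {D}+(\.{D}+)? : {token, {number, TokenLine, TokenChars}}.
  \*[A-Za-z0-9]+ : {token, {unit, TokenLine, tl(TokenChars)}}.
  \( : {token, {'(', TokenLine}}.
  \) : {token, {')', TokenLine}}.
  ![0-9A-F]* : {token, {checksum, TokenLine, tl(TokenChars)}}.
  [\r\n]+ : skip_token.

  Erlang code.

  %% "1-0:1.8.1" -> [1, 0, 1, 8, 1]
  obis(Chars) -> [list_to_integer(Part) || Part <- string:tokens(Chars, "-:.")].
  """

  # Writes the spec to a temp dir, generates the scanner with leex,
  # then compiles and loads it. Returns the generated module name.
  def load! do
    dir = System.tmp_dir!()
    xrl = dir |> Path.join("dsmr_sketch_lexer.xrl") |> String.to_charlist()
    erl = dir |> Path.join("dsmr_sketch_lexer.erl") |> String.to_charlist()
    File.write!(xrl, @xrl)
    {:ok, _} = :leex.file(xrl, scannerfile: erl)
    {:ok, mod, bin} = :compile.file(erl, [:binary, :report_errors])
    {:module, ^mod} = :code.load_binary(mod, erl, bin)
    mod
  end
end

lexer = LeexSketch.load!()
{:ok, tokens, _end_line} = lexer.string(~c"1-0:1.8.1(001581.123*kWh)\r\n")
```

Rule order rarely matters here because leex picks the longest match. For example, `161113205757W` becomes a `timestamp` token rather than a `number` followed by a stray letter. In a Mix project you would not load the lexer by hand like this: Mix compiles `.xrl` and `.yrl` files in `src/` automatically.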

Once the lexer turns raw text into tokens, the parser needs to understand how those tokens fit together. That's where yecc comes in. In my src/dsmr_parser.yrl, I defined grammar rules:

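```erlang
% A telegram is a header, one or more data lines, and a checksum
telegram -> header lines checksum :
    build_telegram('$1', '$2', '$3').
lines -> line lines : ['$1' | '$2'].
lines -> line : ['$1'].
% A data line is an OBIS code followed by one or more (...) attributes
line -> obis attributes :
    map_obis_to_field('$1', '$2').
attributes -> attribute attributes : ['$1' | '$2'].
attributes -> attribute : ['$1'].
```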

These rules describe how a valid DSMR telegram is structured. The parser walks through the token stream, applies these rules, and builds up the final data structure.
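The same idea works with yecc, shown here as a self-contained Elixir sketch. The grammar follows the rules above but is simplified. The `dsmr_sketch_parser` module and the `token_value/1` helper are hypothetical, and the tokens are written by hand instead of coming from a lexer. Rather than calling `build_telegram/3` and `map_obis_to_field/2`, each rule builds plain tuples and lists so the resulting structure is easy to see. Like the lexer sketch, it assumes the parsetools application is available.

```elixir
defmodule YeccSketch do
  # Hypothetical grammar in the spirit of src/dsmr_parser.yrl; each rule
  # builds plain tuples and lists instead of calling project helpers.
  @yrl ~S"""
  Nonterminals telegram lines line attributes attribute.
  Terminals header obis timestamp number unit '(' ')' checksum.
  Rootsymbol telegram.

  telegram -> header lines checksum : {token_value('$1'), '$2', token_value('$3')}.
  lines -> line lines : ['$1' | '$2'].
  lines -> line : ['$1'].
  line -> obis attributes : {token_value('$1'), '$2'}.
  attributes -> attribute attributes : ['$1' | '$2'].
  attributes -> attribute : ['$1'].
  attribute -> '(' timestamp ')' : {timestamp, token_value('$2')}.
  attribute -> '(' number ')' : token_value('$2').
  attribute -> '(' number unit ')' : {token_value('$2'), token_value('$3')}.

  Erlang code.

  token_value({_Category, _Line, Value}) -> Value.
  """

  # Writes the grammar to a temp dir, generates the parser with yecc,
  # then compiles and loads it. Returns the generated module name.
  def load! do
    dir = System.tmp_dir!()
    yrl = dir |> Path.join("dsmr_sketch_parser.yrl") |> String.to_charlist()
    erl = dir |> Path.join("dsmr_sketch_parser.erl") |> String.to_charlist()
    File.write!(yrl, @yrl)
    {:ok, _} = :yecc.file(yrl, parserfile: erl)
    {:ok, mod, bin} = :compile.file(erl, [:binary, :report_errors])
    {:module, ^mod} = :code.load_binary(mod, erl, bin)
    mod
  end
end

parser = YeccSketch.load!()

# Hand-written tokens for one data line between a header and a checksum.
tokens = [
  {:header, 1, ~c"KFM5KAIFA-METER"},
  {:obis, 2, [1, 0, 1, 8, 1]}, {:"(", 2}, {:number, 2, ~c"001581.123"},
  {:unit, 2, ~c"kWh"}, {:")", 2},
  {:checksum, 3, ~c"6796"}
]

{:ok, telegram} = parser.parse(tokens)
```

Invalid input, such as `(` followed directly by another OBIS code, makes `parse/1` return `{:error, {line, module, message}}`. Here `line` is the line of the offending token, and the generated module's `format_error/1` turns `message` into text that starts with "syntax error before:".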

Making it friendlier

One thing that was important to me was making the library intuitive. OBIS codes like 1-0:1.8.1 are standardized but not exactly human-readable. I wanted developers using my library to work with friendly field names like :electricity_delivered_1 instead.

I centralized all the mappings in a dedicated module:

```elixir
defmodule DSMR.OBIS do
  def to_field_name([1, 0, 1, 8, 1]), do: :electricity_delivered_1
  def to_field_name([1, 0, 2, 8, 1]), do: :electricity_returned_1
  def to_field_name([0, 1, 24, 2, 1]), do: :gas_consumption
  # ...and many more
end
```

OBIS codes that are not in the mapping table (proprietary extensions, for example, or newer codes I have not seen yet) get collected in an unknown_fields list instead of causing a parse error. That matters in real usage, because meters sometimes report unexpected things.
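Here is a minimal sketch of that fallback. It assumes the parser hands over a list of `{obis_code, value}` pairs. `OBISFallbackSketch`, `build/1`, and the shape of the result map are hypothetical; only the three mappings and the `unknown_fields` name come from the article. A map lookup that returns `nil` plays the role a catch-all clause would play in a function-clause version.

```elixir
defmodule OBISFallbackSketch do
  # Three mappings from the article; anything else counts as unknown here.
  @fields %{
    [1, 0, 1, 8, 1] => :electricity_delivered_1,
    [1, 0, 2, 8, 1] => :electricity_returned_1,
    [0, 1, 24, 2, 1] => :gas_consumption
  }

  # Returns the friendly field name, or nil for codes missing from the table.
  def to_field_name(code), do: Map.get(@fields, code)

  # Folds {obis_code, value} pairs into a map. Unknown codes are kept, in
  # their original order, under :unknown_fields instead of raising.
  def build(lines) do
    {fields, unknown} =
      Enum.reduce(lines, {%{}, []}, fn {code, value}, {fields, unknown} ->
        case to_field_name(code) do
          nil -> {fields, [{code, value} | unknown]}
          name -> {Map.put(fields, name, value), unknown}
        end
      end)

    Map.put(fields, :unknown_fields, Enum.reverse(unknown))
  end
end
```

Because unknown codes are kept rather than rejected, a new meter model degrades gracefully: its extra lines still show up, just without a friendly name.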

Why this approach won me over

Looking back, leex and yecc helped because they forced me to split parsing into separate jobs.

When I need to support a new DSMR version or add fields, I update a pattern in the .xrl file and add a grammar rule. I do not have to hunt through nested conditionals or string operations.

When something goes wrong, leex and yecc both report the line number of the offending input. That is much better than a cryptic failure somewhere in a chain of string operations.

The generated state machines are fast enough for batches of telegrams, but the bigger win is that each piece has a clear responsibility. The lexer handles what tokens look like. The parser handles how they fit together. The OBIS module handles what they mean.

There is something satisfying about using tools whose design goes back to lex and yacc in the 1970s. They are not exciting or new, but they work. Sometimes that is exactly what I want.

The result

The DSMR library is now open source at github.com/mijnverbruik/dsmr, and it cleanly parses telegrams from both older and newer smart meter models. It handles multiple DSMR versions, three-phase connections, various M-Bus devices, and all sorts of edge cases I didn't anticipate at the start.

The biggest lesson for me was: don't reinvent wheels, especially old ones. Parser generators exist because parsing is genuinely hard. I could have spent weeks debugging edge cases in a hand-written parser. Instead, I spent time learning leex and yecc, and ended up with something more maintainable.

It also reminded me that the BEAM ecosystem has a lot of sturdy tools that Elixir developers sometimes overlook. Erlang's standard library is full of things like this: not always well documented, but solid.

Should I use leex and yecc for every parsing problem? Probably not. But for protocols like DSMR, where the grammar is complex and well defined, they have been the right tools.

It is worth trying older tools when they fit the problem, even if they do not have the shiniest docs or the newest GitHub stars.