How Erlang's parser tools saved my DSMR library
I've always been the type of developer who starts with the simplest solution. You know how it goes - you think "I'll just use a few regex patterns here" and before you know it, you're knee-deep in a tangled mess of string operations that technically works... until it doesn't. That's exactly what happened with my DSMR library.
Since I started working with smart meter data a while back, I've been fascinated by the DSMR protocol (Dutch Smart Meter Requirements) used across the Netherlands, Belgium, and Luxembourg. These meters broadcast what they call "telegrams" every few seconds - essentially structured data packets that look something like this:
```
/KFM5KAIFA-METER

1-3:0.2.8(42)
0-0:1.0.0(161113205757W)
1-0:1.8.1(001581.123*kWh)
1-0:2.8.1(000000.000*kWh)
1-0:21.7.0(00.170*kW)
0-1:24.2.1(161129200000W)(00981.443*m3)
!6796
```
My first approach? A chaotic collection of regex patterns and string splits that I somehow convinced myself was maintainable. We've all been there, right? That moment when you realize your solution is held together with hope and digital duct tape.
After battling with edge cases and mysterious parsing failures, I remembered something important: people have been solving parsing problems since before I was born. There had to be better tools out there. That's when I discovered Erlang's lexical analyzer (leex) and parser generator (yecc) hiding in plain sight.
The Revelation of Parser Generators
Instead of writing imperative code that says "read character by character and if you see this pattern do that," these tools let you describe what valid input looks like, and they handle the rest. It's a completely different mindset - declarative rather than imperative.
The lexer's job is breaking the raw text into meaningful tokens. In my `src/dsmr_lexer.xrl` file, I defined patterns using regular expressions:
```erlang
% OBIS code pattern
{DIGIT}-{DIGIT}:{DIGIT}\.{DIGIT}\.{DIGIT} : {token, {obis, TokenLine, extract_obis_code(TokenChars)}}.

% Timestamp
{DIGIT}{12}[SW] : {token, {timestamp, TokenLine, extract_timestamp(TokenChars)}}.
```
What I love about this approach is how clear it is. No more nested conditionals checking character by character - I just describe the shape of valid tokens (the `{DIGIT}` macro is declared once in the file's Definitions section and reused across rules), and the lexer handles the scanning. I got some headaches along the way figuring out the syntax, but the clarity was worth it.
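Because leex compiles the `.xrl` file into an ordinary module, you can poke at the tokenizer on its own from IEx. A minimal sketch - the module name follows from the file name, and the exact token shapes depend on the rules above:

```elixir
# The leex-generated module exposes string/1: it takes a charlist and returns
# {:ok, tokens, end_line} on success or {:error, error_info, end_line} on failure.
line = ~c"1-0:1.8.1(001581.123*kWh)"
{:ok, tokens, _end_line} = :dsmr_lexer.string(line)
```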
Once the lexer turns raw text into tokens, the parser needs to understand how those tokens fit together. That's where yecc comes in. In my `src/dsmr_parser.yrl`, I defined grammar rules:
```erlang
telegram -> header lines checksum : build_telegram('$1', '$2', '$3').

line -> obis attributes : map_obis_to_field('$1', '$2').

attributes -> attribute attributes : ['$1' | '$2'].
attributes -> attribute : ['$1'].
```
These rules describe how a valid DSMR telegram is structured. The parser walks through the token stream, applies these rules, and builds up the final data structure.
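Wiring the two stages together is mostly plumbing: tokenize the raw telegram, then hand the tokens to the generated parser. A simplified sketch of that glue - `DSMR.Pipeline` is a made-up module name here, and the library's real public API wraps this differently:

```elixir
defmodule DSMR.Pipeline do
  # Sketch only. :dsmr_lexer and :dsmr_parser are the modules that leex and
  # yecc derive from the .xrl/.yrl file names at compile time.
  def parse(telegram) when is_binary(telegram) do
    with {:ok, tokens, _end_line} <- :dsmr_lexer.string(String.to_charlist(telegram)),
         {:ok, parsed} <- :dsmr_parser.parse(tokens) do
      {:ok, parsed}
    end
  end
end
```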
Making It User-Friendly
One thing that was important to me was making the library intuitive. OBIS codes like `1-0:1.8.1` are standardized but not exactly human-readable. I wanted developers using my library to work with friendly field names like `:electricity_delivered_1` instead.
I centralized all the mappings in a dedicated module:
```elixir
defmodule DSMR.OBIS do
  def to_field_name([1, 0, 1, 8, 1]), do: :electricity_delivered_1
  def to_field_name([1, 0, 2, 8, 1]), do: :electricity_returned_1
  def to_field_name([0, 1, 24, 2, 1]), do: :gas_consumption
  # ...and many more
end
```
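Looking a field up is then just a function call, which also makes the mapping trivial to unit test:

```elixir
iex> DSMR.OBIS.to_field_name([1, 0, 1, 8, 1])
:electricity_delivered_1
```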
OBIS codes that aren't in the mapping table - maybe proprietary extensions or newer codes I haven't seen yet - get collected in an `unknown_fields` list instead of causing a parse error. This turned out to be crucial for real-world usage, where meters sometimes report unexpected things.
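One way to express that fallback is a catch-all clause at the bottom of the module - a sketch, not necessarily how the library spells it, with the actual `unknown_fields` bookkeeping happening while the telegram struct is built:

```elixir
# Hypothetical catch-all: any OBIS code without an explicit mapping is tagged
# as unknown so the telegram builder can collect it instead of raising.
def to_field_name(obis) when is_list(obis), do: {:unknown, obis}
```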
Why This Approach Won Me Over
Looking back, using leex and yecc solved several problems I would have hit with a hand-rolled parser:
Maintainability - When I need to support a new DSMR version or add new fields, I update a pattern in the `.xrl` file and add a grammar rule. No hunting through nested conditionals or string operations.
Error reporting - yecc gives me line numbers when something goes wrong (see the sketch after this list). That's much better than a cryptic failure somewhere in a chain of string operations.
Performance - These tools generate optimized state machines. For parsing thousands of telegrams, that matters.
Separation of concerns - The lexer handles what tokens look like. The parser handles how they fit together. The OBIS module handles what they mean. Each piece is testable on its own.
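On the error-reporting point: when the generated parser fails, it returns a structured tuple carrying the offending line, and the module includes a `format_error/1` for turning the message into text. A small helper sketch - `DSMR.Errors` and `describe/1` are hypothetical names:

```elixir
defmodule DSMR.Errors do
  # Sketch: yecc-generated parsers return {:error, {line, module, message}} on
  # failure; the same module's format_error/1 renders the message as chardata.
  def describe({:error, {line, module, message}}) do
    "line #{line}: " <> IO.chardata_to_string(module.format_error(message))
  end
end
```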
And honestly, there's something satisfying about using tools that have been around since the 1970s. They're not exciting or new, but they work. They've been used to parse everything from programming languages to network protocols. Sometimes the old solutions are still the best ones.
The Result
The DSMR library is now open source at github.com/mijnverbruik/dsmr, and it cleanly parses telegrams from both older and newer smart meter models. It handles multiple versions, three-phase connections, various MBus devices, and all sorts of edge cases I didn't anticipate at the start.
The biggest lesson for me was: don't reinvent wheels, especially old ones. Parser generators exist because parsing is genuinely hard. I could have spent weeks debugging edge cases in a hand-written parser. Instead, I spent time learning leex and yecc, and ended up with something more maintainable.
It also reminded me that the BEAM ecosystem has a lot of industrial-strength tools that Elixir developers sometimes overlook. Erlang's standard library is full of things like this - not always well-documented, but incredibly solid.
Should I use leex and yecc for every parsing problem? Probably not. But for protocols like DSMR where the grammar is complex and well-defined? It's been the right tool. The library works, it's maintainable, and I learned something valuable along the way.
Not that hard, right? It's always worth experimenting with tools that have stood the test of time, even if they don't have the shiniest marketing pages or the most recent GitHub stars. Sometimes the best solutions are the ones that have been quietly solving problems for decades.