Using dataclasses for configuration

What are dataclasses?

Introduced in Python 3.7 dataclasses are normal Python classes with some extra features for carrying around data and state. If you find yourself writing a class that is mostly attributes it's a dataclass.

Dataclasses have some other nifty features out of the box such as default dunder (double underscore) methods, type hinting, and more.

For more information checkout the docs.

Dataclasses as configuration objects

Recently I've had the opportunity to work on a couple of Python 3.7 projects. In each of them I was interacting with many databases and API Endpoints. Towards the beginning of one of the projects I did something like this:

elastic_config = {"user": os.environ["ES_USER"],
                  "endpoint": os.environ["ES_ENDPOINT"],
                  ...
                  }

When I checked in the code I had been working on one of the reviewers commented that this pattern was normal, but since we were using 3.7 let's use a dataclass.

import os
from dataclasses import dataclass

@dataclass
class ElasticConfiguration:
    user: str = os.environ["ES_USER"]
    endpoint: str = os.environ["ES_ENDPOINT"]
    ...

Makes sense, but what's the practical benefit? Before I wasn't defining a class and carrying around the class model that I'm not really using.

  1. Class attribute autocomplete. I can't tell you how many times I used to check if I had the right _, casing, abbreviation etc for the key I was calling. Now it's a class attribute, no more guessing.
  2. Hook up mypy and find some interesting errors.
    • Above you'll notice I used os.environ[]. A lot of people like to use an alternative .get(<key>) pattern with dictionaries. The problem is often times a default of None gets supplied and you're dealing with Optional[T], but still acting like it's str everywhere in your code.
  3. __post_init__
    • Dataclasses have an interesting method called __post_init__ that gets called by __init__. On configuration objects this is a handy place to put any validation function/method calls you might build around attributes.
  4. Subjectively elastic.user is faster to type, and more appealing to the eyes than elastic["user"].

So the next time you find yourself passing around configuration information remember dataclasses may be a useful and productive alternative to passing around a dictionary.

Additional Resources

Beyond the docs here are some links I found useful when learning about Python dataclasses.

Contact

If you want to reach out feel free to follow up via email or on GitLab.