Config Parser#

This tool provides a flexible framework to load and merge configurations from different sources for Python projects. A tag style syntax is introduced to control the loading or merging behavior.

Quick Start#

Write configs#

A config is any file that can be deserialized into a Python dictionary. Common formats, including .yaml (.yml), .json and .toml, are supported out of the box. Other formats may require a File deserializer.

The tag syntax extends the existing serialization languages to support complicated control flows and Python-specific features, such as include directives, variable definitions, and Python object initialization. An example is available in advanced-yaml-config.

Load configs#

An instance of ConfigParser is used to load config files and parse the tags. The default setup is sufficient for most use cases; see Customization for the options available through the arguments of __init__(). Each __call__() creates a new context to maintain local variables and returns the parsed configs in a single dictionary.

The parsing is performed in two passes:

  • Deserialize the files into dictionaries.

  • Apply the tags.

The order of the paths provided to __call__() and the order of keys and items from the first pass are preserved in the final output. The tags are parsed recursively from innermost to outermost.

Tag#

Syntax#

  • A tag is defined as a key-value pair given by <tag_key=tag_value> or <tag_key> if the tag value is None. Newlines are not allowed within a tag.

  • An arbitrary number of tags can be attached to a key.

  • The spaces and newlines between the key and tags are optional.
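The syntax rules above can be illustrated with a small sketch. This is not the parser's actual implementation: the regular expression and the helper name split_key are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: <tag_key> or <tag_key=tag_value>, no newline inside a tag.
TAG = re.compile(r"<([^<>=\n]+)(?:=([^<>\n]*))?>")


def split_key(raw):
    """Split a raw key into (key, [(tag_key, tag_value), ...]).

    A tag without "=" gets the value None, mirroring the rule above.
    Tags are returned in their order of appearance.
    """
    tags = [(m.group(1).strip(), m.group(2)) for m in TAG.finditer(raw)]
    key = TAG.sub("", raw).strip()
    return key, tags


key, tags = split_key("model <type=torch.nn::Linear> <var>")
```

Here key is "model" and tags holds the two tags in order, the second with a None value. The surrounding whitespace is stripped because spaces between the key and the tags are optional.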

Precedence#

  • The tags are parsed from left to right based on the order of appearance.

  • The same tag can be applied multiple times.

  • The parsing result is order dependent.

  • Some of the built-in tags follow special rules:

    • <code> has the highest precedence and will only be parsed once.

    • The following tags will not trigger any parser.

    • The order of the following tags is ill-defined, as they are not supposed to simply modify the key-value pairs. As a result, they cannot be directly chained with other regular tags, unless through <code>.

URL and IO#

The ConfigParser and the built-in tags <include> and <file> share the same IO mechanism.

The file path is described by a standard URL accepted by urlparse() with the format:

[scheme://netloc/]path[;parameters][?query][#fragment]
  • scheme://netloc/ can be omitted for local path.

  • ;parameters is never used.

  • ?query can be used to provide additional key-value pairs. If a key appears multiple times, all values will be collected into a list. Values are interpreted as JSON strings.

  • #fragment is a dot-separated path that gives access to nested dictionaries or lists. Similar to TOML’s table, double quotes can be used to escape the dot.

  • Percent-encoding (%XX) is supported in the path to escape special characters.

Warning

The #fragment is extracted before any parsing.
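As a rough illustration of the URL rules above, the following sketch splits a URL with the standard library. The helper name split_url and the example URL are hypothetical. Returning a single value as a scalar rather than a one-element list is an assumption, and the parser's real handling may differ.

```python
import json
from urllib.parse import parse_qs, unquote, urlparse


def split_url(url):
    """Return (path, query, fragment) following the rules described above."""
    parsed = urlparse(url)
    query = {}
    for key, values in parse_qs(parsed.query).items():
        # Each query value is interpreted as a JSON string.
        decoded = [json.loads(value) for value in values]
        # Assumption: repeated keys become a list, single keys stay scalar.
        query[key] = decoded if len(decoded) > 1 else decoded[0]
    # %XX sequences in the path are decoded.
    return unquote(parsed.path), query, parsed.fragment


path, query, fragment = split_url(
    "configs/my%20model.cfg.yaml?scale=2&tag=%22a%22&tag=%22b%22#model.layers"
)
```

The fragment is returned untouched here. It would be split on dots in a later step to walk into the loaded object.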

File IO is handled by fsspec.open() and the deserialization is handled by ConfigParser.io, an instance of FileLoader.

  • The compression format is inferred from the last extension, see fsspec.utils.compressions.

  • The deserializer is inferred from the longest registered extension that does not match any compression format.

  • The deserialized objects will be cached, and the cache can be cleared by ConfigParser.io.clear_cache.
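One possible reading of the two inference rules above is sketched below. The COMPRESSIONS dictionary is a small stand-in for fsspec.utils.compressions, and REGISTERED and infer_format are hypothetical names, so this shows the idea rather than the behavior of FileLoader itself.

```python
from pathlib import PurePosixPath

# Stand-in for fsspec.utils.compressions (extension -> compression name).
COMPRESSIONS = {"gz": "gzip", "bz2": "bz2", "xz": "xz"}
# Hypothetical set of extensions with a registered deserializer.
REGISTERED = {"yaml", "yml", "json", "toml"}


def infer_format(path, registered=REGISTERED):
    """Return (compression, deserializer_extension) for a file path."""
    extensions = PurePosixPath(path).name.split(".")[1:]
    compression = None
    # The compression format comes from the last extension only.
    if extensions and extensions[-1] in COMPRESSIONS:
        compression = COMPRESSIONS[extensions.pop()]
    # Try the longest remaining suffix first, then shorter ones.
    for start in range(len(extensions)):
        candidate = ".".join(extensions[start:])
        if candidate in registered:
            return compression, candidate
    return compression, None


result = infer_format("configs/model.cfg.yaml.gz")
```

With the default registry the example resolves to gzip compression and the yaml deserializer. If "cfg.yaml" were also registered, it would win because it is the longer match.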

Special#

nested=True in ConfigParser#

The nested=True (default) option enables a behavior similar to TOML’s table, where dot-separated keys are interpreted as accessing a nested dictionary and the parents are not overridden. Use double quotes or <literal> to escape keys that contain a dot.
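The nested-key behavior can be pictured with the following sketch. The helpers split_dotted and set_nested are hypothetical and only mimic the described effect: dotted keys create or reuse parent dictionaries, and double quotes protect a dot.

```python
import re

# A key part is either a double-quoted string or a run of non-dot characters.
_PART = re.compile(r'"([^"]*)"|([^.]+)')


def split_dotted(key):
    """Split a dotted key, keeping quoted parts such as "a.b" intact."""
    return [quoted or plain for quoted, plain in _PART.findall(key)]


def set_nested(config, key, value):
    """Assign value under a dotted key without replacing existing parents."""
    *parents, leaf = split_dotted(key)
    node = config
    for name in parents:
        # setdefault reuses an existing parent instead of overriding it.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = {}
set_nested(config, "model.optimizer.lr", 0.01)
set_nested(config, "model.optimizer.name", "adam")
set_nested(config, 'paths."data.v1"', "/tmp/data")
```

After these three calls, both optimizer settings live in the same nested dictionary, and the quoted key data.v1 is stored as a single key under paths.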

None key#

Besides the standard rules, both ~ and an empty string in the key will be parsed into None.

Apply to list elements#

When the element is a dictionary and the only key is None, the element will be replaced by its value. Use <literal> to retain the original dictionary.

Built-in tags#

<code>#

This tag will replace the value by the result of eval(). The variables defined with <var> are available as locals.

value

  • str: a python expression
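A minimal sketch of the idea, assuming the variables are kept in a plain dictionary: the expression is passed to eval() with those variables as locals. The helper name evaluate and the variable names are hypothetical.

```python
def evaluate(expression, variables):
    """Evaluate a Python expression with the given variables as locals."""
    # An empty globals dict still gives access to the built-in functions.
    return eval(expression, {}, dict(variables))


# Hypothetical variables, as if they had been defined with <var>.
variables = {"scale": 3, "bins": [0, 10, 20]}
value = evaluate("scale * len(bins)", variables)
```

Because eval() runs arbitrary code, a config that uses this tag should come from a trusted source.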

<include>#

This tag merges dictionaries from other config files into the given level. The included content is parsed under the current context.

tag

  • <include>: the type of the paths will be inferred.

  • <include=absolute>: resolve as absolute paths.

  • <include=relative>: resolve as paths relative to the current config file.

value

  • str: a URL to a dictionary

  • list: a list of URLs

  • To include within the same file, use . as path.

  • The rules in URL and IO apply.

<literal>#

The keys marked as <literal> will not trigger the following rules:

<discard>#

The keys marked as <discard> will not be added to the current dictionary but will still be parsed. This is useful when only the side effects of the parsing are needed, e.g. to define variables or execute code.

<comment>#

This tag is reserved to never trigger any parser. This is useful when you want to leave a comment or add keys with duplicate names.

<file>#

This tag inserts any deserialized object from a URL. Unlike <include>, this tag only replaces the value by a deep copy of the loaded object, instead of merging it into the current dictionary. If the object is large and only used once, it is recommended to turn off the cache to avoid the deep copy.

tag

  • <file>: the type of the path will be inferred.

  • <file=absolute>: resolve as an absolute path.

  • <file=relative>: resolve as a path relative to the current config file.

  • <file=nocache>: turn off the cache.

  • <file=nobuffer>: turn off the buffer.

  • Use | to separate multiple flags: <file=relative|nocache|nobuffer>

value

  • str: a URL to any object

  • The rules in URL and IO apply.

<type>#

This tag can be used to import a module/attribute, create an instance of a class, or call a function.

tag

  • An import path is defined as {module}::{attribute}, which is roughly equivalent to the python statement from {module} import {attribute}.

    • {module}:: can be omitted for Built-in Functions.

    • If {attribute} is not provided or only contains dots, the whole module will be returned.

    • {attribute} can be a dot separated string to get a similar effect as <attr>.

  • <type>: when the tag value is not provided, the value must be a valid import path and will be replaced by the imported object.

  • <type={module::attribute}>: when the tag value is provided, the imported object will be called with the value as its arguments.

value

  • <type>:

    • str: a valid import path {module}::{attribute}.

  • <type={module::attribute}>:

    • module.attribute(): if the value is None, no arguments will be passed.

    • module.attribute(*value): if the value is a list, it will be used as positional arguments.

    • module.attribute(**value): if the value is a dict and only contains string keys, the items will be used as keyword arguments.

    • module.attribute(*value[None], **value[others]): if the value is a dict and the value under the None key is a list, that list will be used as positional arguments.

    • module.attribute(value[None], **value[others]): if the value is a dict and the value under the None key is not a list, it will be used as the first argument.

    • module.attribute(value): if the value is neither a list nor a dict, it will be used as the first argument.
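The import path and the argument rules listed above can be sketched as follows. The function names resolve and call_with_value are hypothetical, and edge cases of the real tag, such as error handling, are not covered.

```python
import builtins
import importlib


def resolve(path):
    """Resolve an import path of the form {module}::{attribute}."""
    module_name, _, attribute = path.rpartition("::")
    # Without a module part, look the name up among the built-ins.
    obj = importlib.import_module(module_name) if module_name else builtins
    # An empty or dots-only attribute returns the module itself.
    for name in filter(None, attribute.split(".")):
        obj = getattr(obj, name)
    return obj


def call_with_value(func, value):
    """Call func with arguments derived from the value, as listed above."""
    if value is None:
        return func()
    if isinstance(value, list):
        return func(*value)
    if isinstance(value, dict):
        keywords = {k: v for k, v in value.items() if k is not None}
        if None not in value:
            return func(**keywords)
        positional = value[None]
        if isinstance(positional, list):
            return func(*positional, **keywords)
        return func(positional, **keywords)
    return func(value)


number = call_with_value(resolve("int"), {None: "ff", "base": 16})
ordered = call_with_value(resolve("sorted"), {None: [[3, 1, 2]], "reverse": True})
```

In the first call the None key holds a single object, so it becomes the first argument. In the second it holds a list, so the list is unpacked as positional arguments.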

<attr>#

This tag will replace the value by its attribute. A tag like <attr=attr1.attr2> is equivalent to the pseudo code value.attr1.attr2.

tag

  • <attr={attribute}>: where the attribute can be a dot separated string.

<extend>#

This tag will try to extend the existing key by the new value in a way given by the pseudo code:

if key in local:
  return extend_method(local[key], value)
else:
  return value

where the extend_method() is a binary operation specified by the tag value.

tag

  • <extend>, <extend=add>: recursively merge dictionaries or apply + to other types.

  • <extend=and>: apply & operation.

  • <extend=or>: apply | operation.

  • <extend={operation}>: see <extend> operation

Warning

The built-in extend methods will not modify the original value in-place.
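To make the description of <extend=add> more concrete, here is a sketch of a non-mutating merge. The name extend_add is hypothetical, and the exact merge semantics of the built-in method are assumed rather than taken from its source.

```python
def extend_add(old, new):
    """Recursively merge dictionaries, or apply + to other types.

    Neither input is modified; a new object is returned.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        merged = dict(old)  # shallow copy, so the original stays intact
        for key, value in new.items():
            merged[key] = extend_add(merged[key], value) if key in merged else value
        return merged
    return old + new


base = {"cuts": ["pt > 20"], "bins": {"x": 10}}
extra = {"cuts": ["eta < 2.4"], "bins": {"y": 5}, "label": "signal"}
combined = extend_add(base, extra)
```

Keys present on both sides are combined recursively, keys that only exist in the new value are added, and base keeps its original content.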

<var>#

This tag can be used to create a variable from the value. The variable lifecycle spans the entire parser __call__() and is shared by all files within the same call. The variable can be accessed using <ref> and is also available as locals in <code>.

tag

  • <var>: use the key as variable name.

  • <var={variable}>: use the tag value as variable name.

<ref>#

This tag can be used to access the variables defined with <var>.

tag

  • If the value is a string, it will be used as the variable name. Otherwise, the key will be used.

  • <ref>: replace the value by a reference to the variable.

  • <ref=copy>: replace the value by a copy() of the variable.

  • <ref=deepcopy>: replace the value by a deepcopy() of the variable.
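The difference between the three <ref> modes can be sketched with the standard copy module. The variables dictionary and the helpers define and reference are hypothetical stand-ins for the parser context, and mapping <ref=copy> to copy.copy() is an assumption.

```python
import copy

variables = {}  # hypothetical stand-in for the context of one __call__()


def define(name, value):
    """Store a variable, as <var> would."""
    variables[name] = value
    return value


def reference(name, mode=None):
    """Fetch a variable by reference, shallow copy, or deep copy."""
    value = variables[name]
    if mode == "copy":
        return copy.copy(value)
    if mode == "deepcopy":
        return copy.deepcopy(value)
    return value


define("binning", {"edges": [0, 1, 2]})
same = reference("binning")
shallow = reference("binning", "copy")
deep = reference("binning", "deepcopy")
```

Mutating same changes the stored variable, shallow shares only the inner list, and deep is fully independent.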

<map>#

This tag converts a list of key-value pairs into a dictionary, which makes it possible to apply the tags that only work with values to the keys.

value

  • list: a list of dictionaries with keys key and val.
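In plain Python, the conversion amounts to something like the sketch below. The helper name to_mapping is hypothetical. The point is that the keys are ordinary values at this stage, so they can be any hashable object produced by other tags.

```python
def to_mapping(pairs):
    """Turn a list of {"key": ..., "val": ...} dictionaries into a dict."""
    return {pair["key"]: pair["val"] for pair in pairs}


# Keys that could not be written directly as config keys, e.g. tuples.
mapping = to_mapping(
    [
        {"key": (0, 1), "val": "low"},
        {"key": (1, 2), "val": "high"},
    ]
)
```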

<select>#

This tag implements a conditional statement to select keys from a list of cases and replace itself by the selected keys. Each case is a dictionary where the keys with <case> (case-keys) will be interpreted as booleans and only contribute to the decision, while others (non-case-keys) will be merged into the current dictionary if the final decision is True.

Unlike other tags, only the necessary branches under <select> will be parsed. When <select=all>, the non-case-keys that failed the selection will not be parsed. When <select=first>, besides the failed non-case-keys, everything after the first selected case will not be parsed.

tag

  • <select>, <select=first>: only keep the first selected case.

  • <select=all>: keep all selected cases.

value

  • list: a list of dictionaries with <case> keys.

<case>#

This tag can only be used inside <select> to modify the decision. Each case will start with a False decision and the keys with <case> will update the decision based on the value and the operation specified by the tag value.

tag

  • <case>: decision = value

  • <case=or>: decision |= value

  • <case=and>: decision &= value

  • <case=xor>: decision ^= value
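The decision logic of <select> and <case> can be summarized with the sketch below. The representation of a case as a pair of (conditions, payload) is hypothetical and chosen for brevity. The real tag reads both from the keys of a single dictionary and skips parsing of unused branches, which is not modeled here.

```python
import operator

# Tag value -> how the new value updates the running decision.
OPERATIONS = {
    None: lambda decision, value: value,  # <case>
    "or": operator.or_,                   # <case=or>
    "and": operator.and_,                 # <case=and>
    "xor": operator.xor,                  # <case=xor>
}


def decide(conditions):
    """Fold (operation, value) pairs into a decision, starting from False."""
    decision = False
    for operation, value in conditions:
        decision = OPERATIONS[operation](decision, bool(value))
    return decision


def select(cases, mode="first"):
    """Merge the payloads of the selected cases into one dictionary."""
    selected = {}
    for conditions, payload in cases:
        if decide(conditions):
            selected.update(payload)
            if mode == "first":
                break  # later cases are ignored
    return selected


cases = [
    ([(None, False), ("or", False)], {"a": 1}),
    ([(None, True), ("and", True)], {"b": 2}),
    ([(None, True)], {"c": 3}),
]
first = select(cases)
everything = select(cases, mode="all")
```

The first case fails, so with the default mode only the second case contributes, while mode="all" also keeps the third.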

Support#

A VS Code extension is provided for syntax highlighting. The extension is enabled for the following files:

  • YAML: *.cfg.yaml, *.cfg.yml

  • JSON: *.cfg.json

To install the extension, download the heptools-config-support-X.X.X.vsix from one of the releases.

Syntax Highlight#

The tokenization is implemented using TextMate grammars, which cover most of the tag rules with the following exceptions:

  • no flag conflicts check

<file=absolute|relative>: value # this will be highlighted but fail the parsing
  • no multiline key validation

? key
  <tag> # this will be highlighted but not parsed
  key
: value

Customization#

Tag parser#

A tag parser is a function that returns a key-value pair. The signature is given by the protocol TagParser where the arguments are keyword only and can be omitted if unnecessary. Custom parsers can be registered through the tag_parsers argument of ConfigParser. Built-in tags cannot be overridden.

<extend> operation#

Custom extend_method() for <extend> can be registered through the extend_methods argument of ConfigParser. The built-in extend methods cannot be overridden.

File deserializer#

A deserializer is a function that takes a read-only BytesIO stream as input and returns a deserialized object. Custom deserializers can be registered using the decorator register() of ConfigParser.io.
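As an example of the expected shape, the following sketch defines a deserializer for INI files using the standard library. The function name load_ini is hypothetical. The registration step is left out because the exact signature of register() is not shown here.

```python
import configparser
from io import BytesIO


def load_ini(stream: BytesIO) -> dict:
    """Deserialize an INI file from a read-only byte stream into a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(stream.read().decode("utf-8"))
    # One nested dictionary per section; all values remain strings.
    return {section: dict(parser[section]) for section in parser.sections()}


example = load_ini(BytesIO(b"[model]\nname = resnet\ndepth = 18\n"))
```

The function only reads from the stream and returns a plain dictionary, which matches the contract described above.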

Advanced#

The following tags are not recommended for general usage and may lead to unexpected results or significantly increase the maintenance complexity.

<patch>#

Patch layers can be attached on top of config files to modify the raw content before <include>. A patch layer consists of a list of patches, each of which is a dictionary with the following structure:

path: "[scheme://netloc/]path[#fragment]" # the file to patch
actions: # actions to apply
  - action: name # the name of the action
    ... # other keyword arguments provided to the action

where the path can be either absolute or relative and the action is one of the following:

Each action is listed as name (type): description, followed by its arguments.

  • mkdir (dict): create a nested dict.

    • target: a dot-separated path to a dict.

  • update (dict): update the target dict by the value.

    • target: a dot-separated path to a dict.

    • value: a dict.

  • pop (dict, list): remove the target key/item from the dict/list.

    • target: a dot-separated path to a key/item.

  • set (dict, list): set the target key/item to the value.

    • target: a dot-separated path to a key/item.

    • value: any object.

  • insert (list): insert the value before the target item.

    • target: a dot-separated path to an item.

    • value: any object.

  • append (list): append the value to the end of the target list.

    • target: a dot-separated path to a list.

    • value: any object.

  • extend (list): extend the target list by the value.

    • target: a dot-separated path to a list.

    • value: a list.

or a custom one registered through the patch_actions argument of ConfigParser. The built-in actions cannot be overridden.
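The effect of the built-in actions can be approximated with the sketch below. The function apply_action and its helper are hypothetical, quoted path parts are not handled, and the object is modified in place for brevity, which may differ from how the parser applies patches.

```python
def _walk(node, parts):
    """Follow a list of path parts through nested dicts and lists."""
    for part in parts:
        # Lists are indexed by integer position, dicts by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def apply_action(root, action, target, value=None):
    """Apply one patch action to root, where target is a dot-separated path."""
    parts = target.split(".")
    if action == "mkdir":
        node = root
        for part in parts:
            node = node.setdefault(part, {})
        return
    if action in ("update", "append", "extend"):
        # These map directly onto dict.update, list.append and list.extend.
        getattr(_walk(root, parts), action)(value)
        return
    # The remaining actions operate on a key/item inside its parent.
    parent = _walk(root, parts[:-1])
    key = int(parts[-1]) if isinstance(parent, list) else parts[-1]
    if action == "set":
        parent[key] = value
    elif action == "pop":
        parent.pop(key)
    elif action == "insert":
        parent.insert(key, value)
    else:
        raise ValueError(f"unknown action: {action}")


config = {"model": {"layers": [1, 3]}}
apply_action(config, "mkdir", "train.optimizer")
apply_action(config, "update", "train.optimizer", {"lr": 0.1})
apply_action(config, "insert", "model.layers.1", 2)
apply_action(config, "append", "model.layers", 4)
apply_action(config, "extend", "model.layers", [5, 6])
apply_action(config, "set", "model.layers.0", 0)
apply_action(config, "pop", "model.layers.5")
```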

This tag registers a new patch layer. The layer is installed right after registration and stays in effect across all configs within the same parser __call__(). If a key is provided, it is used as the patch name. A named patch can be installed or uninstalled multiple times. The patches are evaluated lazily, after the deserialization but before the tag parsing, so <patch> is meant to work as a preprocessor with minimal semantic support rather than as a regular tag.

tag

Register and install a patch layer:

  • <patch>: The type of the paths will be inferred.

  • <patch=absolute>: Resolve as absolute paths.

  • <patch=relative>: Resolve as relative paths.

Modify the patch layers:

  • <patch=install>: install patch layers.

  • <patch=uninstall>: uninstall patch layers.

value

  • <patch>, <patch=absolute>, <patch=relative>: a patch or a list of patches.

  • <patch=install>, <patch=uninstall>: a patch name or a list of patch names.