Config Parser#
This tool provides a flexible framework for loading and merging configurations from different sources in Python projects. A tag-style syntax controls the loading and merging behavior.
Quick Start#
Write configs#
A config is any file that can be deserialized into a Python dictionary. Common formats, including .yaml (.yml), .json and .toml, are supported out of the box. Other formats may require a File deserializer.
The tag syntax extends the existing serialization languages to support more complex control flow and Python-specific features, such as include directives, variable definitions and Python object initialization. An example is available in advanced-yaml-config.
Load configs#
An instance of ConfigParser is used to load config files and parse the tags. The default setup is sufficient for most use cases, and customization is possible through the arguments of __init__(). Each __call__() creates a new context to maintain local variables and returns the parsed configs in a single dictionary.
The parsing is performed in two passes:
Deserialize the files into dictionaries.
Apply the tags.
The order of the paths provided to __call__() and the order of keys and items from the first pass are preserved in the final output. The tags are parsed recursively from innermost to outermost.
example
from heptools.config import ConfigParser
parser = ConfigParser()
configs = parser("config1.yml", "config2.yml", ...)
Tag#
Syntax#
A tag is defined as a key-value pair given by <tag_key=tag_value>, or <tag_key> if the tag value is None. Newlines are not allowed within a tag.
An arbitrary number of tags can be attached to a key.
The spaces and newlines between the key and tags are optional.
example
The following are examples of valid tags:
key: value
key<tag_key>: value
key <tag_key=tag_value>: value
key <tag_key1=tag_value1><tag_key2> <tag_key3=tag_value3> : value
<tag_key1> <tag_key2=tag_value2> : value
? key
<tag_key1> <tag_key2=tag_value2>
<tag_key3=tag_value3>
: value
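The syntax rules above can be illustrated with a small regular-expression sketch. This is not the library's parser: the TAG pattern and the split_tags helper are hypothetical names introduced here, and the sketch only separates a raw key from its tags without applying any of them.

```python
import re

# Hypothetical pattern: "<tag_key>" or "<tag_key=tag_value>", with no newline inside a tag.
TAG = re.compile(r"<([^<>=\n]+)(?:=([^<>\n]*))?>")


def split_tags(raw_key: str):
    """Split a raw key into (key, [(tag_key, tag_value), ...])."""
    first = TAG.search(raw_key)
    if first is None:
        return raw_key.strip(), []
    # Everything before the first tag is the key; surrounding spaces are optional.
    key = raw_key[: first.start()].strip()
    # A tag without "=" gets the value None.
    tags = [(m.group(1), m.group(2)) for m in TAG.finditer(raw_key)]
    return key, tags


key, tags = split_tags("key <tag_key1=tag_value1><tag_key2> <tag_key3=tag_value3> ")
```

Here key is "key" and tags keeps the left-to-right order in which the tags appear, which matters for the precedence rules.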
Precedence#
The tags are parsed from left to right based on the order of appearance.
The same tag can be applied multiple times.
The parsing result is order dependent.
Some of the built-in tags follow special rules:
<code> has the highest precedence and will only be parsed once.
The following tags will not trigger any parser.
The order of the following tags is ill-defined, as they are not supposed to simply modify the key-value pairs. As a result, they cannot be directly chained with other regular tags, except through <code>.
URL and IO#
Both the ConfigParser and the built-in tags <include> and <file> share the same IO mechanism.
The file path is described by a standard URL accepted by urlparse() with the format:
[scheme://netloc/]path[;parameters][?query][#fragment]
scheme://netloc/ can be omitted for a local path.
;parameters is never used.
?query can be used to provide additional key-value pairs. If a key appears multiple times, all values will be collected into a list. Values are interpreted as JSON strings.
#fragment is a dot-separated path that gives access to nested dictionaries or lists. Similar to TOML's table, double quotes can be used to escape the dot.
The percent-encoding rule (%XX) is supported in the path to escape special characters.
Warning
The #fragment is extracted before any parsing.
example
The following URLs are all valid:
local path: /path/to/file.yml
XRootD path: root://server.host//path/to/file.yml
fragment: /path/to/file.yml#key1.key2 <extend>.0."key3.key4"
query: /path/to/file.yml?key1=value1&key2=value2&key1=value3&key3=[1,2,3]&parent.child=value4
The fragment example above is equivalent to the pseudo code:
yaml.load(open("/path/to/file.yml"))["key1"]["key2 <extend>"][int("0")]["key3.key4"]
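The fragment lookup can be sketched in plain Python. The SEGMENT pattern and the split_fragment and resolve helpers are hypothetical names used only for illustration, and the data dictionary is a stand-in for the content of the loaded file.

```python
import re
from urllib.parse import urlparse

# A double-quoted segment (dots allowed inside) or a run of characters without dots.
SEGMENT = re.compile(r'"([^"]*)"|([^.]+)')


def split_fragment(fragment: str) -> list[str]:
    """Split a dot-separated fragment, honoring double quotes."""
    return [quoted or plain for quoted, plain in SEGMENT.findall(fragment)]


def resolve(data, fragment: str):
    """Walk nested dictionaries and lists along the fragment."""
    for part in split_fragment(fragment):
        # Lists are indexed by integer, dictionaries by string key.
        data = data[int(part)] if isinstance(data, list) else data[part]
    return data


url = '/path/to/file.yml#key1.key2 <extend>.0."key3.key4"'
data = {"key1": {"key2 <extend>": [{"key3.key4": "found"}]}}  # stand-in for the file content
result = resolve(data, urlparse(url).fragment)
```

Note that the tag in "key2 <extend>" is treated as part of the key, consistent with the fragment being extracted before any parsing.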
The query example above will give an additional dictionary
{
    "key1": ["value1", "value3"],
    "key2": "value2",
    "key3": [1, 2, 3],
    "parent": {"child": "value4"},
}
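The query rules can be reproduced with the standard library. The sketch below is not the library's implementation; decode and parse_query are hypothetical helpers, and the fallback to the raw string for values that are not valid JSON (such as value1) is an assumption inferred from the example output.

```python
import json
from urllib.parse import parse_qsl, urlparse


def decode(value: str):
    # Assumption: values that are not valid JSON are kept as plain strings.
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def parse_query(url: str) -> dict:
    # Collect every value per key, in order of appearance.
    collected: dict[str, list] = {}
    for key, value in parse_qsl(urlparse(url).query):
        collected.setdefault(key, []).append(decode(value))
    result: dict = {}
    for key, values in collected.items():
        # Dot-separated keys address nested dictionaries.
        *parents, leaf = key.split(".")
        node = result
        for parent in parents:
            node = node.setdefault(parent, {})
        # A key that appears once keeps a scalar; repeated keys become a list.
        node[leaf] = values if len(values) > 1 else values[0]
    return result


query = parse_query(
    "/path/to/file.yml?key1=value1&key2=value2&key1=value3&key3=[1,2,3]&parent.child=value4"
)
```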
File IO is handled by fsspec.open() and the deserialization is handled by ConfigParser.io, an instance of FileLoader.
The compression format is inferred from the last extension, see fsspec.utils.compressions.
The deserializer is inferred from the longest registered extension that does not match any compression format.
The deserialized objects will be cached, and the cache can be cleared by ConfigParser.io.clear_cache.
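One possible reading of the two inference rules is sketched below. It is an assumption-laden illustration, not the FileLoader logic: COMPRESSIONS is a small hard-coded stand-in for fsspec.utils.compressions, and infer_format and the registered set are hypothetical.

```python
# Assumption: a small subset of compression extensions, for illustration only.
COMPRESSIONS = {"gz", "bz2", "xz"}


def infer_format(path: str, registered: set[str]):
    """Return (compression, deserializer extension) inferred from a file name."""
    # Keep only the extensions of the file name, e.g. ["cfg", "yml", "gz"].
    parts = path.rsplit("/", 1)[-1].split(".")[1:]
    compression = None
    if parts and parts[-1] in COMPRESSIONS:
        compression = parts.pop()
    # Try the longest candidate first, e.g. "cfg.yml" before "yml".
    for start in range(len(parts)):
        candidate = ".".join(parts[start:])
        if candidate in registered:
            return compression, candidate
    return compression, None


compressed = infer_format("/path/to/file.cfg.yml.gz", {"yml", "json"})
longest = infer_format("file.cfg.yml", {"yml", "cfg.yml"})
```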
Special#
nested=True in ConfigParser#
The nested=True (default) option enables a behavior similar to TOML's table, where dot-separated keys are interpreted as paths into a nested dictionary and the parents are not overridden. Use double quotes or <literal> to escape keys that contain dots.
example
parent1:
  child1: value1
parent1 <comment>: # override the parent
  child2: value2
parent1.child3: value3 # modify the child without overriding the parent
parent2.child.grandchild: value4 # create a nested dict
will be parsed into
{
    "parent1": {
        "child2": "value2",
        "child3": "value3",
    },
    "parent2": {"child": {"grandchild": "value4"}},
}
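The merging behavior in this example can be mimicked with a few lines of Python. The assign helper is hypothetical and ignores the quoting and <literal> escapes; it only shows why a dotted key modifies a child while a plain key replaces the whole parent.

```python
def assign(config: dict, key: str, value) -> None:
    """Assign a value through a dot-separated key without replacing the parents."""
    *parents, leaf = key.split(".")
    node = config
    for parent in parents:
        # Reuse an existing parent dictionary, or create it on first use.
        node = node.setdefault(parent, {})
    node[leaf] = value


config: dict = {}
assign(config, "parent1", {"child1": "value1"})
assign(config, "parent1", {"child2": "value2"})       # plain key: overrides the parent
assign(config, "parent1.child3", "value3")            # dotted key: keeps child2
assign(config, "parent2.child.grandchild", "value4")  # creates the nested dictionaries
```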
None key#
Besides the standard rules, both ~ and an empty string in the key will be parsed into None.
example
# None
~: value
~ <tag>: value
"": value
<tag>: value
null: value
# not None
null <tag>: value
Apply to list elements#
When the element is a dictionary and the only key is None, the element will be replaced by its value. Use <literal> to retain the original dictionary.
example
- key1: value1
  <tag>: value2 # regular None key
- <tag>: value3 # replace the whole element with its value
- <tag> <literal>: value4 # escape the None key
will be parsed into
[
    {"key1": "value1", None: "value2"},
    "value3",
    {None: "value4"},
]
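The default replacement rule amounts to a simple check on each list element. The unwrap_none helper below is a hypothetical sketch of that rule only; it works on already-parsed data, so it cannot represent the <literal> escape.

```python
def unwrap_none(elements: list) -> list:
    """Replace every dictionary whose only key is None by its value."""
    return [
        item[None] if isinstance(item, dict) and list(item) == [None] else item
        for item in elements
    ]


unwrapped = unwrap_none([{"key1": "value1", None: "value2"}, {None: "value3"}])
```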
Support#
A VS Code extension is provided for syntax highlighting. The extension is enabled for the following files:
YAML: *.cfg.yaml, *.cfg.yml
JSON: *.cfg.json
To install the extension, download the heptools-config-support-X.X.X.vsix from one of the releases.
Syntax Highlight#
The tokenization is implemented using TextMate grammars, which cover most of the tag rules with the following exceptions:
no flag conflicts check
<file=absolute|relative>: value # this will be highlighted but fail the parsing
no multiline key validation
? key
<tag> # this will be highlighted but not parsed
key
: value
Customization#
Tag parser#
A tag parser is a function that returns a key-value pair. The signature is given by the protocol TagParser, where the arguments are keyword only and can be omitted if unnecessary. Custom parsers can be registered through the tag_parsers argument of ConfigParser. Built-in tags cannot be overridden.
example
The following example defines two custom tags: one repeats the value by a given number of times and the other controls how the copy is made.
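import copy
from typing import Any

from heptools.config import ConfigParser


def repeat_parser(tags: dict[str, Any], tag: str, key: str, value):
    # <repeat=N>: the tag value is the repeat count, defaulting to 1.
    count = int(tag or 1)
    if mode := tags.get("repeat.mode"):
        match mode:
            case "copy":
                method = copy.copy
            case "deepcopy":
                method = copy.deepcopy
            case _:
                raise ValueError(f"unknown repeat mode {mode}")
        # Keep the original value first, followed by count - 1 copies.
        return key, [value] + [method(value) for _ in range(count - 1)]
    # Without a mode, every element refers to the same object.
    return key, [value] * count


# "repeat.mode" is registered without a parser; it is only read by repeat_parser.
parser = ConfigParser(tag_parsers={"repeat": repeat_parser, "repeat.mode": None})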
Then, the following config
key1 <var=value1><repeat=3>: []
key2 <var=value2><repeat.mode=deepcopy><repeat=3>: []
<discard>:
  <code> <comment=key1>: value1.append(1)
  <code> <comment=key2>: value2.append(1)
will be parsed into
{
    "key1": [[1], [1], [1]],
    "key2": [[1], [], []]
}
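The difference between the two results comes from plain Python aliasing, which can be checked without the parser. The variable names below are illustrative only.

```python
import copy

value = []
# List multiplication stores three references to the same list object.
shared = [value] * 3
# The original value comes first, followed by two independent deep copies.
independent = [value] + [copy.deepcopy(value) for _ in range(2)]

# Mutating the original is visible through every reference, but not through the copies.
value.append(1)
```

After the append, shared is [[1], [1], [1]] and independent is [[1], [], []], matching key1 and key2 above.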
<extend> operation#
A custom extend_method() for <extend> can be registered through the extend_methods argument of ConfigParser. The built-in extend methods cannot be overridden.
example
The following example defines a custom operation to concatenate paths.
from pathlib import PurePosixPath


def extend_paths(old_value: str, new_value: str):
    return PurePosixPath(old_value) / new_value


parser = ConfigParser(extend_methods={"path": extend_paths})
Then, the following config
key: base
key <extend=path>: file
will be parsed into
{
    "key": PurePosixPath("base") / "file"
}
File deserializer#
A deserializer is a function that takes a read-only BytesIO stream as input and returns a deserialized object. Custom deserializers can be registered using the decorator register() of ConfigParser.io.
example
The following example implements a deserializer to load CSV files.
from io import BytesIO

from heptools.config import ConfigParser


@ConfigParser.io.register("csv")
def csv_loader(stream: BytesIO):
    # The first line holds the column names.
    headers = stream.readline().decode().strip().split(",")
    lineno = 1
    data = [[] for _ in range(len(headers))]
    while row := stream.readline():
        lineno += 1
        row = row.decode().strip()
        if not row:
            continue  # skip blank lines
        row = row.split(",")
        if len(row) != len(headers):
            raise ValueError(f"line {lineno}: length mismatch.")
        for i, value in enumerate(row):
            data[i].append(value)
    # Map each column name to its list of values.
    return dict(zip(headers, data))
Then, the .csv files in <include> and <file> can be properly loaded.
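The loader above splits each line on commas, so quoted fields that contain a comma are not handled. A possible variant based on the standard csv module is sketched below; csv_columns is a hypothetical name, the function is shown standalone rather than registered, and it assumes UTF-8 input with unique column names.

```python
import csv
import io


def csv_columns(stream: io.BytesIO) -> dict[str, list[str]]:
    """Read a CSV byte stream into a dictionary of columns."""
    # Decode the byte stream lazily; newline="" lets the csv module handle line endings.
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    reader = csv.reader(text)
    headers = next(reader)
    data = {header: [] for header in headers}
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(headers):
            raise ValueError(f"line {reader.line_num}: length mismatch.")
        for header, value in zip(headers, row):
            data[header].append(value)
    return data


columns = csv_columns(io.BytesIO(b'name,note\na,"x,y"\n\nb,z\n'))
```

The function takes the same kind of BytesIO stream as the deserializer above, so it could be registered in the same way.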
Advanced#
The following tags are not recommended for general usage and may lead to unexpected results or significantly increase the maintenance complexity.
<patch>#
Patch layers can be attached on top of config files to modify the raw content before <include>. A patch layer consists of a list of patches, each of which is a dictionary with the following structure:
path: "[scheme://netloc/]path[#fragment]" # the file to patch
actions: # actions to apply
  - action: name # the name of the action
    ... # other keyword arguments provided to the action
where the path can be either absolute or relative and the action is one of the following:
Action | Type | Arguments
---|---|---
or a custom one registered through the patch_actions argument of ConfigParser. The built-in actions cannot be overridden.
This tag can be used to register a new patch layer. The layer is installed right after the registration and stays in effect across all the configs within the same parser __call__(). If a key is provided, it is used as the patch name. A named patch can be installed or uninstalled multiple times. The patches are evaluated lazily, after the deserialization but before the tag parsing, so a patch is meant to work as a preprocessor with minimal semantic support rather than as a regular tag.
tag
Register and install a patch layer:
<patch>: The type of the paths will be inferred.
<patch=absolute>: Resolve as absolute paths.
<patch=relative>: Resolve as relative paths.
Modify the patch layers:
<patch=install>: install patch layers.
<patch=uninstall>: uninstall patch layers.
value
<patch>, <patch=absolute>, <patch=relative>: a patch or a list of patches.
<patch=install>, <patch=uninstall>: a patch name or a list of patch names.
example
--- # file1.yml
key1 <type=os::path.join>:
  - path
  - to
  - file
--- # file2.yml
key2 <type=datetime::datetime>:
  year: 2025
  month: 1
  day: 1
--- # patched.yml
patch1 <patch>:
  - path: file1.yml
    actions:
      - action: insert
        target: '"key1 <type=os::path.join>".2'
        value: new
  - path: file2.yml
    actions:
      - action: update
        target: "key2 <type=datetime::datetime>"
        value:
          month: 12
          day: 31
patched:
  <include>:
    - file1.yml
    - file2.yml
unpatched:
  <patch=uninstall>: patch1
  <include>:
    - file1.yml
    - file2.yml
The example above will be parsed into
{
    "patched": {
        "key1": os.path.join("path", "to", "new", "file"),
        "key2": datetime.datetime(2025, 12, 31),
    },
    "unpatched": {
        "key1": os.path.join("path", "to", "file"),
        "key2": datetime.datetime(2025, 1, 1),
    },
}
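The effect of the two patches on the raw, not yet tag-parsed, content can be approximated in plain Python. The insert and update functions below are hypothetical stand-ins written for this example, not the built-in actions, and the target is assumed to follow the same dot-separated, quote-escaped convention as the URL fragment.

```python
import re

# A double-quoted segment (dots allowed inside) or a run of characters without dots.
SEGMENT = re.compile(r'"([^"]*)"|([^.]+)')


def split_target(target: str) -> list[str]:
    return [quoted or plain for quoted, plain in SEGMENT.findall(target)]


def walk(raw, parts: list[str]):
    for part in parts:
        raw = raw[int(part)] if isinstance(raw, list) else raw[part]
    return raw


def insert(raw, target: str, value) -> None:
    """Hypothetical 'insert': put value into a list at the index ending the target."""
    *parents, index = split_target(target)
    walk(raw, parents).insert(int(index), value)


def update(raw, target: str, value: dict) -> None:
    """Hypothetical 'update': merge value into the dictionary at the target."""
    walk(raw, split_target(target)).update(value)


# Raw content of file1.yml and file2.yml: the tags are still part of the keys.
file1 = {"key1 <type=os::path.join>": ["path", "to", "file"]}
file2 = {"key2 <type=datetime::datetime>": {"year": 2025, "month": 1, "day": 1}}
insert(file1, '"key1 <type=os::path.join>".2', "new")
update(file2, "key2 <type=datetime::datetime>", {"month": 12, "day": 31})
```

Because the patches run before the tag parsing, the targets address the raw keys including their tags, which is why the key containing a dot has to be quoted.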