Skip to content

YARA Compatibility details

By default, Boreal will have some differences with the YARA API on purpose: the YARA API can have non ideal default behavior or have edge cases that are not properly handled and can cause issues. However, the exact behavior of the YARA module is accessible be setting the yara compatibility mode:

import boreal

boreal.set_config(yara_compatibility=True)

If you are not migrating from the YARA API, you should just use the default behavior of boreal and not enable this mode.

The list of the differences is documented below. If any of those is problematic for you, you can either align with the boreal behavior, or enable the yara compatibility mode.

Fast mode by default

Fast mode is enabled by default instead of being opt-in. This enables several optimizations, but means that results may not contain all possible match details.

import boreal

rules = boreal.compile(source="...")
# No need to specify fast=True. `matches` will list all matching rules,
# but details on the string matches might be missing if the scan did not
# need to compute them to complete faster.
matches = rules.match(data="...")

Strict escape on by default

The strict_escape parameter in the compile function defaults to True instead of False. This means that rules that contain invalid escaping in regexes produce warnings by default.

import boreal

rules = boreal.compile(source="""
rule foo {
    strings:
        $ = /C:\Users/
    condition:
        any of them
}
""")
# A warning is emitted by default, as opposed to yara
assert len(rules.warnings) == 1

Better hash implementations

The __hash__ implementation for the StringMatches and StringMatchInstance objects is improved to avoid collision issues. Those objects are returned in a scan result:

import boreal

rules = boreal.compile(source="...")
results = rules.match(data="...")
rule_match = results[0]
string_matches = rule_match.strings[0] # This is StringMatches
string_instances = string_matches.instances[0] # This is StringMatchInstance

String identifiers are not prefixed by '$'

The identifier field in StringMatches is not prefixed by $.

import boreal

rules = boreal.compile(source="""
rule foo {
    strings:
        $mystr = "abc"
    condition:
        any of them
results = rules.match(data="abc")
rule_match = results[0]
string_matches = rule_match.strings[0]
assert string_matches.identifier == "mystr"  # as opposed to "$mystr" in yara

Textual metadata values are bytes

Text metadata values are returned as byte-strings instead of strings.

import boreal
    rules = module.compile(source="""

rule a {
    meta:
        foo = "a normal string"
        bar = "a string with non ascii bytes: \\xCA\xFE"
    condition: true
}""")
matches = rules.match(data='')
assert matches[0].meta == {
    'foo': b'a normal string'
    'bar': b'a string with non ascii bytes: \xCA\xFE' # yara does not return this properly
}

Match callback only receive matching rules by default

If which_callbacks is not specified in the match method, the default value is CALLBACK_MATCHES and not CALLBACK_ALL. This default value is almost always what is expected, and enabling the collection of non matching rules disables fast mode, making it undesirable.

import boreal

def cb(rule):
    # This callback only receives matching rules
    pass

rules = boreal.compile(source="...")
matches = rules.match(data="...", callback=cb)

Maximum number of matches is reduced

The maximum number of matches for a single string is much reduced compared to the 1 000 000 set in yara. This avoids performance regressions on strings matching too often. The default value is 1000, but this can easily be modified:

import boreal

rules = boreal.compile(source="...")
rules.set_params(string_max_nb_matches=100000)

Dictionaries returned by modules uses byte-strings as keys

Dictionaries that are contained in module values returned in the modules_callback uses bytestrings as keys instead of strings. This can happen for example in the pe.version_info dictionary:

rules = boreal.compile(source="""
import "pe"
rule a { condition: true }
""")

def modules_callback(values):
    assert values['version_info'][b'InternalName'] == b'MTXEX.DLL'
    # This is ['version_info']['InternalName'] in yara

rules.match('mtxex.dll', modules_callback=modules_callback)

This is required because those keys are not guaranteed to be strings. There is a known bug in yara due to this: if a dictionary key is not a proper string, the scan will fail.