Event Aggregator
================

Overview
--------

The Event Aggregator service (the systemd unit is named ``dstore-ev-aggregator``) is designed to filter PDNS Protobuf messages, specifically messages about DNS filtering events, as well as new device messages, and do the following:

- Decide whether to send the message on to any of the configured output channels using filtering and throttling logic. By default, events are not sent; they are only sent to a given output channel if they match a filter and are not throttled.
- Send the event to the matching output channel (if any). The same event may be sent to multiple output channels if multiple matching filters are defined.

The event aggregator also performs aggregation of messages for sending on to downstream data-warehousing services such as Elasticsearch. This is achieved using the optional aggregations logic, which keeps track of the number of messages received and sends only a proportion of those messages to the downstream DB, recording as part of the event data the number of events that were suppressed.

If either an elasticsearch or logstash URL is configured, then all received events will be sent to that URL. Thus, if you only want to use event aggregator to send events to elasticsearch, then the simplest way to achieve this is to configure an elasticsearch or logstash URL, together with a single aggregation that doesn't match any events.

Configuration
-------------

Configuration is via two files:

- Global Configuration: The global configuration file is called ``ev_aggregator.conf`` and contains configuration settings, including where to find the filter definition file.
- Filter Configuration: The filter/aggregation configuration file is defined in ``ev_aggregator.conf`` using the ``filter-file`` setting and contains definitions of the output channels and filters for those output channels.

By default the above configuration files are located in /etc/dstore; this can be changed using command-line configuration options as described below.

Global Configuration
^^^^^^^^^^^^^^^^^^^^

The following settings are available:

:config-file: Specify a different config file location
:daemon: Run as a daemon
:listen-address: Address and port to listen on for PBDNSMessage events
:alert-listen-address: Address and port to listen on for PBAlertMessage events
:logstash-url: URL, e.g. https://192.168.1.254:8080/ of a logstash server to send all received events to (in json format). Optional parameter, however it is required for many use-cases, so disable with care. This is now deprecated in favour of sending to elasticsearch directly using the elasticsearch-url parameter.
:logstash-userpass: Username/Password for logstash in the form username:password
:elasticsearch-url: URL (with no path component) of an elasticsearch server to send all received events to. Supersedes the logstash-url parameter.
:elasticsearch-index-prefix: The index prefix to use with elasticsearch. Index names will be created as "%{YYYY}%{MM}%{dd}".
:elasticsearch-index-template: Whether to upload an index template to elasticsearch that maps the timestamp parameter to a date type. This defaults to false.
:elasticsearch-userpass: Username/Password for elasticsearch in the form username:password
:worker-threads: The number of worker threads to create (these process individual protobuf messages and are used to search Redis). Defaults to 20.
:webhook-threads: The number of threads to create for sending webhooks (used for the webhook and notification_center output channels, as well as for writing to Logstash). Defaults to 10.
:webhook-conns: The maximum number of HTTP connections that each webhook thread will use. Defaults to 10.
:fail-open: If fail-open is set to false (the default) and Redis is unavailable (so throttling cannot be determined), events that match input filters will not be sent to output channels. If set to true, then matching events will always be sent to output channels.
:malware-tags: A comma separated list of tags that indicate malware filtering. These are used to indicate that an event is related to malware.
:botnet-tags: A comma separated list of tags that indicate botnet filtering. These are used to indicate that an event is related to botnets.
:phishing-tags: A comma separated list of tags that indicate phishing filtering. These are used to indicate that an event is related to phishing.
:blacklist-tags: A comma separated list of tags that indicate filtering due to blacklists. These are used to indicate that an event is related to blacklisting.
:contentfilter-tags: A comma separated list of tags that indicate filtering due to content filtering. These are used to indicate that an event is related to content filtering.
:platform-url: The URL of the PowerDNS Platform API. If specified, then the list of category names and titles is downloaded every hour and used to provide "friendly" names to the Notification Center.
:platform-auth-token: The token to use in the X-API-Key header for authorization to the Platform API.
:filter-file: The location of the file (in YAML format) used to configure output channels and filters. Mandatory parameter.
:redis-server: The hostname/IP address of a redis server, which will be used for throttling/filtering queries. Mandatory parameter.
:redis-port: The port number of a redis server.
:redis-hash-keys: Hash redis keys to save memory and CPU (defaults to true).
:redis-password: The password to use for Redis (optional)
:redis-retries: The number of retry attempts to connect to Redis after connection failure (defaults to 3)
:http-listen-address: The address (and port) to use to provide prometheus metrics via HTTP on the /metrics endpoint. Format is ``<address>:<port>``. The port defaults to 8083.

Filter Configuration
^^^^^^^^^^^^^^^^^^^^

Output Channels and Filters are defined in the file specified by the ``filter-file`` setting. The filter file is a YAML-format file, e.g.:

::

    output_channels:
      - output_channel:
          name: Output Channel 1
          type: notification_center
          url: http://127.0.0.1:8080/
          api-key: secret
      - output_channel:
          name: Output Channel 2
          type: webhook
          url: http://127.0.0.1:8081/
          api-key: secret
          basic-auth: user:password
          secret: secret
    aggregations:
      - aggregate:
          name: test_aggregation
          description: Always send the first 10 events, then aggregate more aggressively as the number of events increases using a 10x multiplier.
          input_filter:
            qname: aggregate.com
          min_events: 10
          multiplier: 10
          cache_timeout: 600
          max_aggregate: 10000
          output_channel: webhook
          switch: on
    filters:
      - filter:
          name: Filter 1
          description: This is a filter
          input_filter:
            app: pdns
            type: dnsfilter
            user_id: "?"
          input_exceptions:
            filtertype: phishing
          switch: on
          throttle:
            min_events: 0
            max_notifications: 1
            period: 86400
          output_channel: Output Channel 1
      - filter:
          name: Filter 2
          description: This is another filter
          input_filter:
            app: pdns
            type: newdevice
            user_id: "?"
          switch: on
          throttle:
            min_events: 10
            period: 3600
          output_channel: Output Channel 2

Output Channel Configuration
````````````````````````````

All output channels must have ``name`` and ``type`` fields. The type must currently be one of the following strings:

- ``log``
- ``webhook``
- ``notification_center``

For output channels of type ``webhook`` and ``notification_center``, the following additional fields are mandatory:

- ``url``: The URL of the webhook endpoint. Note that the following tokens in a URL will be expanded:

  - %{YYYY} - Expands to the current year, e.g. 2020
  - %{MM} - Expands to the current month number, e.g. 01 or 12
  - %{dd} - Expands to the current day of the month number, e.g. 01 or 25

And the following fields are optional:

- ``api-key``: The value to place into an ``X-API-Key`` header
- ``basic-auth``: The username and password to provide for basic authentication (in user:password format)
- ``secret``: The secret to use when generating an ``X-Signature`` header

Output channels must be defined in the file *before* the filters map.

Filter Configuration
````````````````````

Filters are used to send matching events to output channels. Every filter must have an input filter, which matches the events to be sent, and an output channel, which decides where the event is sent. Optionally filters also have a throttle, which can be used to restrict when events are sent to the output channel. Filters without a throttle will send every matching event to the output channel. Finally filters have a switch, which simply enables or disables the filter.

Input filters are mandatory and consist of a list of field names and values. There must be at least one field specified; empty input filters are not valid. There are two types of match for input filter fields:

- Exact Match: For example, ``key: value`` will match if the event has a field called "key" with a value of "value".
- Current Match: For example ``key: "?"``. The "?" syntax specifies that the value of the specified field in the current event is used, whatever that is. For example if the current value is "foo" then "?" will be substituted with "foo".

To match an input filter, all fields must match, i.e. the terms are combined with a logical AND. Input filters only match string fields; you cannot match on an array field currently.

The current match syntax of "?"
may be considered similar to a wildcard, which indeed it is for matching purposes; however, its use is more subtle than that. Only the events that match the input filter are counted for throttling purposes, which means that, for example, specifying ``user_id: "?"`` as an input filter would count only events for the matching user, thus enabling per-user throttling. If no throttle is specified then the "?" can be considered to be identical to a wildcard.

An optional input_exceptions map consisting of field names and values can be configured. Any event which has fields which match any of the input exceptions will not be matched (i.e. logical OR) and the filter will be skipped for that event. Only exact matches can be configured, and there is no support for the "?" syntax in input_exceptions.

The ``switch`` field should be set to "off" to disable a filter. If the ``switch`` field is missing, the filter is considered disabled.

The ``output_channel`` field must specify an output channel name as defined in the previous section.

Throttles are used to filter matching events. They are optional, meaning that if a throttle is not specified, all matching events will be sent to the specified output channel.

The only mandatory field of a throttle is ``period``:

- ``period``: The number of seconds over which the throttle applies. Used to scope the query to Redis.

Throttles must specify either one or both of the following:

- ``min_events``: The minimum number of matching events which must have been sent previously before the current event is sent to the specified output channel. Once this threshold is exceeded for the current time period, events will continue to be sent unless throttled with ``max_notifications``.
- ``max_notifications``: The maximum number of matching notifications (i.e. events that are actually sent to the specified output channel) that will be sent in the current time period. Note that setting a value of 0 will ensure that events are always throttled.

Both ``min_events`` and ``max_notifications`` can be specified at the same time.

If we consider the following filter:

::

    - filter:
        name: filter1
        description: foo
        input_filter:
          app: pdns
          user_id: "?"
          qname: "?"
          type: dnsfilter
        input_exceptions:
          filtertype: botnet
        switch: on
        throttle:
          period: 3600
          max_notifications: 1

This can be explained as follows:

*The filter 'filter1' matches events matching the "pdns" app, and the "dnsfilter" type, but not events containing the 'botnet' filtertype, sending no more than one notification per hour for each unique combination of user_id and qname.*

For example, the following table shows whether a notification is sent for a set of incoming events (in chronological order, all received within a minute):

+-----------------------+--------------------+
| Event                 | Notification sent? |
+=======================+====================+
| app: pdns             |                    |
| type: dnsfilter       | y                  |
| user_id: joe          |                    |
| qname: facebook.com   |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | y                  |
| user_id: joe          |                    |
| qname: powerdns.com   |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | y                  |
| user_id: mary         |                    |
| qname: facebook.com   |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | n                  |
| user_id: joe          |                    |
| qname: facebook.com   |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | n                  |
| user_id: mary         | (Event will not    |
| qname: google.com     | match filter)      |
| filtertype: botnet    |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | y                  |
| user_id: mary         |                    |
| qname: google.com     |                    |
+-----------------------+--------------------+
| app: pdns             |                    |
| type: dnsfilter       | n                  |
| user_id: mary         |                    |
| qname: google.com     |                    |
+-----------------------+--------------------+

Aggregation Configuration
`````````````````````````

Aggregations are also used
to send matching events to output channels. Similarly to filters, aggregations must have an input filter, which matches the events to be sent, and an output channel, which decides where the event is sent. However rather than a throttle, the aggregation defines a multiplier (a positive integer), which determines how aggressively the events will be aggregated; a higher number is more aggressive, and a multiplier of 1 means that no aggregation will be done (i.e. every matching event will be sent). Finally aggregations also have a switch, which simply enables or disables the aggregation.

Input filters are mandatory and work exactly as described above for filters.

The ``name`` field is used to uniquely identify the aggregation, and the ``description`` field helps identify its purpose.

The ``switch`` field should be set to "off" to disable an aggregation. If the ``switch`` field is missing, the aggregation is considered disabled.

The ``output_channel`` field must specify an output channel name as defined in the previous section.

Aggregations must specify a multiplier:

- ``multiplier``: The multiplication factor used to determine how aggressively events get aggregated. A multiplication factor of 10 means that the number of events that get aggregated will increase by a factor of 10 as the number of events increases (e.g. between 1-100, one in every 10 events will be sent, between 100-1000 one in every 100 events will be sent etc.)

The following are optional:

- ``min_events``: If the event count is <= min_events, every event will be sent. Defaults to 0.
- ``max_aggregate``: The maximum number of events that will be aggregated. Use this to limit the multiplication factor, e.g. 1000 will limit to one event being sent for every 1000 events received. Defaults to 0, meaning no limit.
- ``cache_timeout``: The count of events is stored in redis, with an expiry. If no events are received within this window then the count is expired, i.e. reset to 0. It defaults to 600 seconds.

If we consider the following aggregation:

::

    - aggregate:
        name: aggregation1
        description: foo
        input_filter:
          app: pdns
        switch: on
        min_events: 10
        multiplier: 10
        max_aggregate: 1000
        cache_timeout: 600
        output_channel: webhook

The following table shows which events will cause an aggregated message to be sent to the output channel (assuming all events are received within the cache_timeout window):

+--------------+-------------+-----------------------------+
| Event Number | Event sent? | Number of events aggregated |
+==============+=============+=============================+
| 1            | y           | 1                           |
+--------------+-------------+-----------------------------+
| 10           | y           | 1                           |
+--------------+-------------+-----------------------------+
| 19           | n           | n/a                         |
+--------------+-------------+-----------------------------+
| 20           | y           | 10                          |
+--------------+-------------+-----------------------------+
| 110          | y           | 10                          |
+--------------+-------------+-----------------------------+
| 210          | y           | 100                         |
+--------------+-------------+-----------------------------+
| 220          | n           | n/a                         |
+--------------+-------------+-----------------------------+
| 1010         | y           | 100                         |
+--------------+-------------+-----------------------------+
| 1110         | n           | n/a                         |
+--------------+-------------+-----------------------------+
| 2010         | y           | 1000                        |
+--------------+-------------+-----------------------------+
| 3010         | y           | 1000                        |
+--------------+-------------+-----------------------------+

The JSON sent to the output channel will contain an ``event_count`` field that contains the number of events aggregated.

Input Field Dictionaries
------------------------

Currently the event aggregator understands two types of event:

- DNSMessage Events - These events can have either: ``type: dnsfilter`` for queries that are filtered, or ``type: dnsquery`` for queries that are not filtered.
- AlertMessage Events - Currently only NewDeviceMessage subtypes are processed. These events will have ``type: newdevice``.

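
As a rough illustration of how these event fields interact with the input filter rules from the Filter Configuration section, the sketch below matches event dictionaries against the ``filter1`` example and applies a ``max_notifications`` throttle per combination of "?" fields. It is not the service's implementation: the real service keeps its counters in Redis, whereas the hypothetical ``Throttle`` class here uses an in-memory dictionary and ignores ``period`` expiry. All function and class names are invented for the example, and it assumes that a "?" field must be present in the event to match.

```python
from collections import defaultdict


def matches(event, input_filter, input_exceptions=None):
    """True if every input_filter field matches (logical AND) and
    no input_exceptions field matches (logical OR)."""
    for field, wanted in input_filter.items():
        value = event.get(field)
        # Only string fields can match; missing or array fields never do
        # (treating a missing "?" field as a non-match is an assumption).
        if not isinstance(value, str):
            return False
        # "?" accepts whatever value the current event carries.
        if wanted != "?" and value != wanted:
            return False
    for field, unwanted in (input_exceptions or {}).items():
        if event.get(field) == unwanted:
            return False
    return True


def throttle_key(name, event, input_filter):
    """Counter key: the filter name plus the value of each "?" field."""
    current = sorted(f for f, v in input_filter.items() if v == "?")
    return (name,) + tuple(event[f] for f in current)


class Throttle:
    """In-memory stand-in for the Redis-backed counters (hypothetical)."""

    def __init__(self, max_notifications):
        self.max_notifications = max_notifications
        self.sent = defaultdict(int)

    def allow(self, key):
        if self.sent[key] >= self.max_notifications:
            return False
        self.sent[key] += 1
        return True


# The filter1 example from the Filter Configuration section.
FILTER = {"app": "pdns", "type": "dnsfilter", "user_id": "?", "qname": "?"}
EXCEPTIONS = {"filtertype": "botnet"}


def decide(events, throttle):
    """Return one boolean per event: would a notification be sent?"""
    results = []
    for event in events:
        sent = matches(event, FILTER, EXCEPTIONS) and throttle.allow(
            throttle_key("filter1", event, FILTER)
        )
        results.append(sent)
    return results


joe = {"app": "pdns", "type": "dnsfilter", "user_id": "joe", "qname": "facebook.com"}
print(decide([joe, joe], Throttle(max_notifications=1)))  # [True, False]
```

Because the counter key contains only the values of the "?" fields, two events with the same ``user_id`` and ``qname`` share a counter, while a different user or domain starts a fresh one.
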

The following sections list the dictionaries of the possible fields and their values for these events.

``dnsfilter`` Dictionary
^^^^^^^^^^^^^^^^^^^^^^^^

The following table lists the possible fields present in an event of type ``dnsfilter``. ``dnsquery`` events contain a subset of these fields (e.g. there will never be a 'filter_type' field).

+----------------------+------------+-----------------+-----------------------------------+
| Field Name           | Type       | Possible Values | Description                       |
+======================+============+=================+===================================+
| type                 | String     | dnsfilter       | For filtered queries              |
|                      |            | dnsquery        | For unfiltered queries            |
+----------------------+------------+-----------------+-----------------------------------+
| app                  | String     | pdns            | The name of the app, always pdns  |
+----------------------+------------+-----------------+-----------------------------------+
| user_id              | String     | Any             | The user id of the user           |
+----------------------+------------+-----------------+-----------------------------------+
| qname                | String     | Any DNS domain  | The filtered domain name          |
+----------------------+------------+-----------------+-----------------------------------+
| device_id            | String     | Any             | The ID of the device that was     |
|                      |            |                 | filtered                          |
+----------------------+------------+-----------------+-----------------------------------+
| device_ip            | String     | Any v4 or v6 IP | The IP address of the filtered    |
|                      |            | Address         | DNS query                         |
+----------------------+------------+-----------------+-----------------------------------+
| filter_type          | String     | malware         | The type of filtering, only for   |
|                      |            | phishing        | filtered queries                  |
|                      |            | botnet          |                                   |
|                      |            | blacklist       |                                   |
|                      |            | contentfilter   |                                   |
+----------------------+------------+-----------------+-----------------------------------+
| rule                 | String     | oxp-security-   | There should only ever be one     |
|                      |            | malware         | rule that matches an event        |
+----------------------+------------+-----------------+-----------------------------------+
| tags                 | Array      | Any string      | Each element in the array is a    |
|                      |            |                 | separate tag. For example:        |
|                      |            |                 | cat:OX-category-porn, foo, bar,   |
|                      |            |                 | rule:oxp-content-blacklist        |
+----------------------+------------+-----------------+-----------------------------------+
| categories           | Array      | Any string      | Each element in the array         |
|                      |            |                 | is a separate category.           |
|                      |            |                 | For example: OXP-category-porn,   |
|                      |            |                 | OX-category-malware               |
+----------------------+------------+-----------------+-----------------------------------+
| timestamp            | Integer    | Any integer     | Represents milliseconds since     |
|                      |            |                 | UNIX Epoch, e.g. 1549469500048    |
+----------------------+------------+-----------------+-----------------------------------+

``newdevice`` Dictionary
^^^^^^^^^^^^^^^^^^^^^^^^

The following table lists the possible fields present in an event of type ``newdevice``.

+----------------------+------------+-----------------+-----------------------------------+
| Field Name           | Type       | Possible Values | Description                       |
+======================+============+=================+===================================+
| type                 | String     | newdevice       |                                   |
+----------------------+------------+-----------------+-----------------------------------+
| app                  | String     | pdns            | The name of the app, always pdns  |
+----------------------+------------+-----------------+-----------------------------------+
| user_id              | String     | Any             | The user id of the user           |
+----------------------+------------+-----------------+-----------------------------------+
| device_id            | String     | Any             | The ID of the device that was     |
|                      |            |                 | detected                          |
+----------------------+------------+-----------------+-----------------------------------+
| device_name          | String     | Any             | The name of the device that was   |
|                      |            |                 | detected                          |
+----------------------+------------+-----------------+-----------------------------------+
| device_type          | String     | Any             | The type of the new device        |
+----------------------+------------+-----------------+-----------------------------------+
| timestamp            | Integer    | Any integer     | Represents milliseconds since     |
|                      |            |                 | UNIX Epoch, e.g. 1549469500048    |
+----------------------+------------+-----------------+-----------------------------------+

Logstash Schema
---------------

The data in the dictionaries described above is sent to ELK (and most output channels) in JSON format.

Here is an example DNS Message:

.. code-block:: json

    {
      "app": "pdns",
      "categories": [
        "porn",
        "gambling",
        "OXP-platform-facebook"
      ],
      "device_id": "4005ffeeeeddaadddd",
      "device_ip": "2001:db8::1",
      "filter_type": "filter",
      "gambling_tag": "1",
      "porn_tag": "1",
      "OXP-platform-facebook_tag": "1",
      "qname": "min_filter.com",
      "timestamp": 1549465752000,
      "type": "dnsfilter",
      "user_id": "ncook"
    }

Here is an example New Device Message:

.. code-block:: json

    {
      "app": "pdns",
      "device_id": "4005ffeeeeddaadddd",
      "device_name": "Neil's iPhone",
      "device_type": "Apple iPhone X",
      "timestamp": 1549470440410,
      "type": "newdevice",
      "user_id": "ncook"
    }

Here are two example aggregated DNS Messages:

.. code-block:: json

    {
      "app": "pdns",
      "categories": [
        "porn",
        "gambling",
        "OXP-platform-facebook"
      ],
      "device_id": "4005ffeeeeddaadddd",
      "device_ip": "2001:db8::1",
      "filter_type": "filter",
      "gambling_tag": "1",
      "porn_tag": "1",
      "OXP-platform-facebook_tag": "1",
      "qname": "min_filter.com",
      "timestamp": 1549465752000,
      "type": "dnsfilter",
      "user_id": "ncook",
      "event_count": 1000
    }

    {
      "app": "pdns",
      "device_id": "4005ffee2938adddd",
      "device_ip": "2001:db8::1",
      "qname": "facebook.com",
      "timestamp": 1549465752000,
      "type": "dnsquery",
      "user_id": "luser",
      "event_count": 100
    }

Note that an aggregated event is identical to a normal event except that it contains an ``event_count`` field.
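
Because aggregated and normal events share one schema, a consumer can handle both with the same code. The following sketch shows one way a downstream consumer might estimate how many original events a batch of received JSON messages represents. The function name ``total_events`` is hypothetical, the payloads are trimmed versions of the examples above, and treating a missing ``event_count`` as 1 is an assumption rather than documented behaviour.

```python
import json


def total_events(json_messages):
    """Sum the number of original events represented by JSON event strings."""
    total = 0
    for message in json_messages:
        event = json.loads(message)
        # Assumption: an event without event_count stands for exactly one event.
        total += int(event.get("event_count", 1))
    return total


received = [
    '{"app": "pdns", "type": "dnsquery", "qname": "facebook.com", '
    '"user_id": "luser", "event_count": 100}',
    '{"app": "pdns", "type": "newdevice", "user_id": "ncook", '
    '"device_id": "4005ffeeeeddaadddd"}',
]
print(total_events(received))  # 101
```
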