Metrics and Statistics

The PowerDNS Recursor collects many statistics about itself.

Regular Statistics Log

Every half hour or so (configurable with statistics-interval), the recursor outputs a line with statistics. To force the output of statistics, send the process a SIGUSR1. A line of statistics looks like this:

Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, 309 negative entries, 84% cache hits, outpacket/query ratio 37%, 12% throttled

This means that there are 13971 different names cached, each of which may have multiple records attached to it. There are 309 items in the negative cache: items that are known not to exist and are not expected to exist in the near future. 84% of incoming questions could be answered without any additional queries going out to the net.

The outpacket/query ratio means that on average, 0.37 packets were needed to answer a question. Initially this ratio may be well over 100% as additional queries may be needed to actually recurse the DNS and figure out the addresses of nameservers.

Finally, 12% of queries were not performed because identical queries had gone out previously, saving load on servers worldwide.
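For monitoring scripts, the fields of such a line can be extracted with a regular expression. The following is a minimal sketch that assumes the line has exactly the layout of the sample above; other versions of the Recursor may log additional or differently worded fields. The names STATS_RE and parse_stats_line are illustrative and not part of PowerDNS.

```python
import re

# Pattern for the sample statistics line shown above (an assumption:
# other Recursor versions may word or order the fields differently).
STATS_RE = re.compile(
    r"stats: (?P<questions>\d+) questions, "
    r"(?P<cache_entries>\d+) cache entries, "
    r"(?P<negative_entries>\d+) negative entries, "
    r"(?P<cache_hit_pct>\d+)% cache hits, "
    r"outpacket/query ratio (?P<outpacket_ratio_pct>\d+)%, "
    r"(?P<throttled_pct>\d+)% throttled"
)


def parse_stats_line(line):
    """Return the numeric fields of a statistics log line, or None."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


sample = ("Feb 10 14:16:03 stats: 125784 questions, 13971 cache entries, "
          "309 negative entries, 84% cache hits, outpacket/query ratio 37%, "
          "12% throttled")
stats = parse_stats_line(sample)

# Average number of outgoing packets needed per question, as a fraction.
packets_per_question = stats["outpacket_ratio_pct"] / 100
```

Lines that do not match the pattern yield None, so the parser can be run over a whole log file without further filtering.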

Sending metrics to Graphite/Metronome over Carbon

For carbon/graphite/metronome, we use the following namespace. Everything starts with ‘pdns.’, which is then followed by the local hostname. Thirdly, we add ‘recursor’ to signify the daemon generating the metrics. This is then rounded off with the actual name of the metric. As an example: ‘pdns.ns1.recursor.questions’.

Care has been taken to make the sending of statistics as unobtrusive as possible; the daemons will not be hindered by an unreachable carbon server, timeouts, or connection-refused situations.

To benefit from our carbon/graphite support, either install Graphite, or use our own lightweight statistics daemon, Metronome, currently available on GitHub.

To enable sending metrics, set carbon-server, possibly carbon-interval and possibly carbon-ourname in the configuration.

Warning

If your hostname includes dots, they will be replaced by underscores so as not to confuse the namespace.

If you include dots in carbon-ourname, they will not be replaced by underscores, as PowerDNS assumes you know what you are doing if you override your hostname.
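The naming rules above can be illustrated with a short sketch. The function carbon_metric_name is hypothetical and only mirrors the described behaviour; it is not how the Recursor itself builds the name.

```python
def carbon_metric_name(hostname, metric, ourname=None):
    """Build a metric name following the namespace described above.

    hostname: the local hostname; dots are replaced by underscores.
    ourname:  stands in for carbon-ourname; used verbatim when given,
              so any dots in it are kept.
    """
    if ourname is not None:
        host_part = ourname
    else:
        host_part = hostname.replace(".", "_")
    return "pdns.{}.recursor.{}".format(host_part, metric)


plain = carbon_metric_name("ns1", "questions")
dotted = carbon_metric_name("ns1.example.com", "questions")
overridden = carbon_metric_name("ns1.example.com", "questions",
                                ourname="dc1.ns1")
```

Here plain is the example from the text, dotted shows the underscore replacement, and overridden shows that an overriding name with a dot adds an extra level to the namespace.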

Sending metrics over SNMP

New in version 4.1.0.

The recursor can export statistics over SNMP and send traps from Lua, provided support is compiled into the Recursor and snmp-agent is set.

Getting Metrics from the Recursor

Should Carbon not be the preferred way of receiving metrics, several other techniques can be employed to retrieve them.

Using the Webserver

The API exposes a statistics endpoint at GET /api/v1/servers/:server_id/statistics. This endpoint exports all statistics in a single JSON document.
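A client for this endpoint could look roughly like the sketch below. Several details are assumptions not stated in this section: that the JSON document is a list of objects carrying "name" and "value" keys, that the API key is passed in an X-API-Key header, and that the server_id is "localhost". The base URL and key are supplied by the caller; both function names are illustrative.

```python
import json
import urllib.request


def statistics_to_dict(document):
    """Map statistic names to values for simple name/value entries.

    Assumes the document is a JSON list of objects with "name" and
    "value" keys; entries with non-scalar values are skipped.
    """
    result = {}
    for item in json.loads(document):
        value = item.get("value")
        if isinstance(value, (str, int, float)):
            result[item["name"]] = value
    return result


def fetch_statistics(base_url, api_key, server_id="localhost"):
    """GET the statistics endpoint and return the parsed statistics."""
    url = "{}/api/v1/servers/{}/statistics".format(base_url, server_id)
    # The X-API-Key header is an assumption about the webserver setup.
    request = urllib.request.Request(url, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request, timeout=5) as response:
        return statistics_to_dict(response.read().decode("utf-8"))
```

Keeping the parsing separate from the HTTP call makes it possible to test the parser against a saved response without a running Recursor.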

Using rec_control

Metrics can also be gathered on the system itself by invoking rec_control:

rec_control get-all

Single statistics can also be retrieved with the get command, e.g.:

rec_control get all-outqueries

External programs can use this technique to scrape metrics.
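As a sketch of such an external program, the following runs rec_control get-all and turns the output into a dictionary. It assumes that each output line holds a metric name and an integer value separated by whitespace, and that rec_control is on the PATH and allowed to reach the control socket; lines that do not fit are skipped. The function names are illustrative.

```python
import subprocess


def parse_get_all(output):
    """Parse 'name value' lines into a dict of integers."""
    metrics = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, value = parts
        try:
            metrics[name] = int(value)
        except ValueError:
            continue  # skip values that are not plain integers
    return metrics


def scrape_metrics():
    """Run 'rec_control get-all' and return the parsed metrics."""
    completed = subprocess.run(
        ["rec_control", "get-all"],
        check=True, capture_output=True, text=True,
    )
    return parse_get_all(completed.stdout)
```

With check=True, a failing rec_control raises subprocess.CalledProcessError instead of silently returning an empty result.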

Gathered Information

The following statistics are gathered.

It should be noted that answers0-1 + answers1-10 + answers10-100 + answers100-1000 + answers-slow + packetcache-hits + over-capacity-drops + policy-drops = questions.

Also note that unauthorized-tcp and unauthorized-udp packets do not end up in the ‘questions’ count.
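The identity above lends itself to a simple consistency check on scraped metrics. The sketch below assumes the metrics are available as a dictionary of integers keyed by metric name; the sample values are made up for illustration, and the function name is hypothetical.

```python
# Counters that, according to the identity above, add up to 'questions'.
ANSWER_COUNTERS = (
    "answers0-1", "answers1-10", "answers10-100", "answers100-1000",
    "answers-slow", "packetcache-hits", "over-capacity-drops",
    "policy-drops",
)


def questions_discrepancy(metrics):
    """Return 'questions' minus the sum of the counters listed above."""
    accounted = sum(metrics[name] for name in ANSWER_COUNTERS)
    return metrics["questions"] - accounted


# Made-up sample values, chosen so that the identity holds.
sample_metrics = {
    "answers0-1": 500, "answers1-10": 200, "answers10-100": 150,
    "answers100-1000": 40, "answers-slow": 10, "packetcache-hits": 95,
    "over-capacity-drops": 3, "policy-drops": 2, "questions": 1000,
}
discrepancy = questions_discrepancy(sample_metrics)
```

On a live server the counters are read while queries are still being processed, so a small nonzero difference may not indicate a problem.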

all-outqueries

counts the number of outgoing UDP queries since starting

answers-slow

counts the number of queries answered after 1 second

answers0-1

counts the number of queries answered within 1 millisecond

answers1-10

counts the number of queries answered within 10 milliseconds

answers10-100

counts the number of queries answered within 100 milliseconds

answers100-1000

counts the number of queries answered within 1 second

auth4-answers-slow

counts the number of queries answered by auth4s after 1 second (4.0)

auth4-answers0-1

counts the number of queries answered by auth4s within 1 millisecond (4.0)

auth4-answers1-10

counts the number of queries answered by auth4s within 10 milliseconds (4.0)

auth4-answers10-100

counts the number of queries answered by auth4s within 100 milliseconds (4.0)

auth4-answers100-1000

counts the number of queries answered by auth4s within 1 second (4.0)

auth6-answers-slow

counts the number of queries answered by auth6s after 1 second (4.0)

auth6-answers0-1

counts the number of queries answered by auth6s within 1 millisecond (4.0)

auth6-answers1-10

counts the number of queries answered by auth6s within 10 milliseconds (4.0)

auth6-answers10-100

counts the number of queries answered by auth6s within 100 milliseconds (4.0)

auth6-answers100-1000

counts the number of queries answered by auth6s within 1 second (4.0)

cache-bytes

size of the cache in bytes

cache-entries

shows the number of entries in the cache

cache-hits

counts the number of cache hits since starting; this does not include hits that got answered from the packet-cache

cache-misses

counts the number of cache misses since starting

case-mismatches

counts the number of mismatches in character case since starting

chain-resends

number of queries chained to existing outstanding query

client-parse-errors

counts number of client packets that could not be parsed

concurrent-queries

shows the number of MThreads currently running

dlg-only-drops

number of records dropped because of delegation-only setting

dnssec-queries

number of queries received with the DO bit set

dnssec-result-bogus

number of DNSSEC validations that had the Bogus state

dnssec-result-indeterminate

number of DNSSEC validations that had the Indeterminate state

dnssec-result-insecure

number of DNSSEC validations that had the Insecure state

dnssec-result-nta

number of DNSSEC validations that had the NTA (negative trust anchor) state

dnssec-result-secure

number of DNSSEC validations that had the Secure state

dnssec-validations

number of DNSSEC validations performed

dont-outqueries

number of outgoing queries dropped because of dont-query setting (since 3.3)

ecs-queries

number of outgoing queries adorned with an EDNS Client Subnet option (since 4.1)

ecs-responses

number of responses received from authoritative servers with an EDNS Client Subnet option we used (since 4.1)

edns-ping-matches

number of servers that sent a valid EDNS PING response

edns-ping-mismatches

number of servers that sent an invalid EDNS PING response

failed-host-entries

number of servers that failed to resolve

ignored-packets

counts the number of non-query packets received on server sockets that should only get query packets

ipv6-outqueries

number of outgoing queries over IPv6

ipv6-questions

counts all end-user initiated queries with the RD bit set, received over IPv6 UDP

malloc-bytes

returns the number of bytes allocated by the process (broken, always returns 0)

max-cache-entries

currently configured maximum number of cache entries

max-packetcache-entries

currently configured maximum number of packet cache entries

max-mthread-stack

maximum amount of thread stack ever used

negcache-entries

shows the number of entries in the negative answer cache

no-packet-error

number of erroneous received packets

noedns-outqueries

number of queries sent out without EDNS

noerror-answers

counts the number of times it answered NOERROR since starting

noping-outqueries

number of queries sent out without EDNS PING

nsset-invalidations

number of times an nsset was dropped because it no longer worked

nsspeeds-entries

shows the number of entries in the NS speeds map

nxdomain-answers

counts the number of times it answered NXDOMAIN since starting

outgoing-timeouts

counts the number of timeouts on outgoing UDP queries since starting

outgoing4-timeouts

counts the number of timeouts on outgoing UDP IPv4 queries since starting (since 4.0)

outgoing6-timeouts

counts the number of timeouts on outgoing UDP IPv6 queries since starting (since 4.0)

over-capacity-drops

questions dropped because over maximum concurrent query limit (since 3.2)

packetcache-bytes

size of the packet cache in bytes (since 3.3.1)

packetcache-entries

size of packet cache (since 3.2)

packetcache-hits

packet cache hits (since 3.2)

packetcache-misses

packet cache misses (since 3.2)

policy-drops

packets dropped because of (Lua) policy decision

policy-result-noaction

packets that were not actioned upon by the RPZ/filter engine

policy-result-drop

packets that were dropped by the RPZ/filter engine

policy-result-nxdomain

packets that were replied to with NXDOMAIN by the RPZ/filter engine

policy-result-nodata

packets that were replied to with no data by the RPZ/filter engine

policy-result-truncate

packets that were forced to TCP by the RPZ/filter engine

policy-result-custom

packets that were sent a custom answer by the RPZ/filter engine

qa-latency

shows the current latency average, in microseconds, exponentially weighted over past ‘latency-statistic-size’ packets

questions

counts all end-user initiated queries with the RD bit set

resource-limits

counts number of queries that could not be performed because of resource limits

security-status

security status based on Security Polling

server-parse-errors

counts number of server replied packets that could not be parsed

servfail-answers

counts the number of times it answered SERVFAIL since starting

spoof-prevents

number of times PowerDNS considered itself spoofed, and dropped the data

sys-msec

number of CPU milliseconds spent in ‘system’ mode

tcp-client-overflow

number of times an IP address was denied TCP access because it already had too many connections

tcp-clients

counts the number of currently active TCP/IP clients

tcp-outqueries

counts the number of outgoing TCP queries since starting

tcp-questions

counts all incoming TCP queries (since starting)

throttle-entries

shows the number of entries in the throttle map

throttled-out

counts the number of throttled outgoing UDP queries since starting

throttled-outqueries

same as throttled-out

too-old-drops

questions dropped that were too old

unauthorized-tcp

number of TCP questions denied because of allow-from restrictions

unauthorized-udp

number of UDP questions denied because of allow-from restrictions

unexpected-packets

number of answers from remote servers that were unexpected (might point to spoofing)

unreachables

number of times nameservers were unreachable since starting

uptime

number of seconds process has been running (since 3.1.5)

user-msec

number of CPU milliseconds spent in ‘user’ mode

x-our-latency

New in version 4.1: Not yet proven to be reliable

PowerDNS measures per query how much time has been spent waiting on authoritative servers. In addition, the Recursor measures the total amount of time needed to answer a question. The difference between these two durations is a measure of how much time was spent within PowerDNS. This metric is the average of that difference, in microseconds.
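The calculation described above can be sketched as follows. This is an illustration only: it uses a plain arithmetic mean over a list of samples, which is an assumption, and the Recursor may average differently. The function name and the sample durations are made up.

```python
def average_internal_latency_usec(samples):
    """Average time spent inside the resolver, in microseconds.

    samples: list of (total_usec, auth_wait_usec) pairs, one per query,
    where total_usec is the time needed to answer the question and
    auth_wait_usec is the time spent waiting on authoritative servers.
    """
    differences = [total - waited for total, waited in samples]
    if not differences:
        return 0.0
    return sum(differences) / len(differences)


# Two made-up queries: 200 and 400 microseconds spent internally.
internal = average_internal_latency_usec([(1000, 800), (500, 100)])
```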

x-ourtime0-1

New in version 4.1: Not yet proven to be reliable

Counts responses where between 0 and 1 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime1-2

New in version 4.1: Not yet proven to be reliable

Counts responses where between 1 and 2 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime2-4

New in version 4.1: Not yet proven to be reliable

Counts responses where between 2 and 4 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime4-8

New in version 4.1: Not yet proven to be reliable

Counts responses where between 4 and 8 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime8-16

New in version 4.1: Not yet proven to be reliable

Counts responses where between 8 and 16 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime16-32

New in version 4.1: Not yet proven to be reliable

Counts responses where between 16 and 32 milliseconds was spent within the Recursor. See x-our-latency for further details.

x-ourtime-slow

New in version 4.1: Not yet proven to be reliable

Counts responses where more than 32 milliseconds was spent within the Recursor. See x-our-latency for further details.