Monitoring
Monitoring tremor is done using tremor itself. This has some interesting implications.
Each connector and each pipeline emit metrics, they can be subscribed to using the metrics connector. This allows to both write these messages to a destination system such as InfluxDB, as well as to a message queue such as Kafka.
The rate at which metrics are published can be defined for each connector and pipeline.
By default metrics are turned off.
Metrics are formatted following the same structure as the Influx Codec.
Metrics
Tremor allows collecting it's own metrics and processing them, this enables us to send metrics to any system tremor can connect to and not force our users into a specific ecosystem.
An example can be found in the metrics guide
Pipeline metrics
Configuring metrics for pipelines is done with the config directive:
#!config metrics_interval_s = <seconds>
This directive determines the interval in seconds at which metrics events are emitted from the pipeline into the metrics connector.
Example pipeline definition:
define pipeline with_metrics
pipeline
#!config metrics_interval_s = 5
select event from in into out;
end;
This pipeline will emit a metrics event every 5 seconds.
Each pipeline will emit 1 metrics event per interval with a count of the events it received on its input port (Usually in
).
Example metrics event:
{
"measurement": "events",
"tags": {
"direction": "output",
"node": "in",
"pipeline": "main",
"port": "out"
},
"fields": { "count": 20 },
"timestamp": 1553077007898214000
}
In influx format:
events,port=out,direction=output,node=in,pipeline=main count=20 1553077007898214000
In this structure measurement
is always events
as that is what this is measuring. The number of events is always in the field count
as we are counting them.
The tags
section explains where this measurement was taken:
direction
means if this event came into the node"input"
or came out of the node"output"
node
is theid
of the node in a given pipelinepipeline
is theid
of the pipelineport
is the point the event was received or send from
The example above measures all events that left the in
of pipeline main
.
In addition to the general pipeline metrics, some operators do generate their own metrics, for details please check on the documentation for the operator in question.
Connector metrics
For connectors metrics can be configured with the argument metrics_interval_s = <time>
in the with
part of the connector definition.
Example connector definition:
define connector with_metrics from stdio
with
metrics_interval_s = 7,
codec = "string",
preprocessors = ["separate"],
postprocessors = ["separate"]
end;
This connector will emit 1 metrics event for each port. in
for received events, out
for emitted events and err
for emitted error events (e.g. if a codec failed to decode an event).
{
"measurement": "connector_events",
"tags": {
"port": "out",
"connector": "example_connector"
},
"fields": { "count": 42 },
"timestamp": 1576215344378248634
}
In influx format:
connector_events,port=out,ramp=example_connector count=42 1576215344378248634
In this structure measurement
is always connector_events
as that is what this is measuring. The number of events is always in the field count
as we are counting them.
The tags
section explains where this measurement was taken:
connector
is theid
of the connectorport
is one ofin
,out
anderr
(or any other port used by the connector)
The example above measures all events that were emitted out by the connector example_connector
.
Notes:
- Preprocessor and codec level errors count as errors for connector metrics.
- If your pipeline is using the batch operator and connector is receiving events from it, no of events tracked at connector is going to be dictated by the batching config.
Operator level metrics
In addition to the metrics provided by the pipeline itself, some operators can generate additional metrics.
The details are documented on a per operator level. Currently the following operators provide custom metrics:
Logging
Tremor uses the log4rs this allows a rather flexible configuration that can be changed without restarting tremor. An alternative is setting RUST_LOG
for global lgo levels as provided by env_logger.
The log4rs config yaml is passed in over the -l
arugment. An example would be:
refresh_rate: 30 seconds
appenders:
stdout:
kind: console
root:
level: warn
appenders:
- stdout
loggers:
tremor_runtime:
level: debug
appenders:
- stdout
additive: false
tremor:
level: debug
appenders:
- stdout
additive: false
More about the format can be found in the documentation.