Classifying log messages
The syslog-ng OSE application can compare the contents of received log messages to predefined message patterns. By comparing the messages to known patterns, syslog-ng OSE can identify the exact type of each message and sort it into a message class. Message classes describe the type of event reported in the log message. The classes can be customized: for example, they can label messages as user login, application crash, or file transfer events.
To find the pattern that matches a particular message, syslog-ng OSE uses a method called longest prefix match radix tree. This means that syslog-ng OSE builds a tree structure from the available patterns, where the different characters that can appear at a given position in the patterns form the branches of the tree.
To classify a message, syslog-ng OSE takes the first character of the message (the text of the message, not the header) and selects the patterns that start with this character; all other patterns are ignored for the rest of the process. Next, the second character of the message is compared to the second character of the selected patterns. Again, the matching patterns are kept and the others are discarded. This process is repeated until a single pattern completely matches the message, or no match is found. If a pattern matches, its class is assigned to the message; otherwise, the message is classified as unknown.
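The character-by-character selection described above can be illustrated with a minimal Python sketch. This is not the syslog-ng OSE implementation: it uses a plain (uncompressed) character tree, supports only literal patterns without pattern parsers, and the pattern texts, class names, and the build_tree and classify functions are hypothetical names chosen for the illustration.

```python
class Node:
    """One position in the tree: branches are the characters seen at this position."""

    def __init__(self):
        self.children = {}         # next character -> Node
        self.message_class = None  # set only where a complete pattern ends


def build_tree(patterns):
    """Build the tree from a dict of {pattern text: class name}."""
    root = Node()
    for text, message_class in patterns.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, Node())
        node.message_class = message_class
    return root


def classify(root, message):
    """Follow the message one character at a time; give up when no branch fits."""
    node = root
    for ch in message:
        node = node.children.get(ch)
        if node is None:
            return "unknown"       # no pattern shares this prefix
    # The whole message was consumed; it is classified only if a pattern ends here.
    return node.message_class or "unknown"


# Hypothetical literal patterns and classes, for illustration only.
root = build_tree({
    "session opened": "login",
    "session closed": "logout",
    "segfault": "crash",
})

print(classify(root, "session opened"))
print(classify(root, "disk full"))
```

Each lookup step only inspects the branches below the current node, so the work per message depends on the length of the message rather than on how many patterns were loaded.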
To make the message classification more flexible and robust, the
patterns can contain pattern parsers: elements that match on a set of
characters. For example, the NUMBER parser matches on any integer or
hexadecimal number (for example, 1, 123, 894054, 0xFFFF, and so on).
Other pattern parsers match on various strings and IP addresses. For the
details of available pattern parsers, see
Using pattern parsers.
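As a rough illustration of what a pattern parser does, the following Python sketch matches a number at a given position in a message. It is based only on the description above (integers and 0x-prefixed hexadecimal numbers); the match_number helper is a hypothetical name, and the real NUMBER parser may accept or reject inputs that this sketch does not.

```python
import re

# Hexadecimal is tried first so that "0xFFFF" is not cut short after the leading "0".
NUMBER = re.compile(r"0[xX][0-9a-fA-F]+|[0-9]+")


def match_number(text, pos=0):
    """Return the number found at position pos of text, or None if there is none."""
    found = NUMBER.match(text, pos)
    return found.group(0) if found else None


print(match_number("port 42156 ssh2", 5))
print(match_number("0xFFFF"))
print(match_number("ssh2"))
```

In a pattern, such a parser stands in for a run of characters whose exact value is not known in advance, so one pattern can cover every message that differs only in that value.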
The functionality of the pattern database is similar to that of the
logcheck project, but it is much easier to write and maintain the
patterns used by syslog-ng OSE than the regular expressions used by
logcheck. Also, syslog-ng OSE patterns are much easier to understand
than regular expressions.
Pattern matching based on regular expressions is computationally very intensive, especially when the number of patterns increases. The solution used by syslog-ng OSE can be performed in real time, and is independent of the number of patterns, so it scales much better.
The following patterns describe the same message:
Accepted password for bazsi from 10.50.0.247 port 42156 ssh2
A regular expression matching this message from the logcheck project:
Accepted (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey|keyboard-interactive/pam) for [^[:space:]]+ from [^[:space:]]+ port [0-9]+( (ssh|ssh2))?
A syslog-ng OSE database pattern for this message:
Accepted @QSTRING:auth_method: @ for@QSTRING:username: @from @QSTRING:client_addr: @port @NUMBER:port:@ ssh2
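To try the regular expression on the sample message, it can be transcribed into Python roughly as follows. This is a sketch under one stated assumption: Python's re module does not support the POSIX space character class used by logcheck, so the negated class is written as \S here, which is a close but not guaranteed identical substitute. The variable names are hypothetical.

```python
import re

# Transcription of the logcheck expression; \S+ replaces the negated POSIX space class.
LOGCHECK_STYLE = re.compile(
    r"Accepted (gssapi(-with-mic|-keyex)?|rsa|dsa|password|publickey"
    r"|keyboard-interactive/pam)"
    r" for \S+ from \S+ port [0-9]+( (ssh|ssh2))?"
)

message = "Accepted password for bazsi from 10.50.0.247 port 42156 ssh2"

# fullmatch requires the expression to cover the entire message text.
match = LOGCHECK_STYLE.fullmatch(message)
print(match is not None)
print(match.group(1))  # the authentication method alternative that matched
```

Note that the expression only captures the authentication method; extracting the user name, client address, and port would need additional groups, whereas the syslog-ng OSE pattern names those fields directly.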
For details on using pattern databases to classify log messages, see Using pattern databases.