XML (eXtensible Markup Language) files remain a common way to store structured data, from simple configuration details to complex document layouts. Collecting that data and shipping it to a processing pipeline takes the right tooling, and this is where Filebeat, the lightweight log shipper from the Elastic Stack, comes in. In this post we’ll walk through how to configure Filebeat to ingest a complete XML file as a single message, so that its content can be parsed downstream.

Configuring Filebeat to read single-line log files, or to combine several lines with its multiline support, is straightforward. Now consider an XML file whose entire content forms one complete message. Filebeat can be set up to read such a file as a single event.

It’s important to read the whole file as one message so that we can decode it as XML in Logstash or in an ingest pipeline. To do this, we use Filebeat’s “multiline” feature.
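To see why the file has to arrive as one message, here is a minimal Python sketch of what any downstream XML decoder runs into. It is only an illustration, not how Logstash or an ingest pipeline is implemented. The helper name `try_parse` and the `reading` element are assumptions made up for the example. Only the `newSensorDataFile` root tag comes from the configuration in this post.

```python
import xml.etree.ElementTree as ET

def try_parse(message):
    """Return the root tag if `message` is well-formed XML, else None."""
    try:
        return ET.fromstring(message).tag
    except ET.ParseError:
        return None

# Hypothetical file content; only the root tag name is taken from the post.
whole_file = (
    '<?xml version="1.0"?>\n'
    '<newSensorDataFile>\n'
    '  <reading>21.5</reading>\n'
    '</newSensorDataFile>\n'
)
# What a line-oriented reader might ship if it cut the file after two lines.
partial = "\n".join(whole_file.splitlines()[:2])

print(try_parse(whole_file))  # newSensorDataFile
print(try_parse(partial))     # None
```

A message that holds only part of the file is not well-formed XML, so there is nothing useful to decode until every line has been joined into one event.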

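We provide the “pattern” that marks where reading starts: ^<\?xml version="1\.0" (the ? and . are escaped so that they match literally instead of acting as regex operators). With “negate” set to true and “match” set to after, every following line that does not match the pattern is appended to that first line. The “flush_pattern” is very important: it matches the closing tag of the root element, and without it Filebeat has no explicit marker for where the file ends and the content should be sent as a message.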
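The way these options interact can be sketched in a few lines of Python. This is a simplified model of the grouping behaviour for “negate: true” and “match: after” with a flush pattern, not Filebeat’s actual implementation. The function name `group_multiline` and the `reading` element are assumptions for illustration.

```python
import re

def group_multiline(lines, pattern, flush_pattern):
    """Simplified model of multiline with negate: true and match: after."""
    start = re.compile(pattern)
    flush = re.compile(flush_pattern)
    events, buffer = [], []
    for line in lines:
        if flush.search(line):
            # The flush line closes the current event and is part of it.
            buffer.append(line)
            events.append("\n".join(buffer))
            buffer = []
        elif start.search(line) and buffer:
            # A new start line ends whatever was being collected.
            events.append("\n".join(buffer))
            buffer = [line]
        else:
            # Start of the first event, or a continuation line.
            buffer.append(line)
    if buffer:
        # Leftover lines; this model simply emits them at end of input.
        events.append("\n".join(buffer))
    return events

one_file = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<newSensorDataFile>',
    '  <reading>21.5</reading>',  # hypothetical element
    '</newSensorDataFile>',
]
# Two files' worth of lines, fed one after the other.
events = group_multiline(
    one_file * 2,
    r'^<\?xml version="1\.0"',
    r'^[\S]*</newSensorDataFile>',
)
print(len(events))  # 2
```

Each event holds one complete document, from the XML declaration to the closing root tag. In this model, the unescaped form `^<?xml version="1.0"` does not match the declaration line at all, because `<?` is read as an optional `<`.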

Because this file is written once and its content is never updated or appended to, it’s important to set “close_eof”. This closes the harvester as soon as the end of the file has been reached, which avoids having too many harvesters open at one time.

The YAML configuration looks like the following (mind the indentation, which YAML requires):
- type: log
  enabled: true
  paths:
    - /Users/wasim/Downloads/my-files/*.xml
  multiline:
    pattern: '^<\?xml version="1\.0"'
    negate: true
    match: after
    flush_pattern: '^[\S]*</newSensorDataFile>'
    max_lines: 100000
  close_inactive: 30s
  close_eof: true

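Note the nesting: “pattern”, “negate”, “match”, “flush_pattern” and “max_lines” belong under “multiline”, while “close_inactive” and “close_eof” are options of the input itself.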

Happy reading with Filebeat.