1. What is Logstash?
Logstash is an open-source data collection and processing engine used to collect, transform, and send data from multiple sources to a destination like Elasticsearch.
It’s part of the Elastic Stack (ELK) – Elasticsearch, Logstash, and Kibana.
1.1. Why Use Logstash?
- Collect logs from multiple sources (files, servers, APIs, databases)
- Parse and clean data
- Enrich logs with more context (like geo info or metadata)
- Send clean data to Elasticsearch, S3, Kafka, or any output
2. Logstash — General Folder Structure
logstash/
├── bin/
├── config/
│ ├── pipelines.yml
│ ├── logstash.yml
│ ├── jvm.options
│ ├── log4j2.properties
│ └── startup.options
├── data/
│ ├── queue/
│ └── dead_letter_queue/
├── logs/
├── modules/
├── vendor/
└── lib/
3. Logstash Pipeline Structure
Logstash pipelines consist of three main stages: Input → Filter → Output.
Each stage is defined in its own configuration block, and pipeline configuration files conventionally use the .conf extension (the default path.config glob only picks up *.conf files).
Within the Logstash config directory (typically conf.d), these files are often split per stage and given numeric prefixes, because Logstash concatenates them in lexicographical order:
- 100-input.conf for input configurations
- 200-filter.conf for filters
- 300-output.conf for outputs
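Put together, a minimal pipeline file looks like the sketch below. The beats input and elasticsearch output here are placeholders; any of the plugins covered later can be used in their place:

# skeleton of a pipeline: the three stages can live in one file or be split
# across 100-input.conf / 200-filter.conf / 300-output.conf
input {
  beats {
    port => 5044
  }
}

filter {
  # parsing and enrichment goes here
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
}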
4. Logstash Basic Configurations
Logstash relies on two main configuration files to define how it runs and which pipelines it loads.
- logstash.yml – Controls global and runtime settings for Logstash.
- pipelines.yml – Defines which pipelines to run and where their configuration files are located.
4.1 logstash.yml
The logstash.yml file contains core configuration settings that control how the Logstash process behaves. It defines global options such as pipeline performance, logging, monitoring, and API behavior.
Typical settings you can manage include:
- Logging: Configure Logstash logs (path, level, rotation).
- Pipeline performance: Control the number of workers, batch size, and execution mode.
- Monitoring: Enable internal monitoring and define Elasticsearch endpoints for metrics.
- Elasticsearch connection defaults (optional): Set default hosts or credentials if Logstash outputs to Elasticsearch.
- HTTP API: Enable or disable the Logstash HTTP API for health checks and debugging.
You can also set these options as command-line flags when starting Logstash. If both are set, command-line flags override values from logstash.yml.
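As an illustration, a logstash.yml touching each of these areas might look like the sketch below. The values are placeholders, not recommendations, and the exact monitoring settings vary by Logstash version:

# logstash.yml (illustrative values only)
node.name: logstash-node-1

# Pipeline performance
pipeline.workers: 4
pipeline.batch.size: 125

# Logging
log.level: info
path.logs: /var/log/logstash

# HTTP API (health checks and debugging)
api.enabled: true

# Monitoring (ships internal metrics to Elasticsearch)
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.hosts: ["http://localhost:9200"]

For example, passing --pipeline.workers 4 or --log.level debug on the command line would take precedence over the corresponding values above.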
4.2 pipelines.yml
The pipelines.yml file is used when you want to run multiple pipelines within the same Logstash instance. Each pipeline processes data independently and can have its own input, filter, and output configuration. This file should be placed in the path.settings directory (for example, /etc/logstash/).
Here’s an example structure:
# pipelines.yml
- pipeline.id: my-pipeline_1
  path.config: "/etc/logstash/conf.d/p1.conf"
- pipeline.id: my-other-pipeline
  path.config: "/etc/logstash/conf.d/p2.conf"
Explanation:
- pipeline.id: A unique identifier for each pipeline.
- path.config: The location of the .conf file that defines the pipeline’s input, filter, and output sections.
Using multiple pipelines allows you to:
- Separate different data ingestion workflows (e.g., logs, metrics, CSV imports).
- Manage and scale pipelines independently.
- Simplify maintenance and troubleshooting.
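Independent scaling works because most pipeline-level settings from logstash.yml can also be overridden per pipeline in pipelines.yml. A small sketch building on the example above:

# pipelines.yml with per-pipeline overrides (illustrative values)
- pipeline.id: my-pipeline_1
  path.config: "/etc/logstash/conf.d/p1.conf"
  pipeline.workers: 4        # more workers for a heavier pipeline
- pipeline.id: my-other-pipeline
  path.config: "/etc/logstash/conf.d/p2.conf"
  queue.type: persisted      # disk-backed queue just for this pipeline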
5. Logstash Plugins
Note: Logstash ships with many more plugins than the ones shown here (see the official Logstash plugin documentation); the most common ones are covered below.
5.1 Input Plugins
Inputs define where the data comes from.
Example 1: Http_poller input plugin
This Logstash input plugin allows you to call an HTTP API, decode the response into event(s), and send them on their way. The idea behind this plugin came from the need to read a Spring Boot metrics endpoint instead of configuring JMX to monitor a Java application's memory, GC, and so on.
input {
  http_poller {
    urls => {
      fakestore => {
        method => get
        url => "https://fakestoreapi.com/products"
        headers => {
          Accept => "application/json"
        }
      }
    }
    request_timeout => 60
    schedule => { every => "5m" } # Fetch data every 5 minutes
    codec => "json"
    metadata_target => "http_poller_metadata"
  }
}
Use this API for testing: https://fakestoreapi.com/products
Example 2: Beats Input
This input plugin enables Logstash to receive events from the Beats framework. The following example shows how to configure Logstash to listen on port 5044 for incoming Beats connections.
input {
  beats {
    port => 5044
  }
}
➡ Listens on port 5044 for incoming data from Beats agents.
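For context, the sending side is configured in the Beats agent itself. Assuming Filebeat, its output section would point at this listener roughly like the sketch below (logstash-host is a placeholder hostname):

# filebeat.yml (on the machine shipping the logs)
output.logstash:
  hosts: ["logstash-host:5044"]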
Example 3: TCP Input
Reads events over a TCP socket. Like the stdin and file inputs, each event is assumed to be one line of text. Depending on the mode, it can either accept connections from clients or connect to a server.
input {
  tcp {
    port => 5000
    codec => json
  }
}
➡ Accepts JSON logs over TCP on port 5000.
Example 4: File Input
Stream events from files, normally by tailing them in a manner similar to tail -0F but optionally reading them from the beginning.
input {
  file {
    path => "/path/to/products.csv"
    start_position => "beginning"
    sincedb_path => "NUL"  # disables position tracking on Windows; use "/dev/null" on Linux/macOS
  }
}
5.2 Filter Plugins
Filters are used to parse, clean, and enrich data before sending it to the output.
Example 1: Grok Filter
Used to parse unstructured text using patterns.
filter {
  grok {
    match => { "message" => "%{IPV4:client_ip} - %{USER:user} \[%{HTTPDATE:timestamp}\] \"%{WORD:method} %{URIPATH:request}\" %{NUMBER:status}" }
  }
}
➡ Converts this log line:
192.168.1.1 - john [07/Oct/2025:12:00:00 +0500] "GET /index.html" 200
into structured fields like:
client_ip: 192.168.1.1
user: john
method: GET
request: /index.html
status: 200
Example 2: Mutate Filter
The mutate filter allows you to perform general mutations on fields. You can rename, replace, and modify fields in your events.
filter {
  mutate {
    add_field => { "environment" => "production" }
    rename => { "host" => "hostname" }
    remove_field => ["@version"]
  }
}
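The mutate filter can also change field types with its convert option. For instance, the status field produced by the grok example above arrives as a string and could be converted to an integer; a small sketch:

filter {
  mutate {
    # turn the string "200" into the number 200 for numeric queries in Elasticsearch
    convert => { "status" => "integer" }
  }
}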
Example 3: Date Filter
The date filter is used to parse date or time fields and set that value as Logstash’s internal @timestamp for the event.
For example, syslog events have timestamps like "Apr 17 09:32:01" (parsed with the pattern "MMM dd HH:mm:ss"), while the example below parses Apache-style timestamps such as the one extracted by the grok filter earlier:
filter {
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
This converts raw timestamps such as 07/Oct/2025:12:00:00 +0500 into Logstash’s standardized @timestamp field.
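For the syslog-style timestamps mentioned above, the match patterns would differ. A sketch, assuming the raw value was extracted into a field called syslog_timestamp (a hypothetical field name):

filter {
  date {
    # "Apr 17 09:32:01" has no year or timezone; Logstash assumes the current
    # year, and the date filter's timezone option can pin the offset if needed.
    # The second pattern handles single-digit days padded with an extra space.
    match => ["syslog_timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss"]
  }
}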
Example 4: JSON Filter
This is a JSON parsing filter. It takes an existing field which contains JSON and expands it into an actual data structure within the Logstash event.
filter {
  json {
    source => "message"
  }
}
➡ Converts:
{"user":"haris","status":"ok"}
into:
user: haris
status: ok
5.3 Output Plugins
Outputs define where processed data goes.
Example 1: Elasticsearch Output
Elasticsearch provides near real-time search and analytics for all types of data. The Elasticsearch output plugin can store both time series datasets (such as logs, events, and metrics) and non-time series data in Elasticsearch.
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs"
    user => "elastic"
    password => "changeme"
  }
}
Example 2: File Output
This output writes events to files on disk. You can use fields from the event as parts of the filename and/or path. By default, this output writes one event per line in JSON format. You can customise the line format using the line codec, as shown in the sketch after this example.
output {
  file {
    path => "/path-to-file/output.log"
  }
}
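To illustrate the field-based paths and line codec mentioned above, a slightly fuller sketch (the type field is a placeholder; use whatever fields your events actually carry):

output {
  file {
    # one file per day, split by the event's "type" field
    path => "/path-to-file/%{type}-%{+YYYY-MM-dd}.log"
    # write only the raw message instead of the default JSON lines
    codec => line { format => "%{message}" }
  }
}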
Example 3: Stdout Output
A simple output which prints to the STDOUT of the shell running Logstash. This output can be quite convenient when debugging plugin configurations, by allowing instant access to the event data after it has passed through the inputs and filters.
output {
  stdout { codec => rubydebug }
}
Example 4: HTTP Output
This output lets you send events to a generic HTTP(S) endpoint.
output {
  http {
    http_method => "post"
    url => "https://api.example.com/logs"
    format => "json"
  }
}