No-Index Log Management at S3 Scale

Database indexes are invaluable for information systems with low throughput, low latency, and high consistency requirements. Creating indexes requires both compute and disk space, along with the associated operational overhead. Often, the resources, time, and cost needed to maintain indexes far outweigh the performance benefits they bring to the log management tool itself.

LOGIQ’s log analytics takes a unique no-index approach to log management, allowing infinite scale while preserving search and query performance. To achieve this, we have to solve the problem of infinite scale for both our data and metadata stores.

LOGIQ maintains its metadata in Postgres. However, Postgres cannot scale infinitely without incurring significant cost. Our hybrid metadata layer manages the migration of metadata tables between Postgres and S3. Old metadata is seamlessly tiered to S3 and fetched on demand when needed. The key/value nature of S3 lets us fetch granular metadata on demand without maintaining additional indexes.
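The tiering idea can be illustrated with a minimal sketch. This is not LOGIQ's implementation: the class name, the "ts" field, and the use of plain dictionaries as stand-ins for Postgres and S3 are all assumptions made for illustration.

```python
from typing import Dict, Optional


class HybridMetadataStore:
    """Toy two-tier store: 'hot' stands in for Postgres, 'cold' for S3 (both hypothetical)."""

    def __init__(self) -> None:
        self.hot: Dict[str, dict] = {}   # recent metadata, kept in the database tier
        self.cold: Dict[str, dict] = {}  # old metadata, addressed directly by object key

    def put(self, key: str, record: dict) -> None:
        """New metadata always lands in the hot tier."""
        self.hot[key] = record

    def tier_older_than(self, cutoff_ts: int) -> int:
        """Move records whose 'ts' is below cutoff_ts to the cold tier; return how many moved."""
        old_keys = [k for k, rec in self.hot.items() if rec["ts"] < cutoff_ts]
        for k in old_keys:
            self.cold[k] = self.hot.pop(k)
        return len(old_keys)

    def get(self, key: str) -> Optional[dict]:
        """Check the hot tier first, then fall back to a direct key lookup in the cold tier."""
        if key in self.hot:
            return self.hot[key]
        return self.cold.get(key)
```

The point of the sketch is that the cold-tier read is a single key lookup, so no secondary index has to be kept for tiered metadata.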

A similar approach is applied to data. Incoming data is broken into chunks and stored in a partitioned manner, so object lookups for, say, a namespace or an application do not need additional indexes. The object key implicitly encodes the index information. This makes lookups and retrievals efficient when data that is not in the local disk cache must be fetched from the S3 layer.
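As a rough sketch of how an object key can carry index information, consider the following. The key layout (namespace/application/hour/chunk) and both function names are hypothetical; the article does not describe LOGIQ's actual key scheme.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def object_key(namespace: str, application: str, ts: datetime, chunk_id: int) -> str:
    """Build a partitioned object key (hypothetical layout) from a timezone-aware timestamp."""
    hour = ts.astimezone(timezone.utc).strftime("%Y/%m/%d/%H")
    return f"{namespace}/{application}/{hour}/chunk-{chunk_id:06d}.log"


def lookup(keys: Iterable[str], namespace: str, application: str = "") -> List[str]:
    """Find chunks by key prefix alone; no separate index structure is consulted."""
    prefix = f"{namespace}/{application}/" if application else f"{namespace}/"
    return sorted(k for k in keys if k.startswith(prefix))


if __name__ == "__main__":
    ts = datetime(2021, 3, 4, 5, 6, tzinfo=timezone.utc)
    keys = [
        object_key("prod", "nginx", ts, 7),
        object_key("prod", "api", ts, 1),
        object_key("dev", "nginx", ts, 2),
    ]
    print(lookup(keys, "prod", "nginx"))
```

Against a real bucket, the in-memory filter would typically be replaced by a prefix listing, for example boto3's `list_objects_v2` with its `Prefix` parameter.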

LOGIQ’s architecture offers unique advantages by using S3 as its primary storage location. Yes! S3 is not a secondary storage tier in our architecture.

  • S3 storage for data and metadata

    Storing both data and metadata in S3 rather than on local storage significantly reduces the total cost of the solution. Most scaled-out, self-service log analytics solutions require costly management of volumes at scale. LOGIQ abstracts this away behind an S3 API.

  • No-Index log management

    Eliminates the costly compute and storage that would otherwise be consumed constantly by indexing, index rebuilds, and so on.

  • Eliminate Data egress cost

    When running in public cloud environments, deploying LOGIQ with the S3 bucket in the same region eliminates egress and data transfer costs, which can run into tens of thousands of dollars when sending data to an external cloud provider.

LOGIQ is the first real-time platform to bring together the benefits of an object store (scalability, one-hop lookup, better retrieval, ease of use, identity management, lifecycle policies, data archival, etc.) and distributed compute via Kubernetes, along with highly configurable dashboarding, query, alerting, and search. As a result, we provide much lower cost, easy integration with other analytics tools, and operational agility.


FireLens demystified

AWS FireLens is a log routing agent for Amazon Elastic Container Service (ECS) containers. Applications on ECS run as Docker containers. Containers can be run on serverless infrastructure that is managed by ECS using the Fargate launch type. For more control over the infrastructure, containers can be hosted on a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances. In both of these scenarios, AWS manages networking, storage, security, IAM, and other services required to run the containers.

FireLens for Amazon ECS enables the administrator to use task definition attributes to route logs to external log aggregators. It unifies the data collection across the ECS cluster. Its pluggable architecture allows adding data sources, parsers, filter/buffering, and output plugins.

The biggest advantage of FireLens is that you can connect almost any service endpoint, as long as the data sink can process general-purpose JSON over HTTP, Fluent Forward, or TCP protocols. The FireLens magic is all about transforming the log output of ECS containers and generating the necessary routing configuration for sending logs to the logging service.

To use FireLens, define the log collector and the sink. The sink can be any log aggregation provider, such as LOGIQ Log Insights. Let us now see how to put this together. We need the following:

  • A log router container with a FireLens configuration, marked as essential.
  • Application containers specifying the “awsfirelens” log driver.
  • A task IAM role ARN with the permissions needed to route logs.

Below you will find a few “logConfiguration” examples that can be used in your task definition. Note how the “logDriver” is set to “awsfirelens”. The “options” contain additional attributes for the log sink where the log data will be sent.

"logConfiguration": {
    "logDriver": "awsfirelens",
    "options": {
        "Name": "forward",
        "Port": "24224",
        "Host": "logiq.example.com"
    }
}
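If task definitions are generated programmatically, the same fragment can be built with a small helper. The sketch below is illustrative only; the function name is made up, and it assumes the sink speaks the Fluent "forward" protocol on the host and port shown in the example above.

```python
import json


def forward_log_configuration(host: str, port: int = 24224) -> dict:
    """Return a logConfiguration dict for the Fluent "forward" output.

    Log driver options are passed as strings, so the port is stringified.
    """
    return {
        "logDriver": "awsfirelens",
        "options": {
            "Name": "forward",
            "Host": host,
            "Port": str(port),
        },
    }


if __name__ == "__main__":
    fragment = {"logConfiguration": forward_log_configuration("logiq.example.com")}
    # Serializing with json.dumps avoids hand-written syntax slips such as missing commas.
    print(json.dumps(fragment, indent=4))
```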

The “awsfirelens” log driver allows you to specify Fluentd or Fluent Bit output plugin configuration. Your application container logs are routed to a sidecar or independent FireLens container inside your cluster, which in turn routes them to the destination defined in your task “logConfiguration”. Additionally, you can use the “options” field of the “firelensConfiguration” object in the task definition to serve more advanced use cases.

"firelensConfiguration" : {
   "type" : "fluentbit",
   "options" : {
      "config-file-value" : "arn:aws:s3:::mybucket/myFile.conf",
      "config-file-type" : "s3"
   }
}

The diagram above shows how FireLens works. Container logs are sent to the FireLens container using the Docker log driver. When the ECS agent launches a task that uses FireLens, it constructs a Fluent configuration file containing:

  • A specification of log source for how to gather the logs from the container

  • An ECS Metadata record transformer

  • Optional user-provided configuration. If you specify your own configuration file, FireLens uses the “include” directive to import it into the generated configuration file.

  • Log destinations or sinks derived from the Task Definition
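The four parts listed above can be pictured as sections of a single Fluent Bit configuration file, assembled in order. The sketch below is an approximation for illustration, not the file the ECS agent actually writes: the function names, the input settings, the metadata key, and the include path are assumptions. It uses Fluent Bit's classic section syntax ([INPUT], [FILTER], [OUTPUT], @INCLUDE).

```python
from typing import Dict, List, Optional


def render_section(kind: str, settings: Dict[str, str]) -> List[str]:
    """Render one classic-format Fluent Bit section such as [INPUT] or [OUTPUT]."""
    return [f"[{kind}]"] + [f"    {key} {value}" for key, value in settings.items()] + [""]


def build_config(
    metadata: Dict[str, str],
    outputs: List[Dict[str, str]],
    user_config_path: Optional[str] = None,
) -> str:
    """Assemble an illustrative config: source, metadata filter, optional include, sinks."""
    lines: List[str] = []
    # 1. Log source: where container logs enter the router (assumed settings).
    lines += render_section("INPUT", {"Name": "forward", "Listen": "0.0.0.0", "Port": "24224"})
    # 2. Record transformer that stamps ECS metadata onto every record.
    lines += ["[FILTER]", "    Name record_modifier", "    Match *"]
    lines += [f"    Record {key} {value}" for key, value in metadata.items()]
    lines.append("")
    # 3. Optional user-provided configuration, pulled in with an include directive.
    if user_config_path:
        lines += [f"@INCLUDE {user_config_path}", ""]
    # 4. Sinks derived from each container's logConfiguration options.
    for output in outputs:
        lines += render_section("OUTPUT", output)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_config(
        metadata={"ecs_cluster": "demo-cluster"},
        outputs=[{"Name": "forward", "Match": "app-firelens*",
                  "Host": "logiq.example.com", "Port": "24224"}],
        user_config_path="/path/to/user.conf",  # hypothetical path
    ))
```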

The following snippet shows a configuration for including ECS metadata like container and cluster details.

{
   "containerDefinitions" : [
      {
         "image" : "906394416424.dkr.ecr.us-west-2.amazonaws.com/aws-for-fluent-bit:latest",
         "firelensConfiguration" : {
            "options" : {
               "enable-ecs-log-metadata" : "true"
            },
            "type" : "fluentbit"
         },
         "name" : "log_router",
         "essential" : true
      }
   ]
}

To demonstrate how FireLens works end to end, the following task definition example contains an HTTP web server and a FireLens sidecar container that routes logs to the LOGIQ server. Replace the task execution role name in “executionRoleArn” if yours differs from the one shown, and substitute your account ID for XXXXXXXXXXXX in the following example:

{
   "family" : "firelens-logiq",
   "executionRoleArn" : "arn:aws:iam::XXXXXXXXXXXX:role/ecs_task_execution_role",
   "taskRoleArn" : "arn:aws:iam::XXXXXXXXXXXX:role/ecs_task_iam_role",
   "containerDefinitions" : [
      {
         "memoryReservation" : 50,
         "essential" : true,
         "firelensConfiguration" : {
            "type" : "fluentbit",
            "options" : {
               "enable-ecs-log-metadata" : "true"
            }
         },
         "image" : "amazon/aws-for-fluent-bit:latest",
         "name" : "log_router_logiq",
         "logConfiguration" : {
            "logDriver" : "awsfirelens",
            "options" : {
               "Host" : "logiq.example.com",
               "Name" : "forward",
               "Port" : "24224"
            }
         }
      },
      {
         "essential" : true,
         "memoryReservation" : 100,
         "logConfiguration" : {
            "options" : {
               "Host" : "logiq.example.com",
               "Name" : "forward",
               "Port" : "24224"
            },
            "logDriver" : "awsfirelens"
         },
         "name" : "app",
         "image" : "httpd"
      }
   ]
}
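Before registering a task definition like this one, it can help to sanity-check it against the three requirements listed earlier (an essential log router, application containers using “awsfirelens”, and a task role). The following sketch is a hypothetical pre-flight check, not ECS's own validation; the function name and messages are made up, and it requires “essential” to be set explicitly because that is how the article presents it.

```python
from typing import List


def check_firelens_task(task_def: dict) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    containers = task_def.get("containerDefinitions", [])

    # A container carrying firelensConfiguration is treated as the log router.
    routers = [c for c in containers if "firelensConfiguration" in c]
    apps = [c for c in containers if "firelensConfiguration" not in c]

    if not routers:
        problems.append("no container has a firelensConfiguration")
    for router in routers:
        if router.get("essential") is not True:
            problems.append(f"log router {router.get('name')!r} is not marked essential")

    for app in apps:
        driver = app.get("logConfiguration", {}).get("logDriver")
        if driver != "awsfirelens":
            problems.append(f"container {app.get('name')!r} does not use the awsfirelens log driver")

    if "taskRoleArn" not in task_def:
        problems.append("taskRoleArn is missing")
    return problems
```

A typical use would be to load the task definition with `json.load`, which also rejects syntax errors such as trailing commas, and then print whatever `check_firelens_task` returns.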