Scalable Log Collection as the Foundation of a SOC

Logs provide a wealth of information, which is one reason almost all security standards and frameworks (NIST, ISO, PCI, and others) emphasize the collection, storage, and analysis of log data as a key aspect of any security program. Collecting and managing logs is a fundamental requirement of any SOC implementation and is needed to meet many compliance requirements.

However, some log sources provide much more value to a security program than others. So while you can collect, store, and process all the data you want, thinking about its true value helps you create a more cost-effective and focused strategy.

A phased approach to log management is prudent: start with the most valuable log sources first, then add more log data as your program matures.
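One way to make the phased approach concrete is to score each log source and onboard the best ones first. The following is only a rough sketch: the source names, the 1 to 5 value scale, the volume figures, and the "value per GB" ranking are all assumptions made for illustration, and each volume is assumed to be greater than zero.

```python
from dataclasses import dataclass


@dataclass
class LogSource:
    name: str
    detection_value: int    # 1 (low) to 5 (high), assigned by the SOC team
    daily_volume_gb: float  # estimated daily ingest volume, assumed > 0


def plan_phases(sources, phase_size=2):
    """Rank sources by value per GB and split them into onboarding phases."""
    ranked = sorted(
        sources,
        key=lambda s: s.detection_value / s.daily_volume_gb,
        reverse=True,
    )
    return [ranked[i:i + phase_size] for i in range(0, len(ranked), phase_size)]


# Hypothetical inventory of candidate log sources.
sources = [
    LogSource("firewall", 3, 50.0),
    LogSource("identity-provider", 5, 2.0),
    LogSource("endpoint-detection", 5, 10.0),
    LogSource("web-proxy", 3, 30.0),
    LogSource("debug-app-logs", 1, 40.0),
]

phases = plan_phases(sources)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {[s.name for s in phase]}")
```

A real scoring model would likely weigh compliance mandates and storage cost as well, but even a simple ranking like this forces the conversation about which logs are worth collecting first.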

While traditional log collection using Syslog and log files has worked for quite some time, newer technologies are making log collection with these older methods harder. With the fast transition to Cloud-based technologies, newer log data may come from SaaS applications, Cloud application platforms, serverless applications, IoT devices, operational technologies, connected vehicles, drones, smart city technologies, and many others. These new log sources don’t always send logs with Syslog and may instead use APIs, web services, or Cloud services built specifically for logging. All of these options must be considered when planning log collection and building a log collection platform.
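To illustrate what supporting several collection methods means in practice, here is a minimal sketch that maps a classic Syslog line and a JSON record pulled from an API into the same event shape. The JSON field names (`service`, `time`, `event`, `severity`) and the output dictionary layout are assumptions for illustration, not the format of any particular product.

```python
import json
import re

# BSD-style Syslog line: "<PRI>Mmm dd hh:mm:ss host message"
SYSLOG_RE = re.compile(r"^<(\d{1,3})>(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def from_syslog(line):
    """Convert one Syslog line into the common event shape."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError("not a recognized Syslog line")
    pri, timestamp, host, message = match.groups()
    return {
        "source": host,
        "timestamp": timestamp,
        "severity": int(pri) % 8,  # PRI = facility * 8 + severity
        "message": message,
        "transport": "syslog",
    }


def from_api_json(payload):
    """Convert one JSON record (hypothetical field names) into the same shape."""
    record = json.loads(payload)
    return {
        "source": record["service"],
        "timestamp": record["time"],
        "severity": record.get("severity"),  # may be absent in API records
        "message": record["event"],
        "transport": "api",
    }


syslog_event = from_syslog("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
api_event = from_api_json(
    '{"service": "crm", "time": "2024-01-01T00:00:00Z", "event": "login"}'
)
print(syslog_event)
print(api_event)
```

Once every source lands in one shape, the rest of the pipeline (storage, search, correlation) does not need to know how the event arrived.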

Distributed Log Collection

A distributed log collection architecture, where local log collectors receive logs from different log sources and then forward them to one or more central locations, is commonly used today. This architecture provides resiliency and reduces data loss if the communication link to the central log collector becomes unavailable. The following diagram shows one such arrangement.

Welcome to the brave new world of log collection, where many methods are used to collect logs from Cloud, IoT, vehicles, drones, operational technologies, and others. Standing up a Syslog server is no longer sufficient.
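The resiliency benefit of a local collector comes mostly from buffering. The sketch below is a simplified, in-memory illustration of that idea: events are queued locally and forwarded in order, and anything that cannot be delivered stays queued until the link returns. The `LocalCollector` class, the `send_to_central` function, and the buffer size are hypothetical; a production collector would normally buffer to disk and batch its sends.

```python
from collections import deque


class LocalCollector:
    """Buffers events locally and forwards them to a central collector."""

    def __init__(self, send, max_buffered=10000):
        self._send = send  # callable(event); raises ConnectionError on failure
        # When the buffer is full, the oldest events are dropped first.
        self._buffer = deque(maxlen=max_buffered)

    def receive(self, event):
        self._buffer.append(event)
        self.flush()

    def flush(self):
        """Forward buffered events in order, stopping at the first failure."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # keep the event and retry on the next flush
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._buffer)


# Simulated central collector whose link can go up and down.
link = {"up": False}
delivered = []


def send_to_central(event):
    if not link["up"]:
        raise ConnectionError("central collector unreachable")
    delivered.append(event)


collector = LocalCollector(send_to_central)
collector.receive("event-1")  # link is down, so the event is buffered
collector.receive("event-2")
print("pending while link is down:", collector.pending)

link["up"] = True
collector.receive("event-3")  # link is back, so the backlog drains in order
print("delivered:", delivered)
```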

A more distributed architecture can both collect and index log data locally, and then make the indexes available to search requests from SOC analysts. This may be necessary to meet certain privacy requirements such as GDPR. However, one needs to weigh the flexibility and scalability of a distributed log collection infrastructure against the cost of managing it. For example, indexing logs close to the edge is attractive, but it can create additional overhead for correlation, reporting, and alerting, as well as the cost of managing indexes at multiple locations. Like everything else in life, there are some compromises to be made here as well!

Logging and NTP Protocols

A timestamp is an essential part of each log event. An important factor in building a logging infrastructure is ensuring time synchronization among all log sources so that logs stay in the proper order. Network Time Protocol (NTP) is commonly used for this purpose. While NTP is a topic in itself, it is sufficient at this point to understand that no logging infrastructure is complete until NTP is implemented to support it. Without it, log correlation and analytics will not work properly.
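To see why timestamps matter for ordering, consider this small sketch that converts timestamps from sources in different time zones to UTC before sorting them. The events and host names are made up for illustration. Note that this only handles time zone offsets; it assumes the clocks themselves are accurate, which is exactly what NTP is there to ensure.

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp has no UTC offset; ordering would be ambiguous")
    return parsed.astimezone(timezone.utc)


# Hypothetical events reported in each source's local time.
events = [
    ("vpn-gateway", "2024-03-01T09:00:05+01:00", "VPN login"),
    ("file-server", "2024-03-01T08:00:07+00:00", "file read"),
    ("firewall", "2024-03-01T03:00:02-05:00", "connection allowed"),
]

ordered = sorted(events, key=lambda event: to_utc(event[1]))
for source, timestamp, message in ordered:
    print(to_utc(timestamp).isoformat(), source, message)
```

If the file server's clock were off by even ten seconds, the file read would appear to happen at the wrong point relative to the VPN login, and no amount of time zone handling in the pipeline could correct that.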

Logging Standards

Lastly, building logging standards that define the type, amount, and level of logging goes a long way toward a consistent approach throughout an organization. A logging standard must address requirements for logging at different levels, including system, middleware, and applications. It should also specify the accepted logging protocols, storage, and lifecycle of log data. Logging standards must be updated at least annually so that new sources and types of important logs are taken into consideration based upon their value.
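A logging standard is easier to enforce when part of it is written in a form that can be checked automatically. The sketch below is one hypothetical way to do that; the protocol names, the 90-day retention figure, and the configuration fields are illustrative assumptions, not recommendations from any framework.

```python
# Hypothetical machine-readable excerpt of a logging standard.
LOGGING_STANDARD = {
    "allowed_protocols": {"syslog-tls", "https-api", "mqtt"},
    "min_retention_days": 90,
    "required_levels": {"system", "middleware", "application"},
}


def check_source(config, standard=LOGGING_STANDARD):
    """Return a list of findings where a log source deviates from the standard."""
    findings = []
    if config["protocol"] not in standard["allowed_protocols"]:
        findings.append(f"protocol {config['protocol']!r} is not approved")
    if config["retention_days"] < standard["min_retention_days"]:
        findings.append(
            f"retention of {config['retention_days']} days is below "
            f"the minimum of {standard['min_retention_days']}"
        )
    missing = standard["required_levels"] - set(config["levels"])
    if missing:
        findings.append("missing logging levels: " + ", ".join(sorted(missing)))
    return findings


# Hypothetical log source that violates the standard in three ways.
legacy_app = {
    "protocol": "syslog-udp",
    "retention_days": 30,
    "levels": ["system", "application"],
}

findings = check_source(legacy_app)
for finding in findings:
    print("-", finding)
```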


While building a scalable and distributed logging infrastructure, one should consider the following:

  1. Use local log collectors to improve reliability and provide buffering, compression, and bandwidth savings
  2. Support diverse log collection mechanisms, including Syslog, APIs, IoT protocols like MQTT, plain text files, XML, binary logs, and others
  3. Prioritize log sources based upon the value they contribute to better risk management and threat detection or response
  4. Use NTP in conjunction with the overall logging infrastructure to ensure proper order and correlation of logs
  5. Build logging standards to bring consistency and clarity to logging requirements

By taking these factors into account, you are much more likely to build a logging infrastructure that grows with your needs, reduces cost, is more efficient and resilient, and brings more value towards managing risk.

PS: Subscribe to my blog and follow on Twitter @rafeeq_rehman


About Rafeeq Rehman

Consultant, Author, Researcher.