CogNETive is a network exploration and troubleshooting service for Operations@Scale.
As cloud service deployments grow larger and more complex, network management and troubleshooting are emerging as major challenges for cloud DevOps engineers. While compute node failures are common and reasonably well handled, failures of network components are often catastrophic.
Networks are opaque and hard to understand, even for network specialists (which most DevOps engineers are not). The network infrastructure is complex, consisting of multiple virtual and physical layers that are often owned and managed by different entities. Direct access to network components is very risky, because the network is the lifeline of cloud operation in both the data and control planes; as a result, getting real-time information from all the layers is not always possible. At scale, it is very hard to get the network right in the first place, and even harder to fix it when something goes wrong.
CogNETive is a toolkit and service for network exploration and troubleshooting @scale. It provides end-to-end visibility into the network combined with automated insights.
SkyDive and custom probes capture real-time network topology, telemetry, and flow data, which is persisted and indexed in ElasticSearch. Grafana, Apache Spark, and custom algorithms analyze the network data. SkyDive and custom Grafana dashboards provide a rich user interface to explore the network.
- Preloaded filters.
- Better node placement.
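The analysis stage is described only at a high level. As a minimal sketch of the kind of automated insight a custom algorithm might compute over persisted flow data, the Python below sums traffic per source/destination pair ("top talkers") and flags flows whose round-trip time exceeds a threshold. The `FlowRecord` type, its fields, and the helper names are hypothetical: they are not Skydive's flow schema or CogNETive's code, and in the pipeline described above the records would be queried from ElasticSearch rather than held in an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record; field names are illustrative, not Skydive's schema."""
    src: str
    dst: str
    num_bytes: int
    rtt_ms: float


def top_talkers(flows: Iterable[FlowRecord], k: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Sum bytes per (src, dst) pair and return the k pairs with the most traffic."""
    totals = Counter()
    for f in flows:
        totals[(f.src, f.dst)] += f.num_bytes
    return totals.most_common(k)


def slow_flows(flows: Iterable[FlowRecord], rtt_threshold_ms: float) -> List[FlowRecord]:
    """Return the flows whose round-trip time exceeds the given threshold."""
    return [f for f in flows if f.rtt_ms > rtt_threshold_ms]


if __name__ == "__main__":
    sample = [
        FlowRecord("10.0.0.1", "10.0.0.2", 1200, 0.4),
        FlowRecord("10.0.0.1", "10.0.0.2", 800, 0.5),
        FlowRecord("10.0.0.3", "10.0.0.4", 5000, 12.0),
        FlowRecord("10.0.0.5", "10.0.0.2", 300, 0.3),
    ]
    print(top_talkers(sample, k=2))
    print([(f.src, f.dst) for f in slow_flows(sample, rtt_threshold_ms=10.0)])
```

Keeping each insight as a small pure function over records makes it easy to run the same logic on a local sample or distribute it over a larger dataset, for example with Apache Spark.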
IBM Research - 2018 | CogNETive.firstname.lastname@example.org