The enterprise cloud today is generally a naive and immature hybrid stack that has a long way to go before it can deliver the rich, analytics-driven experience that users and applications get on public clouds like AWS and Google Cloud. In the first Virtualization Field Day 6 video, we discuss with the delegates how AppFormix is vital to building a mature enterprise cloud.
Our company goal is to enable Enterprises to build better clouds for their users.
We enable a better-than-public-cloud experience. Real-time analytics are used for orchestration and reliability of your infrastructure, while long-term analytics are used for capacity planning. We also have some additional magic-sauce elements that provide very detailed analytics about how your applications are performing in terms of disk and network I/O.
Our layer sits alongside your cloud management stack, such as OpenStack or Kubernetes. We enable an analytics-driven experience for all users and applications with our highly scalable micro-service architecture. In the spirit of simplicity, we have also focused on quick and painless installation, which we accomplish using a combination of Ansible dynamic inventory for our agents and Docker. All of our software components are implemented as lightweight Docker containers that communicate over an open message bus. For our customers, it has been key to get our software onto the hosts as painlessly as possible, which we do thanks to our APIs.
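To make the Ansible piece concrete, here is a minimal sketch of a dynamic inventory script of the kind described above. This is not AppFormix code: the host names, group name, and `agent_image` variable are invented for illustration. Ansible runs such a script with `--list` and parses the JSON it prints to decide which hosts receive the agent.

```python
#!/usr/bin/env python3
"""Hypothetical Ansible dynamic inventory script (illustrative only):
emits the JSON structure Ansible expects, grouping the compute hosts
that should receive a monitoring agent."""
import json

def build_inventory():
    # In a real deployment this list would come from the cloud API
    # (e.g. the OpenStack compute service); here it is hard-coded.
    compute_hosts = ["compute-01", "compute-02", "compute-03"]
    return {
        "agent_hosts": {
            "hosts": compute_hosts,
            "vars": {"agent_image": "example/agent:latest"},
        },
        # _meta/hostvars lets Ansible skip a per-host --host call.
        "_meta": {"hostvars": {h: {} for h in compute_hosts}},
    }

if __name__ == "__main__":
    # Ansible invokes the script and reads the inventory from stdout.
    print(json.dumps(build_inventory(), indent=2))
```

A playbook pointed at this inventory can then pull and start the agent container on every host in the `agent_hosts` group.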
We make the cloud:
1) Easy to operate
2) Easy to scale
3) Easy for applications to consume intelligently and communicate with.
The Virtualization Field Day 6 Delegates had some great questions about how we achieve our goal for enterprise clouds. Read on below or watch the video to find out for yourself.
When you say host, what do you mean?
- We are not talking about your vCenter Server, but the compute node on which you spin up your containers and virtual machines: your KVM or Hyper-V host.
Why is it necessary to get the analysis so quickly? Why can’t you be one minute behind?
- Think about it this way: if what we want to achieve is better reliability, the goal is to prevent a problem from causing service disruption, which means we need to detect it before it occurs. We want to act on the first sign that something is about to go wrong, and in that situation, time matters.
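The "detect it before it occurs" idea can be sketched with a toy example. This is not AppFormix logic, just an illustration of why sample freshness matters: extrapolating the recent trend of a metric gives an estimated time until it crosses a limit, and stale samples shrink the window in which anyone can react.

```python
"""Illustrative sketch: estimate how long until a rising metric
crosses its limit, using a simple linear trend over recent samples.
Metric, interval, and limit values are invented for the example."""

def seconds_until_limit(samples, interval_s, limit):
    """Extrapolate the linear trend across `samples`; return the
    estimated seconds until `limit` is crossed, 0 if already crossed,
    or None if the metric is flat or falling."""
    if len(samples) < 2:
        return None
    if samples[-1] >= limit:
        return 0
    slope = (samples[-1] - samples[0]) / ((len(samples) - 1) * interval_s)
    if slope <= 0:
        return None
    return (limit - samples[-1]) / slope

# Disk usage (%) sampled every 10 seconds, climbing toward 90%.
eta = seconds_until_limit([70, 72, 74, 76], interval_s=10, limit=90)
```

With this trend the limit is roughly 70 seconds away; an analysis pipeline that is a minute behind would learn about it with almost no time left to act.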
How are you scalable? What do you mean when you say you are highly scalable?
- Our software is built with a micro-services architecture, layer upon layer. Each layer can function entirely on its own and has a well-defined API. We have a message bus with multiple endpoints, each doing its own real-time analysis, and we stratify communication speeds based on message importance. We don't need a super controller that uses huge amounts of CPU and memory to analyze all the endpoints.
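The "stratifying communication speeds based on message importance" idea can be illustrated with a small priority-ordered bus. This is a sketch only, not AppFormix's bus implementation; the priority levels and messages are invented.

```python
"""Sketch of prioritizing bus traffic by importance: urgent signals
are delivered before routine metric samples, so no central
super-controller has to sift through everything first."""
import heapq

class PriorityBus:
    URGENT, ROUTINE = 0, 1  # lower number = delivered first

    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker keeps FIFO order within a priority

    def publish(self, priority, message):
        heapq.heappush(self._queue, (priority, self._seq, message))
        self._seq += 1

    def drain(self):
        """Yield messages, most important first."""
        while self._queue:
            yield heapq.heappop(self._queue)[2]

bus = PriorityBus()
bus.publish(PriorityBus.ROUTINE, "cpu sample: 12%")
bus.publish(PriorityBus.URGENT, "disk nearly full on compute-02")
messages = list(bus.drain())
# the urgent signal is delivered first despite being published second
```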
All of our components, such as the AppFormix Controller, Data Store, Dashboard, and Agents as seen in the diagram below, in addition to components such as OpenStack Adaptors, are implemented as independent Docker Containers.
Are the normal KVM or Hyper-V counters not fast enough or conclusive enough?
- The question is not whether the KVM and Hyper-V counters are fast enough, but how fast those counters can be digested. An agent that only listens to the basic KVM/Hyper-V counters would spend most of its time grabbing those metrics, packaging them, and sending them off the server; it would spend more time and resources shipping raw metrics than doing analysis. With our architecture, you can analyze counters and metrics at a much higher frequency, and findings can be passed onto the message bus instantly. Mundane metrics that don't require immediate action and don't need to be processed quickly can afford to leave the server at a slower pace.
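The trade-off described above, analyze locally and ship only findings rather than raw samples, can be sketched as follows. The threshold and alarm name are invented for illustration; this is not the agent's actual analysis.

```python
"""Illustrative sketch: sample a counter at high frequency on the
host, but emit only the findings of local analysis instead of
shipping every raw sample off the server."""

def analyze_locally(samples, threshold=90):
    """Return actionable findings; raw samples stay on the host."""
    findings = []
    for i, value in enumerate(samples):
        if value > threshold:
            findings.append({"sample_index": i, "value": value,
                             "alarm": "cpu_above_threshold"})
    return findings

raw = [40, 42, 41, 95, 43]      # five high-frequency samples
to_send = analyze_locally(raw)  # only one finding leaves the host
```

Five samples in, one small message out: the network and packaging cost scales with findings, not with sampling frequency.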
About AppFormix Agents
Does the central controller read, analyze, and put the metrics and signals on the message bus, or is it the agent?
- The agent is a fully functional autonomic system that can read, analyze, and send signals out on a message bus, which can be consumed directly by our orchestration and UI, or by yours using our APIs. The central controller is there to configure the agent and lead larger-scale analysis.
If you are cutting down the CPU and memory use of the central controller, what is the impact on the agent? What is the agent overhead?
- As the agent is only reading and doing analysis, the overhead is only in terms of CPU. We have benchmarked the agent CPU use at 0.1% per instance running on the host. In practice, in customer environments, on a host loaded with about 60 virtual machines, the agent uses about 1% of the overall host CPU capacity. One of our key priorities is to keep improving the efficiency of that agent and make updating painless.
It is one thing when I have nodes like Nova nodes and KVM nodes, but I also just have container hosts. Those container hosts are usually not OpenStack and KVM, but a lightweight Linux or something more raw. Do you have a generic Linux agent for container hosts?
- No, the agent is the same. This is the purpose of our central controller: it learns the context of the management layer, whether that is Linux, OpenStack, KVM, or Kubernetes, and then configures the agent for that context. This is why the agent is able to process the data. Data without context is useless, but the agent gets the context from the central controller. What we get from Kubernetes, OpenStack, Linux, and KVM is all of the context. Because we make our agent context-aware, when it is doing analysis it knows exactly what it is analyzing: this is an instance, it belongs to a particular user, it is running a particular application, and this is what to expect from it. The controller configures the agent and tells it about all of this context in the management stack. That's why, when the agent generates a signal, it's not just saying, "Aha! There is something wrong with this server!" It says, "Hey, on this server, there is this virtual machine, belonging to this project, spun up by this user, and something is wrong with it." You can consume that signal instantly and do something useful with it. The signals are not just server-centric, but also VM-centric and container-centric, without interfering with VM and container internals.
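The difference between a server-centric alarm and a context-enriched signal can be shown in a few lines. The field names and values here are invented for illustration; the actual signal schema is not described in this post.

```python
"""Sketch of a context-enriched signal: the controller supplies the
management-stack context, so the agent's alarm identifies the VM,
project, and user, not just the server."""

def enrich(alarm, context):
    """Merge a raw alarm with controller-supplied context."""
    signal = dict(alarm)
    signal.update(context)
    return signal

# What a context-free agent would know:
raw_alarm = {"metric": "disk_io_wait", "value": 87, "host": "compute-02"}
# What the central controller adds from OpenStack/Kubernetes metadata:
controller_context = {"instance": "vm-42", "project": "billing",
                      "user": "alice"}
signal = enrich(raw_alarm, controller_context)
# signal now identifies the VM, project, and user, not just the host
```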
If you can work on public and hybrid clouds, why does your messaging say this is a private cloud on premise solution?
- Our agent is cloud agnostic. The point is that it can sit anywhere and aggregate that information. Currently, all of our customers run private clouds. In AWS, you don't have access to the host infrastructure, so it would be incorrect to say that our host infrastructure monitoring solution works the same. However, in the case of the AWS public cloud, the main focus would be to install on the virtual machine that the customer pays for, monitor the containers they spin up within that virtual machine, and aggregate those analytics.
What about running on hybrid and public clouds?
- In that scenario, our agent is designed so you can also run it inside the virtual machines, and we can aggregate those metrics for you.
What does AppFormix support at the moment for on-premises (private) clouds?
- Currently what is generally available and what we have deployed for customers is just KVM virtualization or OpenStack running on top of a Linux distro. We also have a version that works with Hyper-V. We have another version that is in Beta that works with Kubernetes that is going to be GA in Q1 of 2016. AppFormix is an active contributor to Kubernetes, and several members of our engineering team are making major efforts towards contributing to Kubernetes open source.
How are your alarms triggered? Are they based around hard thresholds like 80% CPU, or are they based on dynamic thresholds that learn over time?
- We do both. Out of the box, you get hard thresholds along with deviations, percentages, maximums, minimums, and moving averages, but we have also designed our system with different analyzers on the message bus that give you dynamic baselines. It really depends on the customer and how they want to run their cloud environment. We reconfigure our system to help them run it as their use case demands.
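The two alarm styles can be sketched side by side. The detection logic below is invented for illustration (a fixed limit versus a moving-average baseline), not AppFormix's analyzers.

```python
"""Sketch of the two alarm styles: a hard threshold versus a
dynamic baseline that flags a sample deviating strongly from
recent history. Limits and factors are example values."""

def hard_threshold(value, limit=80.0):
    """Static rule: alarm whenever the value exceeds a fixed limit."""
    return value > limit

def baseline_deviation(history, value, window=5, factor=1.5):
    """Dynamic rule: alarm when `value` exceeds the moving average
    of the last `window` samples by more than `factor`x."""
    recent = history[-window:]
    if not recent:
        return False
    avg = sum(recent) / len(recent)
    return value > factor * avg

history = [20, 22, 21, 23, 22]      # a quiet workload
assert not hard_threshold(50)        # 50% is under the static limit
assert baseline_deviation(history, 50)  # but far above its own baseline
```

The same sample can be fine by a static rule and alarming by a dynamic one, which is why offering both matters.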
Is your product running its own OpenStack distribution?
- We do not have our own OpenStack distribution. We are partners with OpenStack, and also with Mirantis, Ubuntu, and Red Hat. We also work with the Rackspace enterprise distribution. Everything we do works with Docker, OpenStack, Hyper-V, and Kubernetes, for both VMs and containers. In terms of hardware, we also work closely with Intel to grab host metrics and really pinpoint how workloads are performing on Intel hardware.
So, there you have it!
AppFormix works by implementing autonomous agents, configured with context from the central controller. These agents put signals on a message bus that can be consumed directly, and that are also easily accessible through well-defined APIs and user interfaces. This user experience is entirely role-based: we integrate with the authentication you are using and grant access to information based on that. In the example of OpenStack, we use Keystone authorization, and based on those credentials, your views are customized for you. This is extremely important because a single host can have VMs that belong to different users, and you want each user to have access only to information about their own VMs, without having to configure the analysis software.
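The role-based view described above amounts to filtering signals by the project a user is authorized for. This sketch uses invented signal fields and a plain project string in place of a real Keystone token; it only illustrates the filtering idea, not AppFormix internals.

```python
"""Minimal sketch of role-based access to signals: each user sees
only the signals for VMs belonging to their own project."""

def visible_signals(signals, user_project):
    """Return only the signals scoped to the user's project."""
    return [s for s in signals if s.get("project") == user_project]

signals = [
    {"instance": "vm-1", "project": "billing", "alarm": "cpu_high"},
    {"instance": "vm-2", "project": "research", "alarm": "disk_full"},
]
mine = visible_signals(signals, "billing")
# a "billing" user sees vm-1 only, even though both VMs may share a host
```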
The way we work as a company, many of our features are direct requests from our customers, so as we work with them, we continue to improve our agents.
If you have any further questions about our product or approach, please leave a comment or tweet us!
Get a free trial of the AppFormix Analytics for your Enterprise Data Center!