Learning recovery strategies for dynamic self-heal in reactive systems
2022-06-06
Self-healing applications generally depend on a set of predefined instructions that the system must follow in order to recover from a failure state. These instructions are triggered from predefined hooks in the program. Moreover, self-healing strategies detect failure states based on message response times or on metrics that are not expressive enough to distinguish different types of failures. Such strategies are usually applied in the context of distributed systems, where the detection of failures is constrained to communication problems and resolution strategies often consist of replacing complete components. However, current complex systems may reach fine-grained failure states that were not anticipated by developers, for example, changes in the value range of streamed data in IoT systems. To counter these problems, we propose a self-healing framework that learns recovery strategies to heal fine-grained system behavior at run time. We demonstrate and evaluate our healing strategies in a new domain, reactive systems. Our proposal uses monitor predicates to define satisfiability conditions on the system state. Such monitors have functional expressivity and can be defined at run time to detect failure states. Once a failure state is detected, we use a Reinforcement Learning-based technique to learn a recovery strategy from users' corrective atomic actions. Finally, to execute the learned strategies, we define them as Context-oriented Programming variations that activate at run time whenever the failure state is detected, overriding the base system behavior with the recovery strategy for that state. We validate the feasibility and effectiveness of our framework through a prototypical reactive application for tracking mouse movements in different scenarios. Our results demonstrate that, with just the definition of monitors, the system is indeed able to recover from failure states without a predefined strategy.
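The pipeline described above (monitor predicate, learned recovery, run-time override) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the tabular Q-learning update, the clamping actions, the coarse failure states, the reward values, and all names (in_range, ACTIONS, failure_key, learn_recovery, track) are assumptions chosen for illustration, and the override in track only imitates a Context-oriented Programming variation with a plain conditional.

```python
import random

# Hypothetical monitor predicate: the tracked x coordinate must stay in range.
def in_range(x):
    return 0 <= x <= 100

# Hypothetical corrective atomic actions, standing in for user corrections.
ACTIONS = {
    "noop": lambda x: x,
    "clamp_to_min": lambda x: max(x, 0),
    "clamp_to_max": lambda x: min(x, 100),
}

def failure_key(x):
    """Abstract a concrete failing value into a coarse failure state."""
    return "too_high" if x > 100 else "too_low"

def learn_recovery(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over (failure state, corrective action) pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in ("too_high", "too_low") for a in ACTIONS}
    for _ in range(episodes):
        # Start each episode from a value that violates the monitor.
        x = rng.choice([rng.randint(101, 130), rng.randint(-30, -1)])
        for _ in range(10):
            s = failure_key(x)
            if rng.random() < epsilon:
                a = rng.choice(list(ACTIONS))  # explore
            else:
                a = max(ACTIONS, key=lambda name: q[(s, name)])  # exploit
            x = ACTIONS[a](x)
            healed = in_range(x)
            # Assumed reward: +1 when the monitor holds again, small penalty otherwise.
            reward = 1.0 if healed else -0.1
            future = 0.0 if healed else max(q[(failure_key(x), b)] for b in ACTIONS)
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            if healed:
                break
    return q

def track(x, q):
    """Base behavior reports x; if the monitor fails, the learned action overrides it."""
    if in_range(x):
        return x
    s = failure_key(x)
    best = max(ACTIONS, key=lambda name: q[(s, name)])
    return ACTIONS[best](x)
```

After calling q = learn_recovery(), track(x, q) passes in-range values through unchanged and applies the highest-valued learned action to values that violate the monitor. No recovery rule is written by hand; only the monitor and the set of candidate actions are supplied.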