Redundancy Manager Service¶
- group Redundancy Manager
Aggregates component faults to determine system-wide health.
- group Types
Enumerations for fault sources and system states.
Defines
-
REDUNDANCY_MANAGER_MAX_FAULTS¶
Maximum number of active faults to track simultaneously.
-
REDUNDANCY_MANAGER_SERVICE_UID¶
Service Unique Identifier (16-bit), used to construct unique Event IDs. 0x5366 is ASCII “Sf” (SafeMode/System Fault).
-
REDUNDANCY_EVENT_CRITICAL_HEALTH¶
-
REDUNDANCY_EVENT_HEALTH_DEGRADED¶
-
REDUNDANCY_EVENT_HEALTH_RECOVERED¶
-
REDUNDANCY_EVENT_COMPONENT_DEGRADED¶
-
REDUNDANCY_EVENT_COMPONENT_RECOVERED¶
-
REDUNDANCY_EVENT_HEALTH_RESPONSE¶
-
REDUNDANCY_EVENT_COMPONENT_STATUS_RESPONSE¶
-
REDUNDANCY_EVENT_FAULT_LIST_RESPONSE¶
-
REDUNDANCY_EVENT_TELEMETRY¶
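The event macros above are presumably derived from REDUNDANCY_MANAGER_SERVICE_UID, but the packing scheme is not documented here. A minimal sketch, assuming a 32-bit event ID with the service UID in the upper 16 bits and a per-service index in the lower 16 bits. MAKE_EVENT_ID and both accessor helpers are hypothetical names, not part of the documented API:

```c
#include <stdint.h>

/* Value documented above: 0x53 = 'S', 0x66 = 'f'. */
#define REDUNDANCY_MANAGER_SERVICE_UID 0x5366u

/* Hypothetical layout (an assumption, not confirmed by this reference):
 * service UID in bits 31..16, per-service event index in bits 15..0. */
#define MAKE_EVENT_ID(service_uid, local_index) \
    ((((uint32_t)(service_uid)) << 16) | ((uint32_t)(local_index) & 0xFFFFu))

/* Recover the service UID from a packed event ID. */
static inline uint16_t event_id_service_uid(uint32_t event_id)
{
    return (uint16_t)(event_id >> 16);
}

/* Recover the per-service event index from a packed event ID. */
static inline uint16_t event_id_local_index(uint32_t event_id)
{
    return (uint16_t)(event_id & 0xFFFFu);
}
```

With a layout like this, a subscriber can tell which service published an event by inspecting only the upper half of the ID.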
Typedefs
-
typedef uint32_t fault_code_t¶
Service-specific error code.
Specific codes are defined in the reporting service’s header. Example: a fault reported with FAULT_SOURCE_RAIL might carry the code RAIL_FAULT_OVERCURRENT.
Enums
-
enum redundancy_manager_event_id_t¶
Events published by the Redundancy Manager.
Values:
-
enumerator REDUNDANCY_EVENT_CRITICAL_HEALTH¶
Published when system enters critical health state.
Applications should initiate Safe Mode transition.
Payload: system_health_t (always SYSTEM_HEALTH_FAULT)
-
enumerator REDUNDANCY_EVENT_HEALTH_DEGRADED¶
Published when system health becomes degraded.
Mission can continue with reduced capability. Services may need to adapt (e.g., reduce power consumption, disable non-critical features).
Payload: system_health_t (always SYSTEM_HEALTH_DEGRADED)
-
enumerator REDUNDANCY_EVENT_HEALTH_RECOVERED¶
Published when system recovers to nominal health.
All critical and degraded faults have been cleared.
Payload: system_health_t (always SYSTEM_HEALTH_OK)
-
enumerator REDUNDANCY_EVENT_COMPONENT_DEGRADED¶
Published when a specific component becomes degraded.
Indicates a component failure with fallback available. Affected services should switch to backup/redundant hardware.
Example: Primary UART fails → switch to auxiliary UART
Payload: component_degradation_t
-
enumerator REDUNDANCY_EVENT_COMPONENT_RECOVERED¶
Published when a degraded component recovers.
Services may optionally switch back to primary hardware.
Payload: component_id_t
-
enumerator REDUNDANCY_EVENT_HEALTH_RESPONSE¶
Published in response to REQUEST_REDUNDANCY_HEALTH query.
Payload: health_response_t
-
enumerator REDUNDANCY_EVENT_COMPONENT_STATUS_RESPONSE¶
Published in response to REQUEST_REDUNDANCY_COMPONENT_STATUS query.
Payload: component_status_response_t
-
enumerator REDUNDANCY_EVENT_FAULT_LIST_RESPONSE¶
Published in response to REQUEST_REDUNDANCY_FAULT_LIST query.
May require multiple events if fault list is large (chunked response).
Payload: fault_list_response_t
-
enumerator REDUNDANCY_EVENT_TELEMETRY¶
Periodic telemetry broadcast (e.g., every 30 seconds).
Contains summary of system health and fault counts.
Payload: redundancy_telemetry_t
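The health events above map naturally onto an application mode switch. The following is a minimal sketch of that reaction, not the project's actual handler: app_mode_t and app_next_mode are hypothetical, the enum is a local stand-in with placeholder values, and the event bus delivery mechanism is omitted because it is not described in this reference.

```c
/* Local stand-in so the sketch compiles on its own; real code would
 * include redundancy_manager.h. Numeric values are placeholders. */
typedef enum {
    REDUNDANCY_EVENT_CRITICAL_HEALTH,
    REDUNDANCY_EVENT_HEALTH_DEGRADED,
    REDUNDANCY_EVENT_HEALTH_RECOVERED,
    REDUNDANCY_EVENT_TELEMETRY
} redundancy_manager_event_id_t;

/* Hypothetical application modes (not part of the documented API). */
typedef enum { APP_MODE_NOMINAL, APP_MODE_REDUCED, APP_MODE_SAFE } app_mode_t;

/* Hypothetical helper: pick the next application mode for a health event. */
static app_mode_t app_next_mode(app_mode_t current,
                                redundancy_manager_event_id_t event)
{
    switch (event) {
    case REDUNDANCY_EVENT_CRITICAL_HEALTH:
        /* Documented guidance: initiate the Safe Mode transition. */
        return APP_MODE_SAFE;
    case REDUNDANCY_EVENT_HEALTH_DEGRADED:
        /* Assumption: a degraded report never lifts an active Safe Mode. */
        return (current == APP_MODE_SAFE) ? APP_MODE_SAFE : APP_MODE_REDUCED;
    case REDUNDANCY_EVENT_HEALTH_RECOVERED:
        /* Assumption: recovery returns the application to nominal mode. */
        return APP_MODE_NOMINAL;
    default:
        /* Responses and telemetry do not change the mode. */
        return current;
    }
}
```

Keeping the decision in a pure function like this makes the transition rules easy to unit test without an event bus.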
-
enum fault_source_t¶
Identifies the subsystem reporting a failure.
Values:
-
enumerator FAULT_SOURCE_BATTERY¶
BMS issues (voltage, temperature, etc.)
-
enumerator FAULT_SOURCE_MPPT¶
Solar charging failures
-
enumerator FAULT_SOURCE_RAIL¶
Rail controller (overcurrent, enable failures)
-
enumerator FAULT_SOURCE_SENSOR¶
I2C/SPI sensor timeouts or bad data
-
enumerator FAULT_SOURCE_UART¶
UART communication errors
-
enumerator FAULT_SOURCE_WATCHDOG¶
Watchdog timeout or service hang
-
enumerator FAULT_SOURCE_MEMORY¶
Flash/EEPROM errors
-
enumerator FAULT_SOURCE_COUNT¶
Number of fault sources
-
enum fault_severity_t¶
Severity classification for individual faults.
Values:
-
enumerator FAULT_SEVERITY_INFO¶
Informational, no action required
-
enumerator FAULT_SEVERITY_WARNING¶
Potential issue, monitor closely
-
enumerator FAULT_SEVERITY_DEGRADED¶
Component degraded, fallback available
-
enumerator FAULT_SEVERITY_CRITICAL¶
Critical failure, Safe Mode required
-
enum system_health_t¶
High-level classification of EPS health.
Used by applications to drive state transitions.
Values:
-
enumerator SYSTEM_HEALTH_OK¶
All systems nominal
-
enumerator SYSTEM_HEALTH_DEGRADED¶
Non-critical faults, mission continues
-
enumerator SYSTEM_HEALTH_FAULT¶
Critical failure, requires Safe Mode
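The severity and health descriptions above suggest a simple aggregation rule: any active critical fault yields SYSTEM_HEALTH_FAULT, otherwise any active degraded fault yields SYSTEM_HEALTH_DEGRADED, otherwise SYSTEM_HEALTH_OK. The sketch below illustrates that rule under those assumptions. It is not the manager's actual implementation; fault_summary_t and aggregate_health are hypothetical names, and the enums are local stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the documented enums (values are placeholders). */
typedef enum {
    FAULT_SEVERITY_INFO,
    FAULT_SEVERITY_WARNING,
    FAULT_SEVERITY_DEGRADED,
    FAULT_SEVERITY_CRITICAL
} fault_severity_t;

typedef enum {
    SYSTEM_HEALTH_OK,
    SYSTEM_HEALTH_DEGRADED,
    SYSTEM_HEALTH_FAULT
} system_health_t;

/* Hypothetical reduced view of fault_t: only the fields the rule needs. */
typedef struct {
    fault_severity_t severity;
    bool active;
} fault_summary_t;

/* Hypothetical helper: derive system health from the active faults. */
static system_health_t aggregate_health(const fault_summary_t *faults, size_t n)
{
    system_health_t health = SYSTEM_HEALTH_OK;

    for (size_t i = 0; i < n; i++) {
        if (!faults[i].active) {
            continue; /* cleared faults do not affect health */
        }
        if (faults[i].severity == FAULT_SEVERITY_CRITICAL) {
            return SYSTEM_HEALTH_FAULT; /* worst case, stop early */
        }
        if (faults[i].severity == FAULT_SEVERITY_DEGRADED) {
            health = SYSTEM_HEALTH_DEGRADED;
        }
        /* INFO and WARNING are assumed not to change system health. */
    }
    return health;
}
```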
-
enum component_id_t¶
Identifiers for components with redundancy/fallback options.
Values:
-
enumerator COMPONENT_UART_PRIMARY¶
Primary UART (port 1)
-
enumerator COMPONENT_UART_SECONDARY¶
Secondary UART (port 3)
-
enumerator COMPONENT_I2C_BUS_1¶
I2C bus 1
-
enumerator COMPONENT_I2C_BUS_2¶
I2C bus 2
-
enumerator COMPONENT_I2C_BUS_3¶
I2C bus 3
-
enumerator COMPONENT_I2C_BUS_4¶
I2C bus 4
-
enumerator COMPONENT_SOLAR_STRING_1¶
Solar panel string 1
-
enumerator COMPONENT_SOLAR_STRING_2¶
Solar panel string 2
-
enumerator COMPONENT_SOLAR_STRING_3¶
Solar panel string 3
-
enumerator COMPONENT_SOLAR_STRING_4¶
Solar panel string 4
-
enumerator COMPONENT_SOLAR_STRING_5¶
Solar panel string 5
-
enumerator COMPONENT_SOLAR_STRING_6¶
Solar panel string 6
-
enumerator COMPONENT_COUNT¶
Number of tracked components
-
struct fault_t¶
- #include <redundancy_manager.h>
Represents a single active fault in the system.
Public Members
-
fault_source_t source¶
Subsystem that reported the fault
-
fault_code_t code¶
Service-specific error code
-
fault_severity_t severity¶
Severity classification
-
uint32_t timestamp_ms¶
When the fault was first detected
-
uint32_t count¶
Number of times this fault has occurred
-
bool active¶
True if fault is currently active
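The fields above imply a fixed-size table in which a repeated report bumps count while timestamp_ms keeps the first detection time. A minimal sketch of that bookkeeping follows. The helper fault_table_record is hypothetical, the value 16 for REDUNDANCY_MANAGER_MAX_FAULTS is a placeholder (the real value is not given here), and matching faults by source and code is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define REDUNDANCY_MANAGER_MAX_FAULTS 16 /* placeholder value */

/* Local stand-ins for the documented types (enums abbreviated). */
typedef uint32_t fault_code_t;
typedef enum { FAULT_SOURCE_BATTERY, FAULT_SOURCE_MPPT, FAULT_SOURCE_RAIL } fault_source_t;
typedef enum {
    FAULT_SEVERITY_INFO, FAULT_SEVERITY_WARNING,
    FAULT_SEVERITY_DEGRADED, FAULT_SEVERITY_CRITICAL
} fault_severity_t;

typedef struct {
    fault_source_t source;
    fault_code_t code;
    fault_severity_t severity;
    uint32_t timestamp_ms;
    uint32_t count;
    bool active;
} fault_t;

/* Hypothetical helper: record a fault report in a fixed-size table.
 * Returns the entry used, or NULL if the table is full. */
static fault_t *fault_table_record(fault_t table[REDUNDANCY_MANAGER_MAX_FAULTS],
                                   fault_source_t source, fault_code_t code,
                                   fault_severity_t severity, uint32_t now_ms)
{
    fault_t *free_slot = NULL;

    for (size_t i = 0; i < REDUNDANCY_MANAGER_MAX_FAULTS; i++) {
        fault_t *f = &table[i];
        if (f->active && f->source == source && f->code == code) {
            f->count++;          /* repeat report: keep first timestamp */
            return f;
        }
        if (!f->active && free_slot == NULL) {
            free_slot = f;       /* remember the first unused entry */
        }
    }
    if (free_slot == NULL) {
        return NULL;             /* table full: caller decides what to drop */
    }
    *free_slot = (fault_t){ .source = source, .code = code, .severity = severity,
                            .timestamp_ms = now_ms, .count = 1u, .active = true };
    return free_slot;
}
```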
-
struct component_degradation_t¶
- #include <redundancy_manager.h>
Payload for component degradation events.
Public Members
-
component_id_t component¶
Which component is degraded
-
fault_source_t fault_source¶
What caused the degradation
-
bool fallback_available¶
True if fallback/redundant option exists
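A subscriber can use this payload to decide whether to move to redundant hardware, as in the primary-to-auxiliary UART example given for REDUNDANCY_EVENT_COMPONENT_DEGRADED. The sketch below shows one way to express that decision. select_uart is a hypothetical helper, and the enums are abbreviated local stand-ins with placeholder values.

```c
#include <stdbool.h>

/* Abbreviated local stand-ins for the documented types. */
typedef enum { COMPONENT_UART_PRIMARY, COMPONENT_UART_SECONDARY } component_id_t;
typedef enum { FAULT_SOURCE_UART, FAULT_SOURCE_SENSOR } fault_source_t;

typedef struct {
    component_id_t component;
    fault_source_t fault_source;
    bool fallback_available;
} component_degradation_t;

/* Hypothetical helper: choose which UART to use after a degradation event.
 * Switches only when the port in use is the one reported degraded and the
 * manager reports that a fallback exists. */
static component_id_t select_uart(component_id_t in_use,
                                  const component_degradation_t *evt)
{
    if (evt->component == COMPONENT_UART_PRIMARY &&
        in_use == COMPONENT_UART_PRIMARY &&
        evt->fallback_available) {
        return COMPONENT_UART_SECONDARY;
    }
    return in_use; /* unrelated component, or nothing to fall back to */
}
```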
-
struct health_response_t¶
- #include <redundancy_manager.h>
Response payload for health queries.
Public Members
-
system_health_t health¶
Current system health
-
uint32_t active_fault_count¶
Number of active faults
-
uint32_t timestamp_ms¶
When health was sampled
-
struct component_status_request_t¶
- #include <redundancy_manager.h>
Request payload to query specific component status.
Public Members
-
component_id_t component¶
Which component to query
-
struct component_status_response_t¶
- #include <redundancy_manager.h>
Response payload for component status queries.
Public Members
-
component_id_t component¶
Requested component
-
bool is_ok¶
True if operational, false if degraded
-
fault_source_t fault_source¶
What caused degradation (if degraded)
-
uint32_t timestamp_ms¶
When status was sampled
-
struct fault_list_response_t¶
- #include <redundancy_manager.h>
Response payload for fault list queries.
Contains a subset of active faults. May require multiple events for complete fault list.
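The members of fault_list_response_t are not listed in this reference, so the following only illustrates the general idea of a chunked response. Every name in it (fault_list_chunk_t, its fields, FAULTS_PER_CHUNK, and both helpers) is hypothetical, and sending one empty chunk when there are no faults is an assumption.

```c
#include <stdint.h>

#define FAULTS_PER_CHUNK 4u /* hypothetical chunk capacity */

/* Hypothetical chunk layout; NOT the documented fault_list_response_t. */
typedef struct {
    uint16_t chunk_index;  /* position of this chunk, starting at 0 */
    uint16_t chunk_count;  /* total chunks in the full response */
    uint16_t fault_count;  /* valid entries in codes[] */
    uint32_t codes[FAULTS_PER_CHUNK];
} fault_list_chunk_t;

/* Number of events needed to carry total_faults entries (at least one). */
static uint16_t fault_list_chunk_count(uint16_t total_faults)
{
    if (total_faults == 0u) {
        return 1u;
    }
    return (uint16_t)((total_faults + FAULTS_PER_CHUNK - 1u) / FAULTS_PER_CHUNK);
}

/* Fill one chunk from a flat array of fault codes. */
static void fault_list_fill_chunk(fault_list_chunk_t *out, const uint32_t *codes,
                                  uint16_t total_faults, uint16_t chunk_index)
{
    uint32_t start = (uint32_t)chunk_index * FAULTS_PER_CHUNK;
    uint32_t remaining = (start < total_faults) ? (total_faults - start) : 0u;
    uint32_t n = (remaining < FAULTS_PER_CHUNK) ? remaining : FAULTS_PER_CHUNK;

    out->chunk_index = chunk_index;
    out->chunk_count = fault_list_chunk_count(total_faults);
    out->fault_count = (uint16_t)n;
    for (uint32_t i = 0u; i < n; i++) {
        out->codes[i] = codes[start + i];
    }
}
```

Carrying both the chunk index and the chunk count in each event lets a receiver detect a missing chunk without relying on delivery order.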
-
struct redundancy_telemetry_t¶
- #include <redundancy_manager.h>
Periodic telemetry payload.
-
struct redundancy_manager_t¶
- #include <redundancy_manager.h>
The redundancy manager state.
Public Members
-
system_health_t health¶
Current aggregated system health
-
bool component_status[COMPONENT_COUNT]¶
True = OK, False = Degraded
-
uint32_t total_fault_count¶
Total faults since boot (for telemetry)
-
bool initialized¶
True if initialized
- group Public API
Functions for initializing the redundancy manager.
After initialization, all interaction with the redundancy manager occurs through events. The manager publishes health updates and responds to query requests via the event bus.
Functions
-
void redundancy_manager_init(redundancy_manager_t *manager)¶
Initialize the Redundancy Manager.
Clears all faults, sets system health to SYSTEM_HEALTH_OK, and subscribes to:
- Fault events from all services (e.g., BATTERY_EVENT_FAULT_DETECTED)
- Query request events from applications
- Recovery events from services
Publishes an initial REDUNDANCY_EVENT_HEALTH_RECOVERED event once initialization completes.
- Parameters:
manager – [inout] The redundancy manager to initialize.
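The state-reset part of initialization can be sketched from the documented members of redundancy_manager_t. This is an illustration rather than the actual implementation: the function name redundancy_manager_reset_state_sketch is hypothetical, the types are local stand-ins, COMPONENT_COUNT is assumed to follow from the enumerators being declared in the listed order, and event bus subscription and publication are left out because the bus API is not shown in this reference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the documented enums and struct members. */
typedef enum { SYSTEM_HEALTH_OK, SYSTEM_HEALTH_DEGRADED, SYSTEM_HEALTH_FAULT } system_health_t;

typedef enum {
    COMPONENT_UART_PRIMARY, COMPONENT_UART_SECONDARY,
    COMPONENT_I2C_BUS_1, COMPONENT_I2C_BUS_2, COMPONENT_I2C_BUS_3, COMPONENT_I2C_BUS_4,
    COMPONENT_SOLAR_STRING_1, COMPONENT_SOLAR_STRING_2, COMPONENT_SOLAR_STRING_3,
    COMPONENT_SOLAR_STRING_4, COMPONENT_SOLAR_STRING_5, COMPONENT_SOLAR_STRING_6,
    COMPONENT_COUNT
} component_id_t;

typedef struct {
    system_health_t health;
    bool component_status[COMPONENT_COUNT]; /* true = OK, false = degraded */
    uint32_t total_fault_count;
    bool initialized;
} redundancy_manager_t;

/* Hypothetical sketch of the state reset performed during init.
 * Subscribing to fault/query/recovery events and publishing the initial
 * REDUNDANCY_EVENT_HEALTH_RECOVERED event are intentionally omitted. */
static void redundancy_manager_reset_state_sketch(redundancy_manager_t *manager)
{
    if (manager == NULL) {
        return; /* assumption: a NULL manager is ignored */
    }
    manager->health = SYSTEM_HEALTH_OK;
    for (size_t i = 0; i < COMPONENT_COUNT; i++) {
        manager->component_status[i] = true; /* every component starts OK */
    }
    manager->total_fault_count = 0u;
    manager->initialized = true;
}
```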