Architecture¶
This page is the contract surface for how the library behaves during lifecycle transitions. Read it after Mental Model and before treating the API reference as authoritative lookup.
Registration closes, components attach to the node, and resource ownership becomes explicit.
Successful hooks mark managed entities active and open runtime behavior.
Hooks execute in order, results are aggregated, and the node remains the only entry point.
Concurrency, propagation, and error policy define what happens when transitions overlap or fail.
Cleanup, shutdown, and error handling clear active state and release resources deterministically.
Overview¶
lifecore_ros2 is a minimal lifecycle composition library for ROS 2 Jazzy — no hidden state machine.
The node still owns the native ROS 2 lifecycle. The library adds a small composition layer so reusable components can follow the same lifecycle contract.
The architecture is centered on two layers:
a lifecycle-aware core in src/lifecore_ros2/core
reusable topic-oriented components in src/lifecore_ros2/components
If you need the conceptual model before the structural details on this page, read Mental Model first.
What this page answers¶
Who owns components and when registration closes.
How transitions propagate and how results are aggregated.
Which operations are thread-safe and which are intentionally single-threaded.
When resources are created, gated, released, and considered invalid.
Node–Component Ownership¶
LifecycleComponentNode owns and drives every registered LifecycleComponent.
The relationship is one-to-many: one node, any number of named components.
LifecycleComponentNode
│
├── owns: List[LifecycleComponent]
│       (registered via add_component() before first transition)
│
└── drives: propagates on_configure / on_activate / on_deactivate
            / on_cleanup / on_shutdown / on_error
            to each component in dependency-resolved order
Key ownership rules:
Components are registered by name via add_component(component, *, dependencies=None, priority=None). Names must be unique.
Registration is closed after the first lifecycle transition. Any subsequent call to add_component() raises RegistrationClosedError.
Ordering metadata may be declared on the component constructor or at the registration site. Dependencies are resolved first, priority breaks ties, and registration order remains the stable fallback.
Each component holds a back-reference to its parent node (component.node). Accessing node before attachment raises ComponentNotAttachedError.
configure and activate run in resolved order. deactivate, cleanup, shutdown, and error run in the reverse of that resolved order.
If one component returns FAILURE or ERROR, the node’s transition result reflects the worst outcome across all components.
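The ordering rule above (dependencies first, priority as tie-breaker, registration order as fallback) can be illustrated with a small standalone resolver. This is a sketch, not the library's implementation: resolve_order is a hypothetical helper, and it assumes that a higher priority value runs earlier, which the text above does not specify.

```python
import heapq
from typing import Dict, List, Optional, Sequence


def resolve_order(
    names: Sequence[str],
    dependencies: Dict[str, Sequence[str]],
    priorities: Optional[Dict[str, int]] = None,
) -> List[str]:
    """Order components: dependencies first, then priority, then registration order.

    Assumption for this sketch: a HIGHER priority value runs earlier among
    components that are ready at the same time.
    """
    priorities = priorities or {}
    index = {name: i for i, name in enumerate(names)}  # registration order
    pending = {name: set(dependencies.get(name, ())) for name in names}
    dependents: Dict[str, List[str]] = {name: [] for name in names}
    for name, deps in pending.items():
        for dep in deps:
            if dep not in index:
                raise ValueError(f"{name!r} depends on unknown component {dep!r}")
            dependents[dep].append(name)

    # Heap key (-priority, registration index): ties fall back to registration order.
    ready = [(-priorities.get(n, 0), index[n], n) for n in names if not pending[n]]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        _, _, name = heapq.heappop(ready)
        order.append(name)
        for child in dependents[name]:
            pending[child].discard(name)
            if not pending[child]:
                heapq.heappush(ready, (-priorities.get(child, 0), index[child], child))
    if len(order) != len(names):
        raise ValueError("dependency cycle detected")
    return order


names = ["logger", "camera", "detector", "recorder"]
deps = {"detector": ["camera"], "recorder": ["detector"]}
configure_order = resolve_order(names, deps, {"camera": 10})
teardown_order = list(reversed(configure_order))  # deactivate/cleanup/shutdown/error
```

Under these assumptions the camera is configured first because of its priority, the logger follows by registration order, and the teardown order is the exact reverse.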
Thread-safety of add_component¶
All accesses to the component registry and the registration gate are protected by an
internal threading.RLock (self._lock on the node). This means add_component
may be called safely from any thread, including threads that run before rclpy.spin
starts.
The registration gate (_registration_open) is written and read inside the same lock,
so a thread calling add_component concurrently with the first lifecycle transition
will either succeed (if _close_registration has not yet acquired the lock) or raise
RegistrationClosedError (if it has). There is no window where a component can be
partially registered.
Components registered before on_configure runs will have their hooks called during
the first transition. Calls to add_component made after on_configure has been called
raise RegistrationClosedError regardless of which thread makes the call.
Note
The lock is reentrant (RLock), so application code that calls add_component
inside a node’s own __init__ is safe. Inside on_configure, registration remains
possible only before calling super().on_configure(state); after super() closes the
registration gate, add_component raises RegistrationClosedError.
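The gate behavior described in this section can be modeled without ROS. The following is a minimal sketch under stated assumptions: Registry is a hypothetical stand-in for the node's internal registry, and the exception class here only mirrors the name of the library's exception.

```python
import threading
from typing import Dict, List


class RegistrationClosedError(RuntimeError):
    """Stand-in for the library exception of the same name."""


class Registry:
    """Hypothetical model of the component registry and its registration gate."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._registration_open = True
        self._components: Dict[str, object] = {}

    def add_component(self, name: str, component: object) -> None:
        with self._lock:
            # Gate check and insertion share one lock acquisition, so a
            # component is either fully registered or rejected.
            if not self._registration_open:
                raise RegistrationClosedError(f"cannot add {name!r}: registration is closed")
            if name in self._components:
                raise ValueError(f"duplicate component name {name!r}")
            self._components[name] = component

    def close_registration(self) -> None:
        with self._lock:
            self._registration_open = False

    def names(self) -> List[str]:
        with self._lock:
            return list(self._components)


registry = Registry()
workers = [
    threading.Thread(target=registry.add_component, args=(f"c{i}", object()))
    for i in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

registry.close_registration()  # what the first transition would do
try:
    registry.add_component("late", object())
    late_rejected = False
except RegistrationClosedError:
    late_rejected = True
```

All eight pre-spin registrations succeed regardless of thread interleaving, and the call made after the gate closes is rejected as a whole.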
Transition Sequence¶
The following sequence applies to every managed transition. The node is the single entry point into the transition; component hooks are called inside it.
ROS 2 executor
│
▼
LifecycleComponentNode.on_configure(state)   ← rclpy calls this
│
├── closes registration (no more add_component after this)
│
├── for each component in resolved configure/activate order:
│     component.on_configure(state)          ← @final; calls _guarded_call
│       └── _guarded_call(component._on_configure, state)
│             catches exceptions → TransitionCallbackReturn.ERROR
│             returns SUCCESS / FAILURE / ERROR
│
└── returns worst(results) to rclpy
The same pattern repeats for on_activate, on_deactivate, on_cleanup,
on_shutdown, and on_error. Each uses _guarded_call so exceptions from hook
code never escape to the rclpy executor. on_deactivate, on_cleanup,
on_shutdown, and on_error traverse the reverse of the resolved order.
Read the sequence as a lifecycle pipeline: node entry, registration gate, ordered hook calls,
result aggregation, then a single return to rclpy.
on_activate additionally sets component._is_active = True on each component
whose hook returned SUCCESS. on_deactivate sets it back to False only on SUCCESS.
on_cleanup, on_shutdown, and on_error each set _is_active = False
unconditionally before the hook runs, then call _release_resources
after the hook, regardless of the hook’s return value, and propagate the worse of the
two results. After that release attempt, component._needs_cleanup is cleared even
if resource release reported ERROR, so the component returns to a reconfigurable
state instead of remaining cleanup-pending forever. Borrowed constructor inputs such
as callback_group are not part of _release_resources and remain application-owned.
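The guarded-call and worst-result pattern can be sketched in plain Python. This is an illustrative model, not the library's code: Result stands in for TransitionCallbackReturn, and guarded_call, worst, and ComponentSketch are hypothetical names.

```python
import logging
from enum import IntEnum
from typing import Callable, Iterable

logger = logging.getLogger("lifecycle_sketch")


class Result(IntEnum):
    """Stand-in for TransitionCallbackReturn, ordered SUCCESS < FAILURE < ERROR."""
    SUCCESS = 0
    FAILURE = 1
    ERROR = 2


def guarded_call(hook: Callable[[], object]) -> Result:
    """Run a hook; map exceptions and invalid return values to ERROR."""
    try:
        value = hook()
    except Exception:
        logger.exception("hook raised")
        return Result.ERROR
    if not isinstance(value, Result):
        logger.error("hook returned invalid value %r", value)
        return Result.ERROR
    return value


def worst(results: Iterable[Result]) -> Result:
    """Aggregate results; an empty sequence counts as SUCCESS."""
    return max(results, default=Result.SUCCESS)


class ComponentSketch:
    def __init__(self, cleanup_hook: Callable[[], object]) -> None:
        self.is_active = True
        self.released = False
        self._cleanup_hook = cleanup_hook

    def _release(self) -> Result:
        self.released = True
        return Result.SUCCESS

    def on_cleanup(self) -> Result:
        self.is_active = False                      # cleared before the hook runs
        hook_result = guarded_call(self._cleanup_hook)
        release_result = guarded_call(self._release)  # runs even if the hook failed
        return worst([hook_result, release_result])


def failing_hook() -> Result:
    raise RuntimeError("driver failed")


ok = ComponentSketch(lambda: Result.SUCCESS)
bad = ComponentSketch(failing_hook)
node_result = worst(c.on_cleanup() for c in (ok, bad))
```

In this model the failing hook never raises outward: the node-level result is ERROR, yet both components end up inactive with their resources released.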
Concurrency Contract¶
This section exists to keep the lifecycle readable under pressure. The library supports thread-safe registration before the first transition, but it does not support concurrent transition execution.
ADR — Threading model: single-threaded executor with thread-safe registration
Decision: lifecore_ros2 targets the ROS 2 SingleThreadedExecutor model.
Lifecycle transitions are driven sequentially by the ROS 2 executor and must not be
called concurrently from multiple threads. Component registration (add_component)
is additionally protected by an internal threading.RLock to allow calling from
any thread before the first transition starts.
Rationale:
SingleThreadedExecutor is the default for lifecycle nodes in ROS 2. Introducing
mutex-based protection around every transition would add overhead and complexity for
a scenario that is not part of standard ROS 2 usage. The existing RLock on the
registration gate already handles the common pre-spin setup pattern, where an
application may register components from a constructor or a setup thread before
calling rclpy.spin.
Consequences:
Application code that runs LifecycleComponentNode under a
MultiThreadedExecutor must not trigger lifecycle transitions concurrently. The
library enforces this with ConcurrentTransitionError, described below.
Thread-safety guarantees¶
| Operation | Thread-safe? | Mechanism |
|---|---|---|
| | Yes | |
| | Yes (raises) | |
| | Yes | |
| Lifecycle transitions | Single-thread only | Relies on the ROS 2 executor; concurrent calls raise ConcurrentTransitionError |
| Component hook execution | Single-thread only | Called synchronously inside the transition; no cross-thread dispatch |
Forbidden concurrent transitions¶
Calling any lifecycle hook (on_configure, on_activate, on_deactivate,
on_cleanup, on_shutdown) while another transition is already running is a
programming error. The library detects this via an internal _in_transition flag
guarded by _lock and raises ConcurrentTransitionError:
Thread A: on_configure() ──────────────────────────────────►
Thread B: on_activate() ← raises ConcurrentTransitionError immediately
The flag is set atomically at the start of each hook entry point and cleared in a
finally block so it is always released, even if the transition fails.
Note
on_error is not guarded by _in_transition. rclpy calls on_error
as part of the error-recovery path after a failed transition. Guarding it would
interfere with normal error handling.
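The flag-plus-finally mechanism described above can be sketched as a context manager. This is a simplified model under assumptions: TransitionGuard is a hypothetical helper, and the exception class only mirrors the name of the library's exception.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List


class ConcurrentTransitionError(RuntimeError):
    """Stand-in for the library exception of the same name."""


class TransitionGuard:
    """Hypothetical model of an _in_transition flag protected by a lock."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._in_transition = False

    @contextmanager
    def transition(self, name: str) -> Iterator[None]:
        with self._lock:
            # Test-and-set under the lock so two callers cannot both enter.
            if self._in_transition:
                raise ConcurrentTransitionError(
                    f"{name} rejected: another transition is running"
                )
            self._in_transition = True
        try:
            yield
        finally:
            # Always released, even if the transition body fails.
            with self._lock:
                self._in_transition = False


guard = TransitionGuard()
events: List[str] = []

with guard.transition("on_configure"):
    try:
        # Re-entering from inside a running transition is rejected.
        with guard.transition("on_activate"):
            events.append("activate ran")
    except ConcurrentTransitionError:
        events.append("activate rejected")

# Once the first transition has finished, the next one is accepted.
with guard.transition("on_activate"):
    events.append("activate ran")
```

The same sketch covers reentrancy on a single thread: a reentrant lock does not help the nested caller, because the flag is still set.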
Reentrancy from callbacks¶
Lifecycle hooks are called synchronously by the ROS 2 executor. A component’s
_on_configure (or any other _on_* hook) must not call back into a lifecycle
transition on the same node — doing so would trigger ConcurrentTransitionError
because _in_transition is still set.
Calling add_component from within a lifecycle hook is safe (the RLock is
reentrant), but any component added after _close_registration has run will be
rejected with RegistrationClosedError. _close_registration is called at the
start of on_configure and on_shutdown.
Component destruction during active callbacks¶
The library does not manage component lifetime beyond the lifecycle transitions it drives. If application code destroys a component object while a subscription or timer callback is executing on that component, the result is undefined. The contract is:
Release component resources explicitly in _on_cleanup / _on_shutdown.
Do not hold external references to component objects beyond the node’s lifetime.
Do not destroy a node while it is still being spun. Call rclpy.shutdown() or stop the executor before releasing the node object.
Topic-Resource Lifecycle¶
This is the resource contract that should shape how you read component code. If resource lifetime is unclear in an implementation, check it against this section first.
TopicComponent (and its subclasses LifecyclePublisherComponent and
LifecycleSubscriberComponent) follows a strict three-phase resource lifecycle:
configure              activate                   deactivate
─────────              ────────                   ──────────
create publisher       _is_active = True          _is_active = False
create subscription    start timers (app hook)    stop timers (app hook)
store references       enable message dispatch    drop inbound messages

cleanup / shutdown / error
──────────────────────────
destroy publisher
destroy subscription
_release_resources() called automatically
The only state the library tracks is _is_active. There is no secondary resource-ready
flag. Whether a resource exists at runtime is determined entirely by whether _on_configure
has run and _on_cleanup / _release_resources has not yet been called.
Note
The library does not own or start timers. The start timers and stop timers entries above represent application code running inside _on_activate and _on_deactivate. The library has no built-in timer management.
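The resource phases above can be traced with a ROS-free stand-in. This is a sketch under assumptions: FakePublisher and PublisherSketch are hypothetical classes, the phase methods are called directly instead of through the library's entry points, and plain RuntimeError stands in for the library's typed errors.

```python
from typing import List, Optional


class FakePublisher:
    """Stand-in for an rclpy publisher; records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[str] = []
        self.destroyed = False

    def publish(self, msg: str) -> None:
        self.sent.append(msg)


class PublisherSketch:
    """Hypothetical component: resource exists from configure until cleanup."""

    def __init__(self) -> None:
        self._publisher: Optional[FakePublisher] = None
        self._is_active = False  # in the library, only the entry points touch this

    def configure(self) -> None:
        self._publisher = FakePublisher()  # created, but still gated

    def activate(self) -> None:
        self._is_active = True

    def deactivate(self) -> None:
        self._is_active = False  # the publisher is kept for a later re-activate

    def cleanup(self) -> None:
        self._is_active = False
        if self._publisher is not None:
            self._publisher.destroyed = True
            self._publisher = None

    def publish(self, msg: str) -> None:
        if self._publisher is None:
            raise RuntimeError("publish() called before configure")
        if not self._is_active:
            raise RuntimeError("publish() called while inactive")
        self._publisher.publish(msg)


comp = PublisherSketch()
comp.configure()
pub = comp._publisher            # keep a handle for inspection only
comp.activate()
comp.publish("hello")
comp.deactivate()
still_allocated = comp._publisher is pub
comp.cleanup()
```

Deactivation only gates behavior; the resource survives until cleanup, which matches the three columns of the diagram.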
Lifecycle Design¶
The repository follows native ROS 2 lifecycle semantics. LifecycleComponentNode registers each component as a managed entity and relies on the underlying lifecycle node behavior to propagate transitions.
LifecycleComponent remains intentionally small:
it is a managed entity
it knows its parent node
it exposes explicit lifecycle hooks
it avoids introducing a parallel hidden state machine
Topic Components¶
Topic-oriented components should follow these rules:
create ROS publishers and subscriptions during configure
gate publication or message handling with activation state
release ROS resources during cleanup
This keeps runtime behavior explicit and consistent with ROS 2 lifecycle expectations.
Lifecycle Invariants¶
The following invariants are binding for all LifecycleComponent subclasses.
- configure
Allocate ROS resources: create publishers, subscriptions, timers. Do not enable runtime behavior. Do not set _is_active.
- activate
Enable runtime behavior. Start publishing, accept message callbacks. Do not call super()._on_activate(state); the library sets _is_active = True automatically after the hook returns SUCCESS. Do not allocate new ROS resources here.
- deactivate
Disable runtime behavior. Stop publishing, ignore incoming messages. _is_active is cleared to False only after _on_deactivate returns SUCCESS. A FAILURE or ERROR result leaves _is_active unchanged, so the component stays active. Do not release ROS resources here; that is cleanup’s responsibility.
- cleanup
Release all ROS resources allocated during configure. _release_resources() is called automatically by the library after _on_cleanup returns. No explicit call is needed in the override.
- shutdown / error
_release_resources() is called automatically. No override needed for most subclasses.
- No parallel lifecycle
No component may introduce an internal state machine that diverges from or shadows the node lifecycle. _is_active is the only lifecycle-adjacent flag. It is managed exclusively by the @final library entry points:
on_activate sets _is_active = True after _on_activate returns SUCCESS.
on_deactivate clears _is_active = False after _on_deactivate returns SUCCESS.
on_cleanup, on_shutdown, and on_error each clear _is_active = False unconditionally before the _on_* hook runs, regardless of its return value.
Subclasses must not read or write _is_active directly. Do not call super()._on_activate() or super()._on_deactivate() to manage the flag; the library handles it.
- Strict direct-call contract
LifecycleComponent.on_* remains library-owned. When a component hook entry point is called directly in an invalid order, lifecore_ros2 now raises InvalidLifecycleTransitionError instead of silently accepting the sequence.
The node-driven path stays lifecycle-native: invalid node transitions are still rejected by the native rclpy state machine, and the library logs the attempted transition, current node state, and attached components before re-raising the native error. The extra component-side bookkeeping is limited to boundary checks for direct calls plus a cleanup-needed flag used to prevent direct reconfigure before resources are released. This is not an independent lifecycle controller.
Invalid transition handling¶
| Invalid case | Node-level path | Direct LifecycleComponent.on_* path |
|---|---|---|
| activate before configure | Native rclpy rejection | InvalidLifecycleTransitionError |
| repeated configure without cleanup | Native rclpy rejection | InvalidLifecycleTransitionError |
| repeated activate without deactivate | Native rclpy rejection | InvalidLifecycleTransitionError |
| deactivate without prior activate | Native rclpy rejection | InvalidLifecycleTransitionError |
| cleanup before configure | Native rclpy rejection | InvalidLifecycleTransitionError |
| cleanup while active | Native rclpy rejection | InvalidLifecycleTransitionError |
Every direct rejection logs the component name, attempted transition, current contract state, and rejection reason before raising.
- Configure failure rollback
A failed node-driven configure no longer leaves half-configured components visible to the node. After a managed configure returns FAILURE or ERROR, LifecycleComponentNode calls _release_resources() on every attached component to restore a coherent unconfigured state before returning the final result.
- Activation gating
LifecyclePublisherComponent.publish() raises RuntimeError when inactive.
LifecycleSubscriberComponent silently drops incoming messages when inactive.
LifecycleServiceClientComponent.call() and call_async() raise RuntimeError when inactive; in-flight futures are not cancelled on deactivate (the application owns them).
LifecycleServiceServerComponent does not silently drop inactive requests: it logs a warning and returns a default-constructed response, populating success=False / message="component inactive" when those fields exist.
All four behaviors are intentional and consistent with explicit activation semantics.
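The two gating policies in the last invariant (outbound calls raise, inbound callbacks drop) can be sketched with a small decorator. This is not the library's decorator: gate_when_active and its on_inactive parameter are hypothetical names chosen for illustration, and Endpoint is a stand-in class.

```python
import functools
from typing import Any, Callable, List


def gate_when_active(on_inactive: str = "raise") -> Callable:
    """Hypothetical decorator: run the method only while self.is_active is true."""
    if on_inactive not in ("raise", "drop"):
        raise ValueError("on_inactive must be 'raise' or 'drop'")

    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            if self.is_active:
                return method(self, *args, **kwargs)
            if on_inactive == "raise":
                raise RuntimeError(f"{method.__name__} called while component is inactive")
            return None  # inbound path: drop the call silently

        return wrapper

    return decorator


class Endpoint:
    """Stand-in component with one outbound and one inbound path."""

    def __init__(self) -> None:
        self.is_active = False
        self.outbox: List[str] = []
        self.inbox: List[str] = []

    @gate_when_active(on_inactive="raise")
    def publish(self, msg: str) -> None:
        self.outbox.append(msg)

    @gate_when_active(on_inactive="drop")
    def on_message(self, msg: str) -> None:
        self.inbox.append(msg)


endpoint = Endpoint()
endpoint.on_message("early")          # dropped: component is inactive
try:
    endpoint.publish("early")
    publish_raised = False
except RuntimeError:
    publish_raised = True

endpoint.is_active = True             # in the library, on_activate does this
endpoint.on_message("m1")
endpoint.publish("p1")
```

Raising on the outbound path surfaces a programming error to the caller, while dropping on the inbound path keeps an exception from reaching whatever drives the callback.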
Naming Conventions¶
Library type names are stable and must not be changed or aliased.
Fixed names:
LifecycleComponent: the core reusable abstraction for a lifecycle-aware modular unit.
LifecycleComponentNode: the library base node that owns and drives registered components.
Application node names must use domain/business names, not library names:
# Correct
class CameraNode(LifecycleComponentNode): ...
class NavigationNode(LifecycleComponentNode): ...
# Wrong — do not embed library terms in application class names
class LifecycleCameraNode(LifecycleComponentNode): ...
Library-provided components follow the pattern Lifecycle<Capability>Component:
LifecyclePublisherComponent
LifecycleSubscriberComponent
LifecycleTimerComponent
LifecycleServiceServerComponent
LifecycleServiceClientComponent
Explicit rules:
No Abstract prefix. Use Base or no prefix if a base class is needed.
No *Manager, *Handler, *Core synonyms without explicit review justification. These terms signal hidden complexity; prefer a descriptive name tied to one responsibility.
No redundant qualifiers (Impl, Mixin, Core) appended mechanically to a type name.
These rules are enforced in pull request review. Any new class that violates them must include an explicit justification in the PR description.
Error Policy¶
The library enforces one coherent error policy across four axes.
- Rule A - Boundary violations raise
Misuse of the public API by application code raises a typed subclass of LifecoreError. All concrete subclasses also inherit from the matching standard Python exception for backward compatibility.
| Exception | Standard parent | When raised |
|---|---|---|
| RegistrationClosedError | RuntimeError | add_component called after the first lifecycle transition |
| DuplicateComponentError | ValueError | add_component called with a name that is already registered |
| ComponentNotAttachedError | RuntimeError | .node accessed on a component not attached to a node |
| ComponentNotConfiguredError | RuntimeError | publish() called before _on_configure created the publisher |
Catch LifecoreError to handle any library misuse in one place.
- Rule B - Inside lifecycle hooks: never raise outward
_guarded_call wraps every _on_* hook invocation. Uncaught exceptions and invalid return values are both converted to TransitionCallbackReturn.ERROR with a logged traceback. Hook authors choose:
Return FAILURE: transition fails, node stays in its current state.
Return ERROR or raise: transition fails, node enters ErrorProcessing.
Return SUCCESS: transition proceeds.
The library never lets an exception escape from a lifecycle hook into rclpy.
- Rule C - Activation gating: outbound raises, inbound drops
Outbound calls initiated by application code (e.g. publish()) raise RuntimeError by default via @when_active. Application code can guard before calling; raising surfaces lifecycle programming errors early.
Inbound callbacks driven by the middleware (subscription callbacks, timer callbacks) silently drop the message when the component is inactive. The drop is logged at DEBUG level. Raising into the rclpy executor would crash the spin loop.
Both defaults are configurable via @when_active(when_not_active=...).
Exceptions inside on_message are in a separate category: the message was delivered (the component was active), but the user’s on_message implementation raised. These are caught by _on_message_wrapper, logged at ERROR level with the exception type and message, and dropped. They never propagate to the executor. See the “Handle on_message exceptions inside the method” entry in Recommended Patterns and Anti-Patterns for guidance.
Shared primitive: all activation checks across the library funnel through require_active(is_active, *, component_name) in lifecore_ros2.core.activation_gating. LifecycleComponent.require_active() is a convenience façade over it. The @when_active default-raise path delegates to the same primitive. Components with custom inactive policies (e.g. LifecycleServiceServerComponent._on_request_wrapper) call self.require_active() and catch the resulting RuntimeError to apply their policy. No raw if not self._is_active: check appears in component files outside LifecycleComponent internals.
- Rule D - Error entry points and worst-result propagation
When does rclpy call on_error? Any lifecycle transition that returns ERROR (or whose hook raises an uncaught exception, which the library converts to ERROR via _guarded_call) moves the node into the ErrorProcessing state. rclpy then calls on_error on the node, which in turn calls each component’s on_error entry point.
The return value of on_error determines the next node state:
SUCCESS: node returns to Unconfigured (resources have been released; the node can be reconfigured).
FAILURE or ERROR: node transitions to Finalized (terminal state; the process must be restarted to reuse the node).
How the library handles the error entry point: For each component, the library’s @final on_error entry point:
Clears _is_active = False unconditionally (before the hook runs).
Calls _on_error via _guarded_call (catches exceptions, converts to ERROR).
Calls _release_resources regardless of the hook result.
Returns the worst of the two results (SUCCESS < FAILURE < ERROR).
The same worst-result rule applies to on_cleanup and on_shutdown: on_cleanup, on_shutdown, and on_error each run their _on_* hook and then call _release_resources. A failing hook does not skip _release_resources.
Difference between returning ERROR from a hook and overriding _on_error: Returning ERROR from any _on_* hook is how application code signals an unrecoverable transition failure; rclpy handles the state transition. Overriding _on_error is how a component performs cleanup work after the node has already entered ErrorProcessing. Most components do not need to override _on_error; the default returns SUCCESS, and _release_resources is called automatically.
Error Propagation Contract
The following table is the authoritative propagation matrix for all _on_* hooks. See Error handling contract for the full rationale.
Hook outcome → library action¶
| Event in hook | Wrapper return | rclpy next state | _on_error? | _release_resources? |
|---|---|---|---|---|
| SUCCESS | SUCCESS | target state | no | per transition |
| explicit FAILURE | FAILURE | previous state | no | no (failed configure) |
| explicit ERROR | ERROR | ErrorProcessing | yes | yes |
| caught exception | ERROR + log | ErrorProcessing | yes | yes |
| invalid return value | ERROR + log | ErrorProcessing | yes | yes |
Locked decisions (Sprint 2, ratified 2026-04-30):
Rollback policy B: all-or-nothing. A composite transition fails as soon as one component fails. The node returns FAILURE; siblings that already transitioned are not externalised as partial. No reverse replay of _on_cleanup hooks.
LifecycleHookError wraps caught hook exceptions. The library creates a LifecycleHookError (__cause__ set to the original exception) for logging context. It is never propagated to trigger_* callers.
Strict mode is the default and is non-configurable. Any _on_* hook that returns a value outside {SUCCESS, FAILURE, ERROR} is logged at ERROR and treated as ERROR. There is no lenient mode.
_on_error is driven only by native rclpy ERROR_PROCESSING. The library never synthesises an extra call to _on_error on caught exceptions. The native flow (exception → wrapper returns ERROR → rclpy enters ErrorProcessing → on_error → _release_resources) provides the full guarantee.
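Rule A's dual inheritance (a typed LifecoreError subclass that also derives from the matching standard exception) can be illustrated as follows. The class names come from the Rule A table; the base-class order and the register helper are assumptions made for this sketch, not the library's source.

```python
from typing import List, Set


class LifecoreError(Exception):
    """Common base so callers can catch any library misuse in one place."""


# Each concrete error also derives from the standard exception listed in Rule A,
# so handlers written against RuntimeError or ValueError keep working.
class RegistrationClosedError(LifecoreError, RuntimeError):
    pass


class DuplicateComponentError(LifecoreError, ValueError):
    pass


class ComponentNotAttachedError(LifecoreError, RuntimeError):
    pass


class ComponentNotConfiguredError(LifecoreError, RuntimeError):
    pass


def register(names: Set[str], name: str) -> None:
    """Hypothetical helper that rejects duplicate component names."""
    if name in names:
        raise DuplicateComponentError(f"component {name!r} is already registered")
    names.add(name)


caught: List[str] = []
registered: Set[str] = set()
register(registered, "camera")

try:
    register(registered, "camera")
except ValueError as exc:        # handler written against the standard parent
    caught.append(type(exc).__name__)

try:
    register(registered, "camera")
except LifecoreError as exc:     # single catch-all for library misuse
    caught.append(type(exc).__name__)
```

Both handlers see the same exception type, which is the backward-compatibility property the rule describes.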
Member Convention¶
Every class in lifecore_ros2 assigns each method and attribute to exactly one
of four buckets. This is the authoritative guide for contributors and subclassers.
- Bucket 1 - Public API
Stable surface for direct use by application code. No leading underscore. Included in __all__ at module level. Examples: name, is_active, add_component, publish, on_message.
- Bucket 2 - Protected extension points
Override in subclasses; never call directly from application code. Single leading underscore. Docstring starts with “Extension point.” Examples: _on_configure, _on_activate, _on_deactivate, _on_cleanup, _on_shutdown, _on_error, _release_resources. Rendered in API docs.
- Bucket 3 - Library-controlled entry points
Implement the rclpy ManagedEntity / LifecycleNode protocol. Decorated with @typing.final on LifecycleComponent so pyright catches accidental overrides. On LifecycleComponentNode, on_configure and on_shutdown are not sealed because application nodes legitimately call super() inside them; those carry an explicit “override with super” contract in their docstring. Examples: LifecycleComponent.on_configure, on_activate, on_deactivate, on_cleanup, on_shutdown, on_error.
- Bucket 4 - Library-internal
Implementation details with no user contract. Single leading underscore. Docstring starts with “Library-internal. Do not call from user code.” Excluded from API docs. Examples: _attach, _detach, _guarded_call, _safe_release_resources, _resolve_logger, _close_registration, _on_message_wrapper.
When adding a new method, assign it to one bucket before writing the docstring.
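As an illustration of the four buckets, the sketch below places one member in each. BucketSketch and Camera are hypothetical classes written for this example; they use booleans instead of TransitionCallbackReturn and are not the library's LifecycleComponent.

```python
from typing import Callable, final


class BucketSketch:
    """Hypothetical component showing one member per bucket."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._is_active = False

    # Bucket 1: public API, no leading underscore.
    @property
    def name(self) -> str:
        return self._name

    @property
    def is_active(self) -> bool:
        return self._is_active

    # Bucket 2: protected extension point, overridden by subclasses.
    def _on_activate(self) -> bool:
        """Extension point. Return True to let activation proceed."""
        return True

    # Bucket 3: library-controlled entry point, sealed against overrides.
    @final
    def on_activate(self) -> bool:
        ok = self._guarded_call(self._on_activate)
        if ok:
            self._is_active = True  # only the entry point manages the flag
        return ok

    # Bucket 4: library-internal helper.
    def _guarded_call(self, hook: Callable[[], bool]) -> bool:
        """Library-internal. Do not call from user code."""
        try:
            return bool(hook())
        except Exception:
            return False


class Camera(BucketSketch):
    def _on_activate(self) -> bool:
        """Extension point. Pretend to start the camera stream."""
        return True


class BrokenCamera(BucketSketch):
    def _on_activate(self) -> bool:
        """Extension point. Simulate a driver failure."""
        raise RuntimeError("no device")


camera = Camera("front")
broken = BrokenCamera("rear")
camera_ok = camera.on_activate()
broken_ok = broken.on_activate()
```

A subclass touches only the Bucket 2 hook; a type checker that honors typing.final would flag any attempt to override on_activate.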