Access and manage embedded clusters (Beta)

This topic describes managing nodes in clusters created with Replicated Embedded Cluster.

Access the cluster

You can use the CLI to access the cluster. This is useful for development or troubleshooting.

To access the cluster and use other included binaries:

  1. SSH into a controller node.

    note

    You cannot run the shell command on worker nodes.

  2. Use the Embedded Cluster shell command to start a shell with access to the cluster:

    sudo ./APP_SLUG shell

    Where APP_SLUG is the unique slug for the application.

    The output looks similar to the following:

       __4___
    _  \ \ \ \    Welcome to APP_SLUG debug shell.
   <'\  /_/_/_/   This terminal is now configured to access your cluster.
    ((____!___/)  Type 'exit' (or CTRL+d) to exit.
     \0\0\0\0\/   Happy hacking.
    ~~~~~~~~~~~
    root@alex-ec-1:/home/alex# export KUBECONFIG="/var/lib/embedded-cluster/k0s/pki/admin.conf"
    root@alex-ec-1:/home/alex# export PATH="$PATH:/var/lib/embedded-cluster/bin"
    root@alex-ec-1:/home/alex# source <(k0s completion bash)
    root@alex-ec-1:/home/alex# source <(cat /var/lib/embedded-cluster/bin/kubectl_completion_bash.sh)
    root@alex-ec-1:/home/alex# source /etc/bash_completion

    The appropriate kubeconfig is exported, and the location of useful binaries like kubectl and Replicated’s preflight and support-bundle plugins is added to PATH.

  3. Use the available binaries as needed.

    Example:

    kubectl version
    Client Version: v1.29.1
    Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
    Server Version: v1.29.1+k0s
  4. Type exit or Ctrl + D to exit the shell.
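After starting the shell, you can sanity-check that the environment is wired up. A minimal sketch, using the default paths shown in the example shell output above:

```shell
# Sanity checks for the debug shell environment. The paths below are the
# defaults from the example shell output above; adjust if your data
# directory differs.
export KUBECONFIG="/var/lib/embedded-cluster/k0s/pki/admin.conf"
export PATH="$PATH:/var/lib/embedded-cluster/bin"

# Confirm the kubeconfig variable is set and the bin dir is on PATH.
[ -n "$KUBECONFIG" ] && echo "KUBECONFIG is set"
case ":$PATH:" in
  *":/var/lib/embedded-cluster/bin:"*) echo "bin dir on PATH" ;;
  *) echo "bin dir missing from PATH" ;;
esac
```

If both checks pass, commands such as kubectl, preflight, and support-bundle resolve without full paths.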

Configure multi-node clusters

This section describes how to join nodes to a cluster with Embedded Cluster.

Limitations

Multi-node clusters with Embedded Cluster have the following limitations:

  • All nodes joined to the cluster use the same Embedded Cluster data directory as the installation node. You cannot choose a different data directory for Embedded Cluster when joining nodes.

  • You should not join more than one controller node at the same time. When joining a controller node, Embedded Cluster prints a warning explaining that you should not attempt to join another node until the controller node joins successfully.

  • You cannot change a node's role (controller or worker) after you join the node. If you need to change a node’s role, reset the node and add it again with the new role.

Requirement

To deploy multi-node clusters with Embedded Cluster, you must enable the Multi-node Cluster (Embedded Cluster only) license field for the customer. For more information about managing customer licenses, see Create and Manage Customers.

Join nodes

To join a node:

  1. SSH into a controller node.

  2. Run the following command to generate the .tar.gz bundle for joining a node:

    sudo ./APP_SLUG create-join-bundle --role [controller | worker]

    Where:

    • APP_SLUG is the unique slug for the application.

    • --role is the role to assign the node (controller or worker).

      note

      You cannot change the role after you add a node. If you need to change a node’s role, reset the node and add it again with the new role.

  3. Use scp to copy the .tar.gz bundle to the node that you want to join.

  4. Extract the .tar.gz.

  5. Run the join command to add the node to the cluster:

    sudo ./APP_SLUG node join
  6. Repeat these steps for each node you want to add.
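Put together, the join workflow looks like the following dry-run sketch. It only prints the commands you would run; the app slug, bundle filename, and node hostname are hypothetical placeholders:

```shell
#!/bin/sh
# Dry-run sketch of the join workflow above. Nothing here contacts a real
# cluster; the script only prints each command. APP_SLUG, BUNDLE, and NODE
# are hypothetical placeholders.
APP_SLUG="myapp"
ROLE="worker"
NODE="node2.example.com"
BUNDLE="join-bundle.tar.gz"

# 1. On a controller node: generate the join bundle.
echo "sudo ./$APP_SLUG create-join-bundle --role $ROLE"
# 2. Copy the bundle to the node that you want to join.
echo "scp $BUNDLE user@$NODE:~"
# 3. On the joining node: extract the bundle, then join.
echo "tar -xzf $BUNDLE"
echo "sudo ./$APP_SLUG node join"
```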

High availability for multi-node clusters

Embedded Cluster automatically enables high availability (HA) when at least three controller nodes are present in the cluster.

In HA installations, Embedded Cluster deploys multiple replicas of the OpenEBS and image registry built-in extensions. Additionally, whether any Helm extensions that you include in the Embedded Cluster Config are deployed with high availability depends on the given chart and how it is configured.

Best practices for high availability

Consider the following best practices and recommendations for HA clusters:

  • HA requires at least three controller nodes that run the Kubernetes control plane. This is because clusters use a quorum system, in which more than half the nodes must be up and reachable. In clusters with three controller nodes, the Kubernetes control plane can continue to operate if one node fails because the remaining two nodes can still form a quorum.

  • Always use an odd number of controller nodes in HA clusters. Using an odd number of controller nodes ensures that the cluster can make decisions efficiently with quorum calculations. Clusters with an odd number of controller nodes also avoid split-brain scenarios, where the cluster runs as two independent groups of nodes, resulting in inconsistencies and conflicts.

  • You can have any number of worker nodes in HA clusters. Worker nodes do not run the Kubernetes control plane, but they can run other workloads, such as the application itself.
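The quorum arithmetic behind these recommendations can be sketched in a few lines of shell: a majority quorum is floor(n/2) + 1, so an even-numbered cluster tolerates no more failures than the odd-numbered cluster one node smaller.

```shell
# Majority quorum math: a cluster of n controllers needs floor(n/2) + 1
# reachable members, so it tolerates n - (floor(n/2) + 1) failures.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 3 4 5; do
  echo "controllers=$n quorum=$(quorum $n) failures_tolerated=$(tolerated $n)"
done
```

Note that four controllers tolerate only one failure, the same as three, which is why adding an even node buys no extra resilience.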

Create a multi-node cluster with HA

To create a multi-node cluster with HA:

  • During installation with Embedded Cluster, follow the steps in the Embedded Cluster UI to join a total of three controller nodes to the cluster. For more information about joining nodes, see Join nodes on this page.

    Embedded Cluster automatically converts the installation to HA when three or more controller nodes are present.

Enable HA for an existing cluster

To enable HA for an existing Embedded Cluster installation with three or more controller nodes:

  • On one of the controller nodes, run this command:

    sudo ./APP_SLUG enable-ha

    Where APP_SLUG is the unique slug for the application.
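Before running enable-ha, it can help to confirm that at least three controller nodes have joined. A minimal sketch, where the NODES variable stands in for hypothetical kubectl get nodes output (in the debug shell you would pipe the real command instead):

```shell
# Count control-plane nodes before enabling HA. NODES is a hypothetical
# sample of `kubectl get nodes` output used for illustration only.
NODES="controller-1   Ready   control-plane   10m
controller-2   Ready   control-plane   8m
controller-3   Ready   control-plane   5m
worker-1       Ready   <none>          3m"

COUNT=$(printf '%s\n' "$NODES" | grep -c 'control-plane')
echo "controller nodes: $COUNT"
if [ "$COUNT" -ge 3 ]; then echo "ready for enable-ha"; fi
```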

Reset nodes and remove clusters

This section describes how to reset individual nodes and how to delete an entire multi-node cluster using the Embedded Cluster reset command.

About the reset command

Resetting a node with Embedded Cluster removes the cluster and your application from that node. This is useful for iteration and development, and for recovering from mistakes, because you can reuse the machine instead of having to procure a new one.

The reset command performs the following steps:

  1. Run safety checks. For example, reset does not remove a controller node when there are worker nodes available, and it does not remove a node when the etcd cluster is unhealthy.
  2. Drain the node and gracefully evict all Pods.
  3. Delete the node from the cluster.
  4. Stop and reset k0s.
  5. Remove all Embedded Cluster files.
  6. Reboot the node.

For more information about the command, see reset.

Limitations and best practices

Before you reset a node or remove a cluster, consider the following limitations and best practices:

  • When you reset a node, Embedded Cluster deletes OpenEBS PVCs on that node. Kubernetes automatically recreates only PVCs created as part of a StatefulSet on another node in the cluster. To recreate other PVCs, redeploy the application in the cluster.

  • If you need to reset one controller node in a three-node cluster, first join a fourth controller node to the cluster before removing the target node. This ensures that you maintain a minimum of three nodes for the Kubernetes control plane. You can add and remove worker nodes as needed because they do not have any control plane components.

  • When resetting a single node or deleting a test environment, you can include the --force flag with the reset command to ignore any errors.

  • When removing a multi-node cluster, run reset on each of the worker nodes first. Then, run reset on controller nodes. Controller nodes also remove themselves from etcd membership.

Reset a node

To reset a node:

  1. SSH into the node. Ensure that the Embedded Cluster binary is still available on the machine.

  2. Run the following command to remove the node and reboot the machine:

    sudo ./APP_SLUG reset

    Where APP_SLUG is the unique slug for the application.

Remove a multi-node cluster

To remove a multi-node cluster:

  1. SSH into a worker node.

    note

    The safety checks for the reset command prevent you from removing a controller node when there are still worker nodes available in the cluster.

  2. Remove the node and reboot the machine:

    sudo ./APP_SLUG reset

    Where APP_SLUG is the unique slug for the application.

  3. After removing all the worker nodes in the cluster, SSH into a controller node and run the reset command to remove the node.

  4. Repeat the previous step on the remaining controller nodes in the cluster.
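The full teardown order can be summarized with a dry-run sketch: reset every worker node first, then the controller nodes. Node names and the app slug are hypothetical placeholders, and each printed line represents a command you would run over SSH on that node:

```shell
#!/bin/sh
# Dry-run sketch of removing a multi-node cluster, per the safety checks
# described above: workers first, then controllers. Node names and
# APP_SLUG are hypothetical placeholders; nothing is actually reset.
APP_SLUG="myapp"
WORKERS="worker-1 worker-2"
CONTROLLERS="controller-1 controller-2 controller-3"

for node in $WORKERS $CONTROLLERS; do
  echo "[$node] sudo ./$APP_SLUG reset"
done
```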