Customize Support Bundles for Helm Installations
This topic describes support bundle specifications for Helm installations and the key considerations to keep in mind when you define them for your application.
About Support Bundles
Customizing a support bundle is unique to your application and depends on what kind of data you need to troubleshoot. Replicated recommends designing comprehensive support bundle specifications to give you the most insight into problems in your customer environments and to help reduce your support burden.
For a comprehensive overview, see About Preflight Checks and Support Bundles.
For more information about specifications, see About Specifications in About Preflight Checks and Support Bundles.
Choose an Input Kind
You can create support bundle specifications using the following kinds:
- Secret (kind: Secret)
- SupportBundle custom resource (kind: SupportBundle)
Create a Secret (Recommended)
Replicated recommends using Secrets to contain support bundle specifications in your Helm chart. This method allows customers to automatically discover and generate a support bundle without specifying a long URL. Using Secrets also allows specifications to be templated using information in the values.yaml file.
Alternatively, you can use a ConfigMap (kind: ConfigMap) if the specification will not collect private information from the cluster.
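As a minimal sketch of the ConfigMap alternative, the following assumes that a ConfigMap is discovered the same way as a Secret: through the troubleshoot.sh/kind: support-bundle label and a support-bundle-spec key, placed under data instead of stringData. The resource name is illustrative. Verify the details against the Troubleshoot documentation before relying on it.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    troubleshoot.sh/kind: support-bundle
  # Illustrative name; any unique name in the release namespace works
  name: {{ .Release.Name }}-support-bundle
data:
  # Assumption: same key name as the Secret variant, under data rather than stringData
  support-bundle-spec: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: support-bundle
    spec:
      collectors:
        - clusterInfo: {}
        - clusterResources: {}
      analyzers: []
```

Because ConfigMap contents are not treated as sensitive by the cluster, this form suits specifications that contain no credentials or connection strings.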
To create a Secret for the support bundle specification:
1. Create a Secret as a YAML file with kind: Secret and apiVersion: v1. The Secret must include the following:
   - The label troubleshoot.sh/kind: support-bundle
   - A stringData field with a key named support-bundle-spec
   Template:
   apiVersion: v1
   kind: Secret
   metadata:
     labels:
       troubleshoot.sh/kind: support-bundle
     name: {{ .Release.Name }}-support-bundle
   stringData:
     # This is the support bundle spec that is used to generate the support bundle.
     # Notes: You can use {{ .Release.Namespace }} to ensure that the support bundle
     # is scoped to the release namespace.
     # You can use any of Helm's templating features here, including {{ .Values.someValue }}
     support-bundle-spec: |
       apiVersion: troubleshoot.sh/v1beta2
       kind: SupportBundle
       metadata:
         name: support-bundle
       spec:
         collectors: []
         analyzers: []
2. Add the Secret to your Helm chart templates/ directory.
Next, define the support bundle specification by adding collectors and analyzers. For more information, see Define the Support Bundle Specification.
Create a SupportBundle Custom Resource
If you do not want to use Secrets, you can create a SupportBundle custom resource instead. Helm templates are supported when the specification is distributed using an OCI registry.
Create a SupportBundle custom resource (kind: SupportBundle) using the following basic support bundle template. For more information about this custom resource, see Preflight and Support Bundle.
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: collectors
spec:
  collectors: []
  analyzers: []
Next, define the support bundle specification by adding collectors and analyzers. For more information, see Define the Support Bundle Specification.
Define the Support Bundle Specification
How you define a support bundle specification depends on your application's needs. This section provides guidance on using collectors and analyzers to design support bundles.
For more information about defining collectors and analyzers, see Collecting Data and Analyzing Data in the Troubleshoot documentation.
For more information about using Helm templates with collectors and analyzers, see Using Helm Templates in Specifications.
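To illustrate how values from values.yaml can drive a specification stored in a Secret, here is a minimal sketch. The supportBundle.appLabel and supportBundle.maxLines keys are hypothetical and exist only for this example. The second document is an excerpt of the Secret's stringData field, not a complete manifest.

```yaml
# values.yaml (hypothetical keys, shown only for illustration)
supportBundle:
  appLabel: someapp
  maxLines: 5000
---
# templates/support-bundle-secret.yaml (excerpt of the stringData field)
stringData:
  support-bundle-spec: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: support-bundle
    spec:
      collectors:
        - logs:
            selector:
              # Rendered by Helm from the hypothetical values above
              - app={{ .Values.supportBundle.appLabel }}
            namespace: {{ .Release.Namespace }}
            limits:
              maxLines: {{ .Values.supportBundle.maxLines }}
```

Helm renders the template expressions before the Secret is created, so the specification stored in the cluster contains only literal values.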
Cluster Collectors
You can add the clusterInfo and clusterResources collectors if you want to collect a large amount of data to help with installation and debugging, such as the Kubernetes version, nodes, and storage classes in the cluster. For clusterResources, you can use the default namespace and add additional namespaces using the namespaces field.
The following example shows clusterInfo and clusterResources collectors with an additional namespace specified for the application.
spec:
  collectors:
    - clusterInfo: {}
    - clusterResources:
        namespaces:
          - default
          - my-app-namespace
Pod Log Collectors
Replicated recommends adding application Pod logs and setting collection limits for the number of lines logged. Typically, the selector attribute is matched to the Pod labels.
To get the labels for an application, either inspect the YAML or run kubectl get pods --show-labels.
After the labels are discovered, create collectors to include logs from these Pods in a bundle. Depending on the complexity of an application's labeling schema, you might need a few different declarations of the logs collector. You can include the logs collector as many times as needed.
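As a sketch of declaring the logs collector more than once, the following assumes an application whose Pods carry app and component labels. The component=database label and the web-logs and db-logs names are hypothetical. Substitute the labels reported by kubectl get pods --show-labels for your application.

```yaml
spec:
  collectors:
    # First declaration: Pods labeled app=someapp,component=nginx
    - logs:
        name: web-logs
        selector:
          - app=someapp
          - component=nginx
        namespace: {{ .Release.Namespace }}
    # Second declaration: Pods with a different (hypothetical) component label
    - logs:
        name: db-logs
        selector:
          - app=someapp
          - component=database
        namespace: {{ .Release.Namespace }}
```

Giving each declaration its own name keeps the logs for each component grouped separately in the bundle.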
The limits field supports maxAge or maxLines, which restricts the collected output to the constraints provided. The default is maxLines: 10000.
The following example shows a Pod log collector for nginx scoped to a specific namespace using {{ .Release.Namespace }}:
spec:
  collectors:
    - clusterInfo: {}
    - clusterResources: {}
    - logs:
        selector:
          - app=someapp
          - component=nginx
        namespace: {{ .Release.Namespace }}
        limits:
          maxAge: 720h # 30*24
          maxLines: 10000
          maxBytes: 5000000
Other Recommended Collectors
Add any custom collectors to the file. Collectors that Replicated recommends considering are:
- Kubernetes resources: Use for custom resource definitions (CRDs), Secrets, and ConfigMaps, if they are required for your application to work.
- Databases: Return a selection of rows or entire tables.
- Volumes: Ensure that an application's persistent state files exist, are readable/writeable, and have the right permissions.
- Pods: Run a Pod from a custom image.
- Files: Copy files from Pods and hosts.
- HTTP: Consume your own application APIs with HTTP requests. If your application has its own API that serves status, metrics, performance data, and so on, this information can be collected and analyzed.
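A few of the collectors listed above can be sketched as follows. The service URL, file path, and Secret name are hypothetical placeholders, and the field names reflect the Troubleshoot collector documentation as best understood here. Confirm them against that documentation before use.

```yaml
spec:
  collectors:
    # HTTP: query an application status endpoint (hypothetical URL)
    - http:
        collectorName: app-status
        get:
          url: http://someapp.{{ .Release.Namespace }}.svc.cluster.local:8080/status
    # Files: copy a file out of the matching Pods (hypothetical path)
    - copy:
        collectorName: app-config
        selector:
          - app=someapp
        namespace: {{ .Release.Namespace }}
        containerPath: /etc/someapp/config.yaml
    # Kubernetes resources: record that a Secret key exists without collecting its value
    - secret:
        collectorName: app-credentials
        name: someapp-credentials
        namespace: {{ .Release.Namespace }}
        key: password
        includeValue: false
```

Setting includeValue: false is a deliberate choice in this sketch: it lets an analyzer confirm that a required Secret is present without placing the credential itself in the bundle.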
Analyzers
Add analyzers based on conditions that you expect for your application. For example, you might require that a cluster have at least 2 CPUs and 4 GB of memory available.
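The CPU and memory requirement above could be expressed with nodeResources analyzers along these lines. This is a sketch under assumptions: the CPU check sums cores across the cluster, while the memory check is written per node using min(memoryCapacity), which is one possible reading of the requirement. The check names and messages are illustrative.

```yaml
spec:
  analyzers:
    - nodeResources:
        checkName: Cluster has at least 2 CPU cores
        outcomes:
          - fail:
              when: "sum(cpuCapacity) < 2"
              message: The cluster must contain at least 2 CPU cores.
          - pass:
              message: The cluster has at least 2 CPU cores.
    - nodeResources:
        # Assumption: the memory requirement is applied to every node
        checkName: Each node has at least 4 Gi of memory
        outcomes:
          - fail:
              when: "min(memoryCapacity) < 4Gi"
              message: At least one node has less than 4 Gi of memory.
          - pass:
              message: Every node has at least 4 Gi of memory.
```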
Good analyzers clearly identify failure modes. For example, if you can identify a log message from your database component that indicates a problem, you should write an analyzer that checks for that log.
At a minimum, include application log analyzers. A simple text analyzer can detect specific log lines and inform an end user of remediation steps.
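A log analyzer of this kind might look like the following sketch. It makes several assumptions that you should verify: the component=database label, the db-logs collector name, and the log message are hypothetical; the fileName pattern assumes that the logs collector writes files under its name, then the Pod name, then the container name; and the when values assume that "true" means the regular expression matched.

```yaml
spec:
  collectors:
    - logs:
        name: db-logs
        selector:
          - app=someapp
          - component=database
        namespace: {{ .Release.Namespace }}
  analyzers:
    - textAnalyze:
        checkName: Database authentication errors
        # Assumed layout: <collector name>/<pod name>/<container name>.log
        fileName: db-logs/*/*.log
        # Hypothetical log line that signals a known failure mode
        regex: 'password authentication failed'
        outcomes:
          - fail:
              when: "true"
              message: The database rejected the application credentials. Check the database password in your values file and redeploy.
          - pass:
              when: "false"
              message: No database authentication errors were found in the logs.
```

The fail message carries the remediation step, so an end user who generates the bundle sees what to do next without contacting support.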
Analyzers that Replicated recommends considering are:
- Resource statuses: Check the status of various resources, such as Deployments, StatefulSets, Jobs, and so on.
- Regular expressions: Analyze arbitrary data.
- Databases: Check the version and connection status.
The following example shows a deployment status analyzer with messages that can inform the customer of the cause of an issue if the deployment fails. You can customize the messages to help guide customers to fix the issue on their own.
spec:
  collectors: []
  analyzers:
    - deploymentStatus:
        name: api
        namespace: default
        outcomes:
          - fail:
              when: "< 1"
              message: The API deployment does not have any ready replicas.
          - warn:
              when: "= 1"
              message: The API deployment has only a single ready replica.
          - pass:
              message: There are multiple replicas of the API deployment ready.
Example
The following example shows a support bundle specification as a Secret. For more examples, see the Troubleshoot example repository in GitHub.
apiVersion: v1
kind: Secret
metadata:
  labels:
    troubleshoot.sh/kind: support-bundle
  name: {{ .Release.Name }}-support-bundle
  namespace: {{ .Release.Namespace }}
type: Opaque
stringData:
  # This is the support bundle spec that will be used to generate the support bundle
  # Notes: we use {{ .Release.Namespace }} to ensure that the support bundle is scoped to the release namespace
  # We can use any of Helm's templating features here, including {{ .Values.someValue }}
  support-bundle-spec: |
    apiVersion: troubleshoot.sh/v1beta2
    kind: SupportBundle
    metadata:
      name: support-bundle
    spec:
      collectors:
        - clusterInfo: {}
        - clusterResources: {}
        - logs:
            selector:
              - app=someapp
              - component=nginx
            namespace: {{ .Release.Namespace }}
            limits:
              maxAge: 720h # 30*24
              maxLines: 10000
              maxBytes: 5000000
        - logs:
            collectorName: all-logs
            name: all-logs
        - runPod:
            collectorName: "static-hi"
            podSpec:
              containers:
                - name: static-hi
                  image: alpine:3
                  command: ["echo", "hi static!"]
      analyzers:
        - clusterVersion:
            outcomes:
              - fail:
                  when: "< 1.22.0"
                  message: The application requires Kubernetes 1.22.0 or later
                  uri: https://kubernetes.io
              - warn:
                  when: "< 1.23.0"
                  message: Your cluster meets the minimum version of Kubernetes, but we recommend you update to 1.23.0 or later.
                  uri: https://kubernetes.io
              - pass:
                  message: Your cluster meets the recommended and required versions of Kubernetes.
        - nodeResources:
            checkName: Must have at least 3 nodes in the cluster
            outcomes:
              - fail:
                  when: "count() < 3"
                  message: This application requires at least 3 nodes
              - warn:
                  when: "count() < 5"
                  message: This application recommends at least 5 nodes.
              - pass:
                  message: This cluster has enough nodes.
        - nodeResources:
            checkName: Total CPU Cores in the cluster is 4 or greater
            outcomes:
              - fail:
                  when: "sum(cpuCapacity) < 4"
                  message: The cluster must contain at least 4 cores
              - pass:
                  message: There are at least 4 cores in the cluster
        - nodeResources:
            checkName: Each node must have at least 40 GB of ephemeral storage
            outcomes:
              - fail:
                  when: "min(ephemeralStorageCapacity) < 40Gi"
                  message: Nodes in this cluster do not have at least 40 GB of ephemeral storage.
                  uri: https://domain.com/docs/system-requirements
              - warn:
                  when: "min(ephemeralStorageCapacity) < 100Gi"
                  message: Nodes in this cluster are recommended to have at least 100 GB of ephemeral storage.
                  uri: https://domain.com/docs/system-requirements
              - pass:
                  message: The nodes in this cluster have enough ephemeral storage.
        - ingress:
            namespace: default
            ingressName: connect-to-me
            outcomes:
              - fail:
                  message: The ingress isn't ingressing
              - pass:
                  message: All systems ok on ingress
        - deploymentStatus:
            name: api
            namespace: default
            outcomes:
              - fail:
                  when: "< 1"
                  message: The API deployment does not have any ready replicas.
              - warn:
                  when: "= 1"
                  message: The API deployment has only a single ready replica.
              - pass:
                  message: There are multiple replicas of the API deployment ready.
        - textAnalyze:
            checkName: Said hi!
            fileName: /static-hi.log
            regex: 'hi static'
            outcomes:
              - fail:
                  message: Didn't say hi.
              - pass:
                  message: Said hi!
Next Step
Test your support bundle in a development environment. For more information, see Generating Support Bundles.