TROUBLESHOOTING
Introduction
This topic describes a few troubleshooting scenarios that you might encounter while installing and configuring KubeSlice.
Installation Issues
Why did the download of the KubeSlice Cert Manager fail?
During the installation of the KubeSlice cert-manager, you might get the following error message:
failed to download "kubeslice/cert-manager" (hint: running helm repo update may help)
Check the Helm version. If the Helm version is older than 3.7.0, you cannot download the kubeslice/cert-manager chart. Upgrade Helm to version 3.7.0 or later to successfully install the cert-manager.
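The version check described above can be sketched as a small shell helper. This is an illustrative sketch, not part of KubeSlice: the function names version_ge and helm_version_ok are hypothetical, and the comparison assumes a sort command that supports version ordering (-V, as in GNU coreutils). The check runs only if helm is found on the PATH.

```shell
#!/usr/bin/env bash
# Minimum Helm version stated in this topic.
MIN_HELM_VERSION="3.7.0"

# version_ge A B: succeeds when version A is greater than or equal to B.
# Assumes version-aware sorting (sort -V, as in GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# helm_version_ok VERSION: accepts strings such as "v3.6.3" or "v3.7.0+gabc1234".
helm_version_ok() {
  local v="${1#v}"   # drop the leading "v"
  v="${v%%+*}"       # drop build metadata after "+"
  version_ge "$v" "$MIN_HELM_VERSION"
}

# Run the check only when helm is installed.
if command -v helm >/dev/null 2>&1; then
  current="$(helm version --short)"
  if helm_version_ok "$current"; then
    echo "Helm $current is new enough."
  else
    echo "Helm $current is older than $MIN_HELM_VERSION; upgrade Helm."
  fi
fi
```

After upgrading Helm, running helm repo update (as the error hint suggests) refreshes the local chart index before you retry the installation.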
Why do I face installation issues while installing KubeSlice on kind clusters on Ubuntu OS?
On Ubuntu OS, installing KubeSlice on kind clusters can fail if too many files are open. Increase the ulimit to 2048 or unlimited, and then try installing KubeSlice on the kind clusters again. If you still face issues, see errors due to too many open files.
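As a minimal sketch, the following checks the open-file limit of the current shell against the value recommended above. The helper name nofile_ok is hypothetical. Note that ulimit -n changes only the current shell session and its child processes, and a non-root user cannot raise it above the hard limit.

```shell
# Open-file limit recommended in this topic.
REQUIRED_NOFILE=2048

# nofile_ok LIMIT: succeeds when LIMIT is "unlimited" or at least REQUIRED_NOFILE.
nofile_ok() {
  [ "$1" = "unlimited" ] || [ "$1" -ge "$REQUIRED_NOFILE" ]
}

current="$(ulimit -n)"   # soft limit on open file descriptors for this shell
if nofile_ok "$current"; then
  echo "Open-file limit is $current; no change needed."
else
  echo "Open-file limit is $current; raise it with: ulimit -n $REQUIRED_NOFILE"
fi
```

Run the installation from the same shell session in which you raised the limit, because a new terminal starts with the default value again.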
Why do I get an error during a helm upgrade of the KubeSlice Controller?
caution
Currently, you can only upgrade to a software patch version that does not contain schema changes. You cannot upgrade to a software patch/complete version that contains schema changes.
If the worker operator pod is down when you upgrade the controller using the helm upgrade command, you get the following error related to a mutating webhook:
Patch Deployment "kubeslice-controller-manager" in namespace kubeslice-controller
error updating the resource "kubeslice-controller-manager":
cannot patch "kubeslice-controller-manager" with kind Deployment: Internal error occurred: failed calling webhook "mdeploy.avesha.io": failed to call webhook: Post "https://kubeslice-webhook-service.kubeslice-system.svc:443/mutate-appsv1-deploy?timeout=10s": no endpoints available for service "kubeslice-webhook-service"
Looks like there are no changes for Deployment "kubernetes-dashboard"
Looks like there are no changes for Deployment "dashboard-metrics-scraper"
Patch Certificate "kubeslice-controller-serving-cert" in namespace kubeslice-controller
Patch Issuer "kubeslice-controller-selfsigned-issuer" in namespace kubeslice-controller
Patch MutatingWebhookConfiguration "kubeslice-controller-mutating-webhook-configuration" in namespace
Patch ValidatingWebhookConfiguration "kubeslice-controller-validating-webhook-configuration" in namespace
Error: UPGRADE FAILED: cannot patch "kubeslice-controller-manager" with kind Deployment: Internal error occurred: failed calling webhook "mdeploy.avesha.io": failed to call webhook: Post "https://kubeslice-webhook-service.kubeslice-system.svc:443/mutate-appsv1-deploy?timeout=10s": no endpoints available for service "kubeslice-webhook-service"
To resolve this error, manually delete the mutating webhook configuration as described below:
Get the name of the MutatingWebhookConfiguration webhook using the following command:
kubectl get mutatingwebhookconfiguration
Expected Output
NAME WEBHOOKS AGE
cdi-api-datavolume-mutate 1 16d
cert-manager-webhook 1 31d
istio-sidecar-injector 4 15d
kubeslice-controller-mutating-webhook-configuration 7 30d
kubeslice-mutating-webhook-configuration 1 29d
longhorn-webhook-mutator 1 17d
nsm-admission-webhook-cfg 1 29d
virt-api-mutator 4 18d
Note down the name of the MutatingWebhookConfiguration webhook, which is kubeslice-mutating-webhook-configuration in the above output.
Delete the MutatingWebhookConfiguration using the following command:
kubectl delete mutatingwebhookconfiguration kubeslice-mutating-webhook-configuration
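On clusters with many webhooks, it can help to filter the list to the KubeSlice entries before you pick the one to delete. The sketch below assumes the default table output of kubectl get mutatingwebhookconfiguration (a header row, with the name in the first column); the function name kubeslice_webhooks is hypothetical.

```shell
# kubeslice_webhooks: reads `kubectl get mutatingwebhookconfiguration` output
# on stdin and prints the names that start with "kubeslice-".
kubeslice_webhooks() {
  # NR > 1 skips the header row; $1 is the NAME column.
  awk 'NR > 1 && $1 ~ /^kubeslice-/ { print $1 }'
}

# Usage (requires access to the cluster):
#   kubectl get mutatingwebhookconfiguration | kubeslice_webhooks
```

The filter can print more than one name (for example, the controller's own webhook configuration also starts with kubeslice-), so delete only the configuration identified in the steps above.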
Connectivity Issues
Why is my registered cluster not connected to the KubeSlice Controller?
There could be an issue during the installation of the Slice Operator on the registered cluster. Try these steps:
Switch context to the registered cluster on which you are facing connectivity issues using the following command:
kubectx <cluster name>
Validate the installation of the Slice Operator by checking the status of the pods in the kubeslice-controller-system namespace using the following command:
kubectl get pods -n kubeslice-controller-system
If the connection issue persists, check that the KubeSlice Controller endpoint and token are correct in the Slice Operator YAML configuration file that is applied on that registered cluster. To know about the configuration, see the Slice Operator YAML file.
Registering clusters with the same name does not throw an error.
Each instance of the cluster is registered separately, as a different cluster, because Kubernetes ignores the duplication of cluster names. It is best to avoid registering clusters with duplicate names.
The KubeSlice Controller was successfully installed with a controller endpoint that is not reachable by a slice.
Check that the controller endpoint provided during the installation of the Slice Operator on the worker cluster is correct. Also check that the controller cluster's secret token and ca-cert installed on the worker cluster are correct. To know more, see Getting the Secrets of the Registered Cluster.
Node IP address on the registered cluster was changed but the KubeSlice components were not cleaned up.
When the Node IP address is changed on a registered cluster, a manual clean-up is required for the worker cluster configuration to use the updated IP. We therefore recommend that you do not manually change the Node IP after it is configured, and that you do not add an invalid Node IP address.
While registering a cluster, the Node IP is configured by pulling the value from the cluster.
A cluster registration failed with a correct cluster.yaml file.
The registration fails when a cluster.yaml file is applied to register more than one cluster. Ensure that a cluster.yaml file is applied to only one cluster and not to multiple clusters.
Cluster Issues
The error/warning states that the CRD object is stuck.
Patch the failing CRD object named in the warning with an empty finalizer list, as shown in this example (serviceexportconfigs.hub.kubeslice.io is the failing CRD object in this example):
kubectl patch crd/serviceexportconfigs.hub.kubeslice.io -p '{"metadata":{"finalizers":[]}}' --type=merge
Uninstall and reinstall the KubeSlice Controller.
The error states that the project namespace is stuck.
Remove the finalizers from the stuck namespace so that its deletion can complete, by running the following command:
kubectl patch ns/<stuck-namespace> -p '{"metadata":{"finalizers":[]}}' --type=merge
Uninstall and reinstall the KubeSlice Controller.
Onboarded Application Namespace Issues
NSM containers are not injected in pods during deployments in the application namespace.
If NSM containers are not injected in pods during deployments in the application namespace, then check if that application namespace contains the KubeSlice label. If the label is not there, wait for the Slice Operator to label the namespace.
For example, run the following command to check the label:
kubectl describe ns iperf
In the command output below, kubeslice.io/slice=blue is the KubeSlice label.
Name: iperf
Labels: hnc.x-k8s.io/included-namespace=true
iperf.tree.hnc.x-k8s.io/depth=0
kubernetes.io/metadata.name=iperf
kubeslice.io/slice=blue
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
The iperf namespace contains the kubeslice.io/slice=blue label, which means that the namespace is already onboarded to the blue slice.
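To check the label without reading the full output, you can extract the slice name from the describe output. This is a sketch that assumes the label appears as kubeslice.io/slice=<slice-name>, as in the output above; the function name slice_label is hypothetical.

```shell
# slice_label: reads `kubectl describe ns <namespace>` output on stdin and
# prints the value of the kubeslice.io/slice label, or nothing if it is absent.
slice_label() {
  sed -n 's/.*kubeslice\.io\/slice=\([^[:space:],]*\).*/\1/p' | head -n 1
}

# Usage (requires access to the cluster):
#   kubectl describe ns iperf | slice_label
```

An empty result means the namespace does not carry the KubeSlice label yet.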
kubeslice-cli
This guide describes troubleshooting scenarios that you might encounter while installing and using the kubeslice-cli tool.
Unable to Install KubeSlice using the kubeslice-cli Tool on Ubuntu
During the installation of KubeSlice using the kubeslice-cli install -p=minimal-demo command, you might get the following error message:
✓ Writing configuration 📜
• Starting control-plane 🕹️ ...
✗ Starting control-plane 🕹️
ERROR: failed to create cluster: failed to init node with kubeadm: command "docker exec --privileged ks-w-2-control-plane kubeadm init --skip-phases=preflight --config=/kind/kubeadm.conf --skip-token-print --v=6" failed with error: exit status 137
Command Output:
2022/10/04 06:12:21 Process failed exit status 1
There could be a memory/disk space issue.
To resolve:
- Remove unused clusters (other than the ones used in the demo).
- Increase disk space/memory resources.
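The exit status 137 in the error is consistent with a memory problem. By shell convention, an exit status above 128 means the process was terminated by a signal whose number is the status minus 128, and signal 9 is SIGKILL, which the kernel sends when it kills a process for running out of memory (among other causes). The helper name signal_from_status below is hypothetical, and the commented commands assume a Linux host with kind installed.

```shell
# signal_from_status STATUS: prints the signal number when STATUS is above 128
# (the shell convention for "terminated by a signal"); otherwise prints nothing.
signal_from_status() {
  if [ "$1" -gt 128 ]; then
    echo $(( $1 - 128 ))
  fi
}

echo "exit status 137 -> signal $(signal_from_status 137)"   # signal 9 is SIGKILL

# Commands to inspect resources and clean up (run manually as needed):
#   free -h                                    # available memory
#   df -h                                      # available disk space
#   kind get clusters                          # list existing kind clusters
#   kind delete cluster --name <cluster-name>  # remove an unused cluster
```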
Unable to run the kubeslice-cli commands
After successfully installing KubeSlice using kubeslice-cli, you might be unable to run the kubeslice-cli commands, as in this example:
kubeslice-cli get sliceConfig -n kubeslice-demo
Fetching KubeSlice sliceConfig...
🏃 Running command: /usr/local/bin/kubectl get sliceconfigs.controller.kubeslice.io -n demo
error: the server doesn't have a resource type "sliceconfigs"
2022/10/04 08:26:40 Process failed exit status 1
To resolve:
- Ensure you are on the controller cluster to run the commands: kubectx -c
- Export the configuration file using this command: export KUBECONFIG=kubeslice/<path-to-the-kubeconfig-file>
Getting an Unverified Developer Error Message on macOS
When you try to install kubeslice-cli on macOS, you might get the Unverified Developer error message.
This error message appears when you try to install an application from a developer who is not registered with Apple.
To resolve:
Follow the instructions in Enabling the Application for macOS.