FAQ
How to use Kubeflow plugins when setting PSA=restricted in Kubernetes
If your namespace has PSA=restricted, you may encounter errors when using Kubeflow components, for example when you create notebooks or start Kubeflow Pipeline runs. To solve this, change the PSA level of the current namespace from restricted to baseline.
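A minimal sketch of the change, assuming PSA is enforced through the standard Pod Security Admission namespace label. The namespace name `my-namespace` is a placeholder, not a value from this document:

```shell
# Placeholder: replace with the namespace where you use Kubeflow components.
NAMESPACE="my-namespace"

# Show the current labels, including any pod-security.kubernetes.io/* settings.
kubectl get namespace "$NAMESPACE" --show-labels

# Switch the enforced Pod Security Standard from restricted to baseline.
kubectl label namespace "$NAMESPACE" \
  pod-security.kubernetes.io/enforce=baseline --overwrite
```

Labels for the `warn` and `audit` modes only produce warnings and audit entries; the `enforce` label is the one that rejects Pods.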
NOTE: You may need to consult your cluster admin to make sure changing the PSA is acceptable.
How to Configure Kubeflow to Use an Alternative Platform Address for Login?
In some environments, the platform access address is configured as an internal network address, and users need to log in through an "Alternative Platform Address." In this scenario, while the OIDC issuer remains based on the original platform address, the login page URL must be updated to the alternative address.
Steps:
- Locate the ModuleInfo Resource:
In the global cluster, find the ModuleInfo resource corresponding to the kfbase plugin using the following command:
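A minimal sketch of such a lookup, run against the global cluster. It assumes that the ModuleInfo kind can be addressed as `moduleinfo` and that the entry for the plugin shows `kfbase` in its listed columns; neither assumption is confirmed by this document, so adjust the commands to your platform:

```shell
# Assumption: the ModuleInfo kind resolves to the lowercase name "moduleinfo".
# If it does not, this shows the exact resource name to use instead.
kubectl api-resources | grep -i moduleinfo

# Assumption: the entry for the plugin contains "kfbase" in its listed columns.
kubectl get moduleinfo | grep kfbase
```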
- Edit the ModuleInfo Resource:
Add the valuesOverride section under spec as shown below. Replace <Alternative-Platform-Address> with the actual alternative address.
- Restart the OAuth2 Proxy:
Apply the changes by restarting the oauth2-proxy deployment in the target cluster.
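A minimal sketch of the restart, run against the target cluster. The namespace value is a placeholder: the namespace that hosts oauth2-proxy depends on your installation, so the first command looks it up.

```shell
# Find the namespace that hosts the oauth2-proxy deployment.
kubectl get deployment --all-namespaces | grep oauth2-proxy

# Placeholder: set this to the namespace printed by the command above.
NAMESPACE="oauth2-proxy"

# Restart the deployment and wait until the new Pods are ready.
kubectl -n "$NAMESPACE" rollout restart deployment oauth2-proxy
kubectl -n "$NAMESPACE" rollout status deployment oauth2-proxy
```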
How to start a Kubeflow Pipeline Run with external S3/MinIO storage
When Kubeflow is installed with an external S3/MinIO storage service, you need to add a "KFP Launcher" configmap to set up the storage used by the current namespace or user. See the Kubeflow documentation at https://www.kubeflow.org/docs/components/pipelines/operator-guides/configure-object-store/#s3-and-s3-compatible-provider for more details. If no configuration is set, pipeline runs may still try to access the default service address, such as "minio-service.kubeflow:9000", which is not correct.
To get started, set the following values in this configmap so that they point to your own S3/MinIO storage:
- defaultPipelineRoot: where to store the pipeline intermediate data
- endpoint: S3/MinIO service endpoint. Note: it should NOT start with "http" or "https"
- disableSSL: whether to disable "https" access to the endpoint
- region: S3 region. If using MinIO, any value will do
- credentials: the AK/SK (access key and secret key) stored in a Kubernetes secret
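The fields above fit together roughly as in the following sketch, which follows the layout described in the linked Kubeflow guide. The namespace, bucket, endpoint, region, secret name, and secret keys are all placeholders, so replace them with your own values. The block only writes the manifest to a local file so that it can be reviewed first.

```shell
# Write a kfp-launcher ConfigMap manifest to a local file.
# Every concrete value below is a placeholder for illustration.
cat > kfp-launcher.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: kfp-launcher
  namespace: my-namespace            # namespace where the pipeline runs start
data:
  defaultPipelineRoot: s3://my-bucket/pipelines
  providers: |-
    s3:
      default:
        endpoint: minio.example.com:9000   # no "http://" or "https://" prefix
        disableSSL: true                   # true means plain HTTP
        region: us-east-1                  # any value works for MinIO
        credentials:
          fromEnv: false
          secretRef:
            secretName: my-s3-secret       # Secret in the same namespace
            accessKeyKey: accesskey        # key in the Secret holding the AK
            secretKeyKey: secretkey        # key in the Secret holding the SK
EOF
```

After reviewing the file, it can be applied with `kubectl apply -f kfp-launcher.yaml`.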
After you add this configmap, newly started Kubeflow Pipeline Runs automatically read this configuration and store the data used by Kubeflow Pipelines in the configured storage.
Configure Kubeflow Notebook to use custom GPU resources
You can add other GPU resource types so that the Kubeflow Notebook web page can create instances that use this hardware, for example Ascend GPUs.
Edit the configmap used by the Kubeflow Notebook web page, find the section that lists GPU resource types, and add your own, such as "your-custom.com/gpu".
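A minimal sketch of this edit, assuming the upstream Kubeflow layout: a configmap in the `kubeflow` namespace whose name starts with `jupyter-web-app-config`, with GPU vendors listed under `gpus.value.vendors` in `spawner_ui_config.yaml`. Your installation may use different names, and the configmap name and UI label below are placeholders.

```shell
# Assumption: upstream Kubeflow naming. The configmap name usually carries a
# generated suffix, so list the candidates first.
kubectl -n kubeflow get configmap | grep jupyter-web-app

# Placeholder: set this to the name printed by the command above.
CONFIGMAP="jupyter-web-app-config-xxxxxxxx"
kubectl -n kubeflow edit configmap "$CONFIGMAP"

# In the editor, extend the vendor list in spawner_ui_config.yaml, for example:
#   gpus:
#     value:
#       vendors:
#       - limitsKey: "nvidia.com/gpu"
#         uiName: "NVIDIA"
#       - limitsKey: "your-custom.com/gpu"   # resource name exposed on the nodes
#         uiName: "Custom GPU"               # hypothetical label shown in the UI
```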
NOTE: You can only add resource types that take integer values, such as 1, 2, 4, or 8. You also cannot add "Virtual" or "Shared" GPU resources that are requested through both "Cores" and "Memory", as is the case with HAMi.
Pod Startup Failure: Probe Timeout (kube-ovn Environment)
Symptoms: A large number of Pods in the kubeflow namespace are stuck in CrashLoopBackOff or Init:1/2, and Pod Events show health probe timeout errors.
Cause: The default-allow-same-namespace NetworkPolicy deployed by kfbase only allows ingress traffic from Pods in the same namespace and a small number of system namespaces. In clusters using kube-ovn as the CNI, health probe traffic sent by kubelet reaches Pods through the kube-ovn join subnet (default 100.64.0.0/16). The source IP of that traffic does not match any existing NetworkPolicy rule, so it is dropped by the OVN ACL, causing all probes to time out.
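To check whether a cluster matches the symptoms and cause described above, the following commands can help. This is a diagnostic sketch only; it relies on the fact that kubelet records failed probes as events with the reason `Unhealthy`.

```shell
# List Pods and spot those stuck in CrashLoopBackOff or Init:1/2.
kubectl -n kubeflow get pods

# Show probe failures recorded by kubelet for Pods in the namespace.
kubectl -n kubeflow get events --field-selector reason=Unhealthy

# Inspect which sources the existing policy admits.
kubectl -n kubeflow describe networkpolicy default-allow-same-namespace
```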
Fix: Create a NetworkPolicy that allows inbound traffic from the kube-ovn join subnet:
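A minimal sketch of such a policy, assuming a single-stack IPv4 cluster in which the kube-ovn Subnet object exposes its CIDR as `.spec.cidrBlock`. The policy name and file name are hypothetical. The block reads the join subnet CIDR, falls back to the common default only when the lookup returns nothing, and writes the manifest to a local file for review.

```shell
# Read the join subnet CIDR from kube-ovn.
JOIN_CIDR="$(kubectl get subnet join -o jsonpath='{.spec.cidrBlock}' 2>/dev/null || true)"

# Fall back to the common default only if the lookup returned nothing.
if [ -z "$JOIN_CIDR" ]; then
  echo "warning: could not read the join subnet, assuming 100.64.0.0/16" >&2
  JOIN_CIDR="100.64.0.0/16"
fi

# Hypothetical policy name. The empty podSelector applies the rule to every
# Pod in the kubeflow namespace.
cat > allow-kube-ovn-join.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-kube-ovn-join-subnet
  namespace: kubeflow
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: ${JOIN_CIDR}
EOF
```

After confirming the CIDR in the generated file, apply it with `kubectl apply -f allow-kube-ovn-join.yaml`.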
Note: The join subnet CIDR may differ across clusters. Always get the actual value by running kubectl get subnet join. A common default is 100.64.0.0/16.