Managed OpenShift on vSphere

Supported Features and Configuration

While this page describes the product features, a more detailed technical insight into how we operate OpenShift can be found in our knowledge base.

Red Hat provides a document with OpenShift Container Platform 4.x Tested Integrations (for x86_x64), which also applies to this product.

Supported by default

These features and configurations are available out of the box and are installed and configured by default.

Feature / Configuration Description

Authentication

Authentication is by default via VSHN Account (LDAP). User management is done via VSHN Portal (Self-Service).

Network Policy

Network policies are supported. By default, namespaces are configured to allow incoming traffic only from within the same namespace, from ingress and from monitoring. The user docs explain how you can customize this.

Integrated registry

The integrated registry is installed and enabled by default. It uses the cloud provider’s object storage to store images.

Machine Config (Compute nodes)

A set of default machine configurations is available (see infrastructure specifics).

Operator Hub

The Operator Hub is enabled by default, but no support is given for any Operators installed via the Operator Hub.

Image Builds

Building images on the platform is supported and enabled by default. Please note that using the Docker build strategy isn’t secure as it exposes the host system to root privilege escalation.

Cluster Monitoring

Cluster monitoring is enabled and used to ensure and assess cluster stability. Alerts are sent to VSHN and handled accordingly. Alert rules are tweaked regularly by VSHN.

OpenShift Central Logging

The integrated central logging based on Loki is installed and configured by default.

Cluster metrics are further used to monitor the resource usage and resource availability on the whole cluster.

User workload monitoring

The same technology stack used for cluster monitoring is also available to monitor workloads deployed by cluster users. Users can create service monitors, alert rules and alert routes to monitor their applications. They can also use the OpenShift web console or a Grafana dashboard to inspect the metrics of their applications over time.

Cluster limits

We adhere to the official numbers which are documented under Planning your environment according to object maximums. In addition, the following limits are set by VSHN:

Infrastructure Nodes

Each cluster has at least 3 nodes dedicated to OpenShift infrastructure components like router, registry, web console and monitoring components. No user workload is allowed on these nodes.

OpenShift Cluster Maintenance

OpenShift and node updates are applied continuously when they’re available. See also Version and Upgrade Policy.

OpenShift Cluster Backup

A full backup of the etcd database is made every 4 hours. Additionally, a second backup contains a dump of all objects in JSON format. This allows single objects to be restored on request. The backup data is encrypted before it is stored in an object storage backend, usually on the same cloud the cluster is running on. K8up is used as the backup operator, with Restic as the backup backend.

IMPORTANT: Please understand that this does not replace any sort of per-application backup strategy. It also does not protect against failure of the underlying infrastructure and cannot be used for disaster recovery purposes.

Persistent storage volumes are not automatically backed up. The user of persistent volumes is obliged to take care of this. For that purpose, K8up is available on the cluster to help with that task. We’re also happy to help, just let us know.
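As a rough illustration of how a user might set up their own volume backups with K8up, the following Python sketch builds a K8up Schedule manifest and prints it as JSON. All concrete values are assumptions made for this example: the schedule name `pvc-backup`, the bucket `my-backups`, the endpoint URL, and the Secret names are hypothetical placeholders, and the four-hour cron expression simply mirrors the etcd backup interval mentioned above. The field names follow the `k8up.io/v1` API as we understand it; please check the K8up documentation for the version installed on your cluster.

```python
import json


def k8up_schedule(name, bucket, endpoint, cron="0 */4 * * *"):
    """Build a K8up Schedule manifest as a plain dict.

    The Secret names used below are hypothetical placeholders; the
    Secrets must exist in the namespace before the schedule can run.
    """
    return {
        "apiVersion": "k8up.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name},
        "spec": {
            "backend": {
                # Password used by Restic to encrypt the repository.
                "repoPasswordSecretRef": {"name": "backup-repo", "key": "password"},
                "s3": {
                    "endpoint": endpoint,
                    "bucket": bucket,
                    "accessKeyIDSecretRef": {
                        "name": "backup-credentials",
                        "key": "username",
                    },
                    "secretAccessKeySecretRef": {
                        "name": "backup-credentials",
                        "key": "password",
                    },
                },
            },
            # Cron expression: by default at minute 0 of every fourth hour.
            "backup": {"schedule": cron},
        },
    }


manifest = k8up_schedule("pvc-backup", "my-backups", "https://objects.example.com")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be piped to `oc apply -n <namespace> -f -` once the referenced Secrets exist.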

Supported on request

These features or configuration adjustments must be specifically requested and some restrictions apply. Activating and configuring these features incurs additional engineering costs and can also increase the engineering costs of operating them (although no fixed additional recurring costs apply).

Feature / Configuration Description

Authentication

Authentication can be configured to use a custom provider in addition to the default VSHN Account. See Supported identity providers for a list of available providers.

Cluster-wide HTTP or HTTPS proxy

Configuring OpenShift to use a cluster-wide HTTP or HTTPS proxy is possible, but incurs additional individual engineering effort. The documentation states: The cluster-wide proxy is only supported if you used a user-provisioned infrastructure installation or provide your own networking, such as a virtual private cloud or virtual network, for a supported provider.

Cluster Admin

For private clusters "Cluster Admin" can be granted. This implies "with great power comes great responsibility". A sign-off is needed.

Disabling of Red Hat remote health monitoring (Telemetry)

By default, OpenShift continuously sends data to Red Hat; see about remote health monitoring for details. This can be disabled on request. The exact metrics sent to Red Hat are documented in data-collection.md. Please note that we have to ask Red Hat before we are allowed to do so.

Registry configuration

Exposing the registry via the ingress controller can be configured.

Custom Machine Sets

Custom MachineSets can be defined to customize compute node availability.

OpenShift Pipelines

OpenShift Pipelines are only available on request.

Egress Gateway

The egress IP feature depends on the possibilities of the underlying networking infrastructure and therefore is only supported where the infrastructure allows it.

Audit logging

While audit logging is enabled by default on OpenShift per control plane node (see Viewing node audit logs), the audit logs are not forwarded or stored outside the cluster. There is no availability guarantee by default. If there is a need for special treatment of audit logs, it needs to be requested.

OpenShift Service Mesh

The OpenShift Service Mesh gives you more control over the traffic flow between services and gives you telemetry on that traffic. Furthermore, it can increase security by assigning each service a verifiable identity, and allows you to apply policies.

Constraints

These features or configuration adjustments are mandatory. They are required so that we can provide a stable system.

Feature / Configuration Description Reasoning

One node spare capacity

The capacity of one node must always be unused.

In case of a node failure, the system needs free capacity to reschedule the workload that was running on that failed node. This is essential for the self-healing capabilities of Kubernetes. Without that spare capacity, the workload would remain unscheduled and the users of that workload might experience downtime.

Workload also needs to be rescheduled during cluster upgrades. While a node is upgraded, its workload is rescheduled to other nodes.

Unsupported

These features or configuration adjustments are not supported by VSHN. They can still be activated or changed, but they are neither monitored, backed up, nor maintained. No guarantees are given; use them at your own risk.

Feature / Configuration Description Reasoning

Upgrade channels

We only support stable upgrade channels. Changing the channel isn’t supported or encouraged.

The stable upgrade channel offers the most tested upgrades, which we see as a cornerstone of a stable service offering. Other channels could be used on non-production clusters. Specifically, the fast channel is used for VSHN internal lab clusters for our own update QA.

Network configuration

We support only Cilium as the network plugin.

Networking is a complex component, and therefore we partnered with Isovalent, the maintainers of Cilium. With their support, we offer you a scalable and modern networking stack based on eBPF that provides all the expected features and more.

Migrating to a different network plugin is difficult and in some cases not possible at all. Thus, for historical reasons, we still operate some clusters with OVN-Kubernetes and OpenShift SDN.

Jaeger

Support for Jaeger is not available from VSHN (yet).

This is mainly due to our lack of experience running Jaeger.

OpenShift Virtualization

No support is available for container-native virtualization.

This is mainly due to our lack of experience running container-native virtualization and because it is currently in Technology Preview.

OpenShift Serverless

Support for OpenShift Serverless is not available from VSHN (yet).

This is mainly due to our lack of experience running OpenShift Serverless and because it is currently in Technology Preview.

Operator Lifecycle Manager (OLM)

The Operator Lifecycle Manager is installed and fully functional on the cluster, but we don’t guarantee full functionality of Operators installed via OLM by the end-user.

There are many Operators available via OperatorHub and we are not able to provide support for any of them.

Airgapped (disconnected) environments

Installing and running OpenShift in an airgapped environment, meaning that the cluster has no Internet access, is currently not supported by VSHN.

The cluster needs access to specific endpoints which are documented in the official OpenShift documentation and in the VSHN Knowledgebase. Supporting airgapped setups is on our long-term roadmap.

Bring-Your-Own-Subscription

OpenShift clusters managed by VSHN are bound to VSHN’s CCSP subscriptions with Red Hat.

Attaching an OpenShift cluster to another subscription adds considerable operational support burden.

Disk Encryption

Encryption of local disks is currently not supported. If encryption at rest is needed, it’s up to the storage provider (CSI) to support that.

The needed infrastructure (e.g. Tang server) to provide this feature is not available yet.

Features marked as Technology Preview by Red Hat are unsupported by VSHN as well. A list of Technology Preview features is available in the release notes. For OpenShift 4.12 this list can be found in the OpenShift Container Platform release notes.

Still interested in one (or more) of these unsupported options? Get in contact with sales@vshn.ch and we’ll figure out together what we can offer.

Version and Upgrade Policy

The official Red Hat OpenShift Container Platform Life Cycle Policy applies and has implications on the supported versions.

Only the latest available Red Hat OpenShift 4 release is supported. Installations must be upgraded to the next minor release within three months after a new release is available, or at the latest when the next minor release is available.

Errata updates are installed as they are released and include updates to OpenShift itself as well as the Red Hat CoreOS nodes. By default the stable upgrade channel is used.

Support Data Sharing

For getting support from Red Hat we usually have to share status information with Red Hat. This is done using the oc adm must-gather command, which collects support information without sensitive data like secrets. More information about this tool is documented under Gathering data about your cluster.
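For readers curious what this looks like in practice, here is a minimal Python sketch that wraps the `oc adm must-gather` call. The helper names and the destination directory are hypothetical and chosen only for this example; running it requires the `oc` client on the PATH and a session with sufficient privileges on the cluster.

```python
import subprocess


def must_gather_command(dest_dir):
    """Return the argument list for `oc adm must-gather` writing to dest_dir."""
    return ["oc", "adm", "must-gather", f"--dest-dir={dest_dir}"]


def run_must_gather(dest_dir):
    """Run must-gather and raise CalledProcessError if the command fails."""
    subprocess.run(must_gather_command(dest_dir), check=True)
```

Calling `run_must_gather("./must-gather-output")` would collect the data into that local directory, from where it can be archived and attached to a support case.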

Cluster Resource Handling and Availability

By the nature of a clustered system like Kubernetes, some constraints apply to how resources are made available to users of the platform and how to work with them:

  • To have enough room to handle failing nodes and to ease maintenance, it’s important to adhere to at least n+1 node availability and to have at least three worker nodes in the cluster. For example, on a three-node cluster only two-thirds of the total resources may be used.

  • Some resources on each node and in the whole cluster are always reserved for system services.

    • Cluster level: there need to be enough resources available to run the control plane and other system services like the registry or monitoring components; that’s why there are dedicated nodes in the cluster to run this workload.

    • Node level: there is an amount of resources reserved on each node to allow for operating system services to function properly.
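The n+1 rule can be expressed as simple arithmetic. The following Python sketch (function names are our own, for illustration only) computes the share and the absolute amount of worker capacity that may be used when one node's worth of capacity is kept free. It does not account for the per-node system reservations mentioned above, so the real usable amount is somewhat lower.

```python
from fractions import Fraction


def usable_share(worker_nodes, spare_nodes=1):
    """Fraction of total worker capacity that may be used under n+1."""
    if worker_nodes < 3:
        raise ValueError("at least three worker nodes are required")
    return Fraction(worker_nodes - spare_nodes, worker_nodes)


def usable_capacity(worker_nodes, per_node, spare_nodes=1):
    """Usable amount of a resource (for example vCPUs or GB of memory)."""
    if worker_nodes < 3:
        raise ValueError("at least three worker nodes are required")
    return (worker_nodes - spare_nodes) * per_node


print(usable_share(3))         # 2/3
print(usable_capacity(3, 4))   # 8 of 12 vCPUs
print(usable_capacity(3, 16))  # 32 of 48 GB memory
```

Adding worker nodes raises the usable share: with six workers, five-sixths of the capacity can be used while still tolerating the loss of one node.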

vSphere

Contrary to other platforms that are highly standardized, vSphere platforms tend to differ between installations. This means that the setup process is more complex and involved. VSHN will work with you to assess the situation and check whether we can deploy Red Hat OpenShift on your vSphere infrastructure, and how complex the setup will be.

Since our internal test clusters can’t account for the specifics of your infrastructure, we strongly suggest running an additional test cluster on your infrastructure.

The official documentation from Red Hat applies: Installing a cluster on vSphere (https://docs.openshift.com/container-platform/latest/installing/installing_vsphere/ipi/installing-vsphere-installer-provisioned.html). Only IPI (installer-provisioned infrastructure) setups are offered by VSHN. This requires the OpenShift installer and the installed cluster to have API access with the required privileges.

Infrastructure Requirements

In general, all requirements listed in the official documentation (see above) apply. Some highlights:

In addition, we require:

  • A load balancer as described in the Configure an external load balancer section of the installation docs.

  • Internet access to GitHub, Docker Hub as well as VSHN systems from the cluster.

  • Access to the Ingress Router and API (via the aforementioned load balancer) from the internet.

  • Persistent storage via the vSphere CSI Driver

  • S3 compatible Object Storage (used for the OpenShift Container Registry, OpenShift Logging and Cluster Backup)

  • Access to NTP servers for keeping time accurate

Default Configuration / Minimum Requirements

This table shows the default configuration which is applied when nothing else is specified and defines the minimum requirements.

Item Description

Control Plane

3 control plane nodes

  • CPU cores: 4

  • Memory: 16 GB

  • Disk: 100 GB SSD

Control plane node size depends on workload, see our guidelines.

Infrastructure Nodes

3 nodes

  • CPU cores: 4

  • Memory: 16 GB

  • Disk: 100 GB SSD

Compute Nodes

3 nodes

  • CPU cores: 4

  • Memory: 16 GB

  • Disk: 100 GB SSD

Persistent Storage

Storage is provided using the built-in VMware vSphere CSI Driver Operator.

Only RWO volumes are provided by the vSphere CSI driver.

More storage features are available with the Managed Storage Cluster.

Important notes about this configuration

  • Persistent storage is only supported in ReadWriteOnce (RWO) access mode via the built-in "VMware vSphere CSI Driver Operator". Other access modes like ReadOnlyMany (ROX) or ReadWriteMany (RWX) are available on request and imply installing an additional storage service. See access modes for more details.
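To illustrate what requesting such a volume looks like, this Python sketch builds a PersistentVolumeClaim with the ReadWriteOnce access mode and prints it as JSON. The claim name `app-data` and the size `10Gi` are hypothetical example values. No storage class is set, on the assumption that the cluster's default storage class is the one backed by the vSphere CSI driver.

```python
import json


def rwo_claim(name, size):
    """Build a PersistentVolumeClaim manifest requesting an RWO volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


print(json.dumps(rwo_claim("app-data", "10Gi"), indent=2))
```

A claim that asks for ReadWriteMany instead would likely stay pending on this configuration unless an additional storage service has been installed.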

Limitations

The following limitations are known on this infrastructure:

  • No support by default for LoadBalancer type services. This needs to be engineered case-by-case with the ExternalIP feature of OpenShift.

  • While not explicitly covered by the OpenShift documentation, Red Hat does not support clusters spanning across multiple vCenters.
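As a sketch of what the ExternalIP approach could look like, the following Python snippet builds a Service manifest that uses the standard Kubernetes `externalIPs` field. Everything concrete here is an assumption for illustration: the service name, the selector label, the ports, and the address `192.0.2.10` (taken from a range reserved for documentation). On a real cluster, the address has to be allowed by the cluster's ExternalIP configuration and routed to the nodes, which is the case-by-case engineering mentioned above.

```python
import json


def external_ip_service(name, selector, port, target_port, external_ips):
    """Build a Service manifest that is reachable on the given external IPs."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": selector,
            "ports": [
                {"protocol": "TCP", "port": port, "targetPort": target_port}
            ],
            # Traffic arriving at a node for these IPs on `port` is
            # forwarded to the pods matched by the selector.
            "externalIPs": list(external_ips),
        },
    }


service = external_ip_service(
    "my-app", {"app": "my-app"}, 5432, 5432, ["192.0.2.10"]
)
print(json.dumps(service, indent=2))
```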

Resource requirements

To give an idea of the resource requirements, the minimal set for the default configuration consists of:

  • 36 x vCPU

  • 144 GB RAM

  • 1080 GB SSD storage