Introduction

Welcome to the MeshLab repository! In this lab, you will find a setup to validate Istio configurations in a cell-based architecture. Each cell is an architectural building block that represents a unit of isolation and scalability. The lab defines two cells, named pasta and pizza, each composed of two clusters. The clusters of a cell run Istio in the multi-primary deployment model, so each cluster hosts its own control plane for high availability and resilience.

Although the cells share the same root CA for their cryptographic material, each one uses a different SPIFFE trustDomain and each cluster within a cell has its own intermediate CA. Locality failover is possible within the clusters of a cell, and all mTLS cross-cluster traffic flows through east-west Istio gateways because pod networks have non-routable CIDRs.
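
For example, you can confirm the trustDomain of a cell by inspecting its mesh configuration. A quick sketch, assuming the revisioned ConfigMap name istio-1-22-2 used elsewhere in this document:

k --context pasta-1 -n istio-system get cm istio-1-22-2 -o yaml | grep trustDomain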

The purpose of this lab is to test and validate different Istio configurations in a realistic environment.

Helm is used to deploy:

Argo Workflows and ArgoCD are used to deploy:

Quick Start

To quickly get started with the MeshLab repository, follow these simple steps:

./bin/meshlab-multipass create   # create and provision the lab VMs
./bin/meshlab-multipass suspend  # suspend the lab VMs
./bin/meshlab-multipass delete   # delete the lab VMs

Components

Pull-through registries

A pull-through registry is a proxy that sits between your local Docker installation and a remote Docker registry. It caches the images you pull from the remote registry, and if another user on the same network pulls the same image, the pull-through registry serves it directly instead of fetching it again from the remote registry. The container runtime on each cluster is configured to use local pull-through registries for docker.io, quay.io and ghcr.io.

List all images in a registry:

curl -s 127.0.0.1:5011/v2/_catalog | jq # docker.io
curl -s 127.0.0.1:5012/v2/_catalog | jq # quay.io
curl -s 127.0.0.1:5013/v2/_catalog | jq # ghcr.io

List tags for a given image:

curl -s 127.0.0.1:5012/v2/argoproj/argocd/tags/list | jq

Get the manifest for a given image and tag:

curl -s http://127.0.0.1:5012/v2/argoproj/argocd/manifests/v2.4.7 | jq
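
To see how the nodes are pointed at these mirrors, inspect the registry configuration on a cluster node. A sketch, assuming the lab relies on the standard k3s registries.yaml mechanism:

multipass exec pasta-1 -- sudo cat /etc/rancher/k3s/registries.yaml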

Multipass

Multipass from Canonical is a tool for launching, managing, and orchestrating Linux virtual machines on a local computer, simplifying development, testing, and other workflows. It provides a user-friendly command-line interface and integrates with other tools for automation and customization.

Stop/start multipassd:

sudo launchctl unload /Library/LaunchDaemons/com.canonical.multipassd.plist
sudo launchctl load -w /Library/LaunchDaemons/com.canonical.multipassd.plist

Restart multipassd:

sudo launchctl kickstart -k system/com.canonical.multipassd

Directories of interest:

sudo tree /var/root/Library/Caches/multipassd
sudo tree /var/root/Library/Application\ Support/multipassd
sudo tree /Library/Application\ Support/com.canonical.multipass

List all available instances:

multipass list

Display information about all instances:

multipass info

Open a shell on a running instance:

multipass shell pasta-1

Tail the logs:

sudo tail -f /Library/Logs/Multipass/multipassd.log

Hypervisor.framework

The drivers used on macOS, HyperKit and QEMU, rely on macOS's Hypervisor.framework to manage the networking stack for the instances. When an instance is created, the host uses the macOS 'Internet Sharing' mechanism to set up a virtual switch, and each instance is attached to it with an address from the shared subnet:

$ sudo cat /Library/preferences/SystemConfiguration/com.apple.vmnet.plist | grep -A1 Shared_Net_Address
Password:
	<key>Shared_Net_Address</key>
	<string>192.168.65.1</string>

The host also provides DHCP and DNS resolution on this switch at the IP address 192.168.65.1, via the bootpd and mDNSResponder services running on the host machine. Note that manually editing the configuration file /etc/bootpd.plist is futile, as macOS will regenerate it according to its own preferences.
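
Inspect the DHCP leases handed out to the instances:

sudo cat /var/db/dhcpd_leases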

Is the bootpd DHCP server alive?

sudo lsof -iUDP:67 -n -P

Start it:

sudo launchctl load -w /System/Library/LaunchDaemons/bootps.plist

Flush all DHCP leases:

sudo launchctl stop com.apple.bootpd
sudo rm -f /var/db/dhcpd_leases
sudo launchctl start com.apple.bootpd

At some point, Docker and Multipass stopped sharing the same network bridge. Whichever starts first takes bridge100 with the IP address 192.168.64.1, and the other takes bridge101 with the IP address 192.168.65.1. If you repeatedly stop and start these services, the third octet of the Shared_Net_Address keeps incrementing.
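
To see which service ended up on which bridge, inspect the bridge interfaces on the host (bridge101 only exists while both services are running):

ifconfig bridge100 | grep 'inet '
ifconfig bridge101 | grep 'inet '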

Cloud-init

cloud-init is a tool used to configure virtual machine instances in the cloud during their first boot. It simplifies the provisioning process, enabling quick setup of new environments with desired configurations. The following commands provide examples for monitoring and inspecting the cloud-init process on various nodes in the system, including logs and scripts run during the instance's first boot.

Tail the cloud-init logs:

multipass exec mnger-1 -- tail -f /var/log/cloud-init-output.log
multipass exec pasta-1 -- tail -f /var/log/cloud-init-output.log
multipass exec pasta-2 -- tail -f /var/log/cloud-init-output.log

Inspect the rendered runcmd:

multipass exec mnger-1 -- sudo cat /var/lib/cloud/instance/scripts/runcmd
multipass exec pasta-1 -- sudo cat /var/lib/cloud/instance/scripts/runcmd
multipass exec pasta-2 -- sudo cat /var/lib/cloud/instance/scripts/runcmd
multipass exec virt-01 -- sudo cat /var/lib/cloud/instance/scripts/runcmd
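
Check whether cloud-init has finished on a node:

multipass exec mnger-1 -- cloud-init status --long
multipass exec pasta-1 -- cloud-init status --long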

k3s

k3s is a lightweight Kubernetes distribution designed for resource-constrained environments like IoT devices and edge computing. It requires fewer resources than upstream Kubernetes, is simple to install, and runs on ARM architectures.

Run config check:

multipass exec pasta-1 -- bash -c "sudo k3s check-config"
multipass exec pasta-2 -- bash -c "sudo k3s check-config"
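
Tail the k3s logs on a node (a sketch; the systemd unit is assumed to be k3s on server nodes and k3s-agent on agents):

multipass exec pasta-1 -- journalctl -u k3s -f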

Cilium

Cilium is an open source, cloud native solution for providing, securing, and observing network connectivity between workloads, powered by the eBPF kernel technology.

Display status:

cilium --context pasta-1 status

Show status of ClusterMesh:

cilium --context pasta-1 clustermesh status

Display status of daemon:

k --context pasta-1 -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status

Display full details:

k --context pasta-1 -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg status --verbose

List services:

k --context pasta-1 -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg service list

Troubleshoot connectivity towards remote clusters:

k --context pasta-1 -n kube-system exec ds/cilium -c cilium-agent -- cilium-dbg troubleshoot clustermesh
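
Optionally, run the built-in connectivity test (it deploys temporary test workloads into the cluster):

cilium --context pasta-1 connectivity test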

ArgoCD

ArgoCD is a GitOps platform for Kubernetes applications that enables continuous delivery with declarative management and automation of deployments from Git repositories to multiple clusters. With its user-friendly interface, robust features, and deep Kubernetes integration, ArgoCD is a popular choice for automating application delivery.

List all the applications:

argocd app list

Manually sync applications:

argocd app sync -l name=istio-issuers --async
argocd app sync -l name=istio-base --async
argocd app sync -l name=istio-cni --async
argocd app sync -l name=istio-istiod --async
argocd app sync -l name=istio-nsgw --async
argocd app sync -l name=istio-ewgw --async
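
Wait for the applications to be synced and healthy, reusing the same label selectors (shown here for istio-istiod only):

argocd app wait -l name=istio-istiod --sync --health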

CoreDNS

CoreDNS is a flexible, extensible DNS server that can be easily configured to provide custom DNS resolutions in Kubernetes clusters. It allows for dynamic updates, service discovery, and integration with external data sources, making it a popular choice for service discovery and network management in cloud-native environments.

Create DNS records for demo.lab:

k --context pasta-1 -n kube-system create configmap coredns-custom --from-literal=demo.server='demo.lab {
  hosts {
    ttl 60
    192.168.65.3 worker.service-1.demo.lab
    192.168.65.3 worker.service-2.demo.lab
    fallthrough
  }
}'
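
Verify that the custom records resolve from inside the cluster. A quick sketch using a throwaway busybox pod (CoreDNS may take a moment to pick up the new block):

k --context pasta-1 run dns-check --rm -it --restart=Never --image=busybox:1.36 -- nslookup worker.service-1.demo.lab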

Vault

Vault is HashiCorp's tool for secrets management and data protection. It provides secure storage for secrets, dynamic credentials, and PKI services behind a unified API.

cert-manager

cert-manager is open-source software that automates the management and issuance of TLS/SSL certificates in Kubernetes clusters. It integrates with various certificate authorities (CAs) and can automatically renew certificates before they expire, ensuring secure communication between services running in the cluster.

Print the cert-manager CLI version and the deployed cert-manager version:

cmctl --context pasta-1 version

This check attempts to perform a dry-run create of a cert-manager v1alpha2 Certificate resource in order to verify that CRDs are installed and all the required webhooks are reachable by the K8S API server. We use v1alpha2 API to ensure that the API server has also connected to the cert-manager conversion webhook:

cmctl check api --context pasta-1

Get details about the current status of a cert-manager Certificate resource, including information on related resources like CertificateRequest or Order:

cmctl --context pasta-1 --namespace istio-system status certificate istio-cluster-ica

Mark cert-manager Certificate resources for manual renewal:

cmctl renew --context pasta-1 --namespace istio-system istio-cluster-ica
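
List the cert-manager resources in a cluster:

k --context pasta-1 -n istio-system get certificates,certificaterequests,issuers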

Istio

Istio is an open-source service mesh platform that provides traffic management, policy enforcement, and telemetry collection for microservices applications. It helps in improving the reliability, security, and observability of service-to-service communication in a cloud-native environment. By integrating with popular platforms such as Kubernetes, Istio makes it easier to manage the complexities of microservices architecture.

List the remote clusters each istiod instance is connected to:

istioctl --context pasta-1 remote-clusters

Access the istiod WebUI:

istioctl --context pasta-1 dashboard controlz deployment/istiod-1-22-2.istio-system
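
Get an overview of the mesh and the xDS sync status of every proxy:

istioctl --context pasta-1 proxy-status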

klipper-lb

klipper-lb uses a host port for each Service of type LoadBalancer and sets up iptables to forward the request to the cluster IP. The regular k8s scheduler will find a free host port. If there are no free host ports, the Service will stay in pending. There is one DaemonSet per Service of type LoadBalancer and each Pod has one container per exposed Service port.

List the containers fronting the exposed argocd-server ports:

k --context mnger-1 -n kube-system get ds -l svccontroller.k3s.cattle.io/svcname=argocd-server -o yaml | yq '.items[].spec.template.spec.containers[].name'

List the containers fronting the exposed istio-eastwestgateway ports:

k --context pasta-1 -n kube-system get ds -l svccontroller.k3s.cattle.io/svcname=istio-eastwestgateway -o yaml | yq '.items[].spec.template.spec.containers[].name'

List the containers fronting the exposed istio-ingressgateway ports:

k --context pasta-1 -n kube-system get ds -l svccontroller.k3s.cattle.io/svcname=istio-ingressgateway -o yaml | yq '.items[].spec.template.spec.containers[].name'
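
List the host ports allocated for the exposed istio-ingressgateway ports (following the same pattern as above):

k --context pasta-1 -n kube-system get ds -l svccontroller.k3s.cattle.io/svcname=istio-ingressgateway -o yaml | yq '.items[].spec.template.spec.containers[].ports[].hostPort'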

Envoy

Envoy is an open-source proxy server designed for modern microservices architectures, providing features such as load balancing, traffic management, and service discovery. It runs standalone or integrated with a service mesh, making it a powerful tool for microservices communication.

Inspect the config_dump of a VM:

multipass exec virt-01 -- curl -s localhost:15000/config_dump | istioctl pc listeners --file -
multipass exec virt-01 -- curl -s localhost:15000/config_dump | istioctl pc routes --file -
multipass exec virt-01 -- curl -s localhost:15000/config_dump | istioctl pc clusters --file -
multipass exec virt-01 -- curl -s localhost:15000/config_dump | istioctl pc secret --file -

Set debug log level on a given proxy:

istioctl pc log sleep-xxx.httpbin --level debug
k --context pasta-1 -n httpbin logs -f sleep-xxx -c istio-proxy

Access the WebUI of a given envoy proxy:

istioctl dashboard envoy sleep-xxx.httpbin

Dump the envoy config of an east-west gateway:

k --context pasta-1 -n istio-system exec -it deployment/istio-eastwestgateway -- curl -s localhost:15000/config_dump

Dump the common_tls_context for a given envoy cluster:

k --context pasta-1 -n httpbin exec -i sleep-xxx -- \
curl -s localhost:15000/config_dump | jq '
  .configs[] |
  select(."@type"=="type.googleapis.com/envoy.admin.v3.ClustersConfigDump") |
  .dynamic_active_clusters[] |
  select(.cluster.name=="outbound|80||httpbin.httpbin.svc.cluster.local") |
  .cluster.transport_socket_matches[] |
  select(.name=="tlsMode-istio") |
  .transport_socket.typed_config.common_tls_context
'

List LISTEN ports:

k --context pasta-1 -n istio-system exec istio-eastwestgateway-xxx -- netstat -tuanp | grep LISTEN | sort -u

Check the status-port:

curl -o /dev/null -Isw "%{http_code}" http://10.0.16.124:31123/healthz/ready

Testing

Send requests to service-1 from an unauthenticated out-of-cluster workstation via the north-south Istio ingress gateway:

IP=$(multipass list | awk '/pasta-1/ {print $3}')
curl -sk --resolve service-1.demo.lab:443:${IP} https://service-1.demo.lab/data | jq -r '.podName'

Same as above but with certificate validation:

IP=$(multipass list | awk '/pasta-1/ {print $3}')
k --context pasta-1 -n istio-system get secret cacerts -o json | jq -r '.data."ca.crt"' | base64 -d > /tmp/ca.crt
curl -s --cacert /tmp/ca.crt --resolve service-1.demo.lab:443:${IP} https://service-1.demo.lab/data | jq -r '.podName'

Locality load balancing

Istio's Locality Load Balancing (LLB) distributes traffic across localities in a way that minimizes latency and maximizes availability. It routes traffic to the closest available instance of a service, reducing network hops and improving performance, while also providing fault tolerance and resilience, which makes it an important feature in multi-cluster microservices architectures.

From the perspective of istio-nsgw: get the endpoints, priority, and weight of service-1:

# Get a running pod name
POD=$(k --context pasta-1 -n istio-system get po -l istio=nsgw --no-headers | awk 'NR==1{print $1}')

# Add an ephemeral container to the running pod
k --context pasta-1 -n istio-system debug -it \
--attach=false --image=istio/base --target=istio-proxy --container=debugger \
${POD} -- bash

# Watch for the endpoints
watch "istioctl --context pasta-1 -n istio-system pc endpoint deploy/istio-nsgw | grep -E '^END|service-1'; echo; k --context pasta-1 -n istio-system exec -it ${POD} -c debugger -- curl -X POST localhost:15000/clusters | grep '^outbound.*service-1' | grep -E 'zone|region|::priority|::weight' | sort | sed -e '/:zone:/s/$/\n/'"

TLS

TLS 1.3 is the latest version of the TLS protocol. TLS, which is used by HTTPS and other network protocols for encryption, is the modern version of SSL. TLS 1.3 dropped support for older, less secure cryptographic features, and it speeds up TLS handshakes, among other improvements.

Setup a place to dump the crypto material:

k --context pasta-1 -n httpbin patch deployment sleep --type merge -p '
spec:
  template:
    metadata:
      annotations:
        sidecar.istio.io/userVolume: "[{\"name\":\"sniff\", \"emptyDir\":{\"medium\":\"Memory\"}}]"
        sidecar.istio.io/userVolumeMount: "[{\"name\":\"sniff\", \"mountPath\":\"/sniff\"}]"
        proxy.istio.io/config: |
          proxyMetadata:
            OUTPUT_CERTS: /sniff
'

Write the required per-session TLS secrets to a file (source):

k --context pasta-1 apply -f - << EOF
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: httpbin
  namespace: httpbin
spec:
  workloadSelector:
    labels:
      app: sleep
  configPatches:
  - applyTo: CLUSTER
    match:
      context: SIDECAR_OUTBOUND
      cluster:
        service: "httpbin.httpbin.svc.cluster.local"
        portNumber: 80
    patch:
      operation: MERGE
      value:
        transport_socket:
          name: "envoy.transport_sockets.tls"
          typed_config:
            "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext"
            common_tls_context:
              key_log:
                path: /sniff/keylog
EOF

Restart envoy to kill all TCP connections and force new TLS handshakes:

k --context pasta-1 -n httpbin exec -it deployment/sleep -c istio-proxy -- curl -X POST localhost:15000/quitquitquit

Optionally, use this command to list all available endpoints:

istioctl --context pasta-1 pc endpoint deploy/httpbin.httpbin | egrep '^END|httpbin'

Start tcpdump:

k --context pasta-1 -n httpbin exec -it deployment/sleep -c istio-proxy -- sudo tcpdump -s0 -w /sniff/dump.pcap

Send a few requests to the endpoints listed above:

k --context pasta-1 -n httpbin exec -i deployment/sleep -- curl -s httpbin/hostname | jq -r '.hostname'

Stop tcpdump and download everything:

k --context pasta-1 -n httpbin cp -c istio-proxy sleep-xxx:sniff ~/sniff

Open it with Wireshark:

open ~/sniff/dump.pcap

Filter by tls.handshake.type == 1 and follow the TLS stream of a Client Hello packet. Right-click a TLSv1.3 packet, then Protocol Preferences --> Transport Layer Security --> (Pre)-Master-Secret log filename and provide the path to the keylog file.

Certificates

Find below a collection of commands to troubleshoot certificate issues.

Connect to the externally exposed istiod service and inspect the certificate bundle it presents:

step certificate inspect --bundle --servername istiod-1-19-6.istio-system.svc https://192.168.65.3:15012 --roots /path/to/root-ca.pem
step certificate inspect --bundle --servername istiod-1-19-6.istio-system.svc https://192.168.65.3:15012 --insecure

Inspect the certificate chain provided by a given workload:

istioctl --context pasta-1 pc secret httpbin-xxxxxxxxxx-yyyyy.httpbin -o json | jq -r '.dynamicActiveSecrets[] | select(.name=="default") | .secret.tlsCertificate.certificateChain.inlineBytes' | base64 -d | step certificate inspect --bundle

Inspect the certificate root CA present in a given workload:

istioctl --context pasta-1 pc secret sleep-xxxxxxxxxx-yyyyy.httpbin -o json | jq -r '.dynamicActiveSecrets[] | select(.name=="ROOTCA") | .secret.validationContext.trustedCa.inlineBytes' | base64 -d | step certificate inspect --bundle

Similar to the above, but this time as a client:

k --context pasta-1 -n httpbin exec -it deployment/sleep -c istio-proxy -- openssl s_client -showcerts httpbin:80

Get details about the status of a cert-manager managed certificate:

cmctl --context pasta-1 --namespace applab-blau status certificate blau

Development

Provision only one VM:

source ./lib/misc.sh && launch_k8s mnger-1
source ./lib/misc.sh && launch_vms virt-01

Debug

Add locality info:

k --context pasta-1 -n httpbin patch workloadentries httpbin-192.168.65.5-vm-network --type merge -p '{"spec":{"locality":"milky-way/solar-system/virt-01"}}'
k --context pasta-1 -n httpbin patch deployment sleep --type merge -p '{"spec":{"template":{"metadata":{"labels":{"istio-locality":"milky-way.solar-system.pasta-1"}}}}}'
k --context pasta-1 -n httpbin label pod sleep-xxxx topology.istio.io/subzone=pasta-1 topology.kubernetes.io/region=milky-way topology.kubernetes.io/zone=solar-system
k --context pasta-1 -n httpbin patch deployment sleep --type merge -p '{"spec":{"template":{"metadata":{"labels":{
  "topology.kubernetes.io/region":"milky-way",
  "topology.kubernetes.io/zone":"solar-system",
  "topology.istio.io/subzone":"pasta-1"
}}}}}'

Delete locality info:

k --context pasta-1 -n httpbin patch workloadentries httpbin-192.168.65.5-vm-network --type json -p '[{"op": "remove", "path": "/spec/locality"}]'
k --context pasta-1 -n httpbin patch deployment sleep --type json -p '[{"op": "remove", "path": "/spec/template/metadata/labels/istio-locality"}]'
k --context pasta-1 -n httpbin label pod sleep-xxxx topology.istio.io/subzone- topology.kubernetes.io/region- topology.kubernetes.io/zone-

Set debug images:

k --context pasta-1 -n istio-system set image deployment/istiod-1-19-6 discovery=docker.io/h0tbird/pilot:1.19.6
k --context pasta-1 -n httpbin patch deployment sleep --type merge -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/proxyImage":"docker.io/h0tbird/proxyv2:1.19.6"}}}}}'

Unset debug images:

k --context pasta-1 -n istio-system set image deployment/istiod-1-19-6 discovery=docker.io/istio/pilot:1.19.6
k --context pasta-1 -n httpbin patch deployment sleep --type merge -p '{"spec":{"template":{"metadata":{"annotations":{"sidecar.istio.io/proxyImage":"docker.io/istio/proxyv2:1.19.6"}}}}}'

Debug:

k --context pasta-1 -n httpbin exec -it deployments/sleep -c istio-proxy -- sudo bash -c 'echo 0 > /proc/sys/kernel/yama/ptrace_scope'
k --context pasta-1 -n istio-system exec -it deployments/istiod-1-19-6 -- dlv dap --listen=:40000 --log=true
k --context pasta-1 -n istio-system port-forward deployments/istiod-1-19-6 40000:40000