Getting real IPs from behind ingress
A whirlwind dive into kube LoadBalancer services, and a lesson in how things are not always what they seem.

The Problem

I have Authentik running in my Kubernetes cluster. I wanted to be able to see clients' real IPs in the application. Instead, all client IPs were showing as `10.42.0.0` in the app.

Not the Cause

I'll be honest here – this problem had plagued me for months. I'd worked on it and rage-quit a few times, trying a number of things along the way that didn't work.
The Cause

I'm ashamed to admit how long it took me to figure out what was really causing this behavior. I discovered a link deep inside one of many forum posts I read. The problem was in Kubernetes. Well, sort of.

To back up a little bit, ingress to my k8s cluster is implemented via a `Service` of type `LoadBalancer`. Since I'm on bare metal, I use kube-vip to implement `LoadBalancer`s. So the full traffic flow for an ingress request is:

```
(eyeball) -> (loadbalancer service) -> (ingress pods) -> (application)
```

…at least at a high level.[^1]
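For illustration, that front-door piece is just an ordinary `Service` of type `LoadBalancer` – something like the sketch below. The names, labels, and ports are assumptions for the example, not my actual manifest (which my ingress controller's Helm chart generates); kube-vip simply watches for Services of this type and assigns/announces an external IP for them.

```yaml
# Hypothetical front-door Service for the ingress controller
apiVersion: v1
kind: Service
metadata:
  name: ingress               # assumed name
  namespace: ingress          # assumed namespace
spec:
  type: LoadBalancer          # kube-vip picks this up and provides the external IP
  selector:
    app: ingress-controller   # assumed pod labels
  ports:
    - name: web
      port: 80
      targetPort: web
    - name: websecure
      port: 443
      targetPort: websecure
```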
The problem is, that diagram is missing a few pieces of the request flow (which I didn't realize). You see, there's a very important field on the `Service` spec I was overlooking: `externalTrafficPolicy`. And there was my dagger!

The default value for this field (and what I was using) is `Cluster`. `Cluster` will accept a request for the `Service` on any node in the cluster and route the traffic to any ready endpoint in the cluster – even if that's on another node. `Local`, on the other hand, assumes that you're handling your own load balancing and will only route the traffic to ready endpoints on the node where the traffic was received.
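In manifest form it's a single field on the `Service` spec. Here's a trimmed-down fragment of the same hypothetical Service from above, with the two values contrasted in comments:

```yaml
spec:
  type: LoadBalancer
  # Cluster (the default): any node accepts the traffic, and kube-proxy may
  # SNAT it and forward it to a pod on a different node – which is why the
  # application sees a cluster-internal address instead of the real client IP.
  # Local: only nodes that host a ready endpoint serve the traffic, and the
  # client source IP is preserved.
  externalTrafficPolicy: Cluster
```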
Because `externalTrafficPolicy: Cluster` is potentially proxying traffic to another node, it by design "obscures the client source IP and may cause a second hop to another node"[^2]. Mic. Drop.

The Fix?
This should be an easy fix, right? Set `externalTrafficPolicy: Local` on the ingress service and be done with it. But no, not quite. Because if `kube-proxy` receives traffic for a service with `externalTrafficPolicy: Local` on a node that isn't home to a ready endpoint for that service, it will silently drop that traffic.

So, any guesses what happened when I made that change, not realizing the caveat? Yeah, no, ingress traffic started timing out left and right. Boy, was I mystified.
Deep Dive

Time for some implementation details (of my setup)!

kube-vip is running as a `DaemonSet` across all nodes in my cluster. In addition to handling `LoadBalancer` services, it also provides a VIP for the control plane. It's running in BGP mode, announcing routes to my router. At this time, all IPs under its control were announced from all nodes.

Traefik, the ingress controller, is running as a multi-replica `Deployment`. I firmly believe ingress should not be a `DaemonSet`, as it shouldn't need to run on every cluster node (this is actually important at scale). It would stand to reason that making it a `DaemonSet` would solve this problem, but 👎.
A detail I'd noticed quite some time ago and dismissed as benign but bemusing was that kube-vip had allocated the same IP for all of the services it was in charge of. Since they happened to all be TCP-only, listen on different ports, and were using `externalTrafficPolicy: Cluster`, this was fine.

However, with this new approach, that couldn't work. We'd need to ensure that ingress was on a unique IP, so that kube-vip could announce it only from nodes running ingress pods.
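Giving a Service its own address is easy enough to express – the classic way is to ask for one explicitly (a fragment; the field is deprecated in newer Kubernetes releases in favour of implementation-specific annotations, and the address below is made up):

```yaml
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  loadBalancerIP: 192.0.2.10   # made-up address from a documentation range
```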
kube-vip fights back

It turns out, functionality to handle just that was semi-recently added to kube-vip. So I set out to enable it. And…it seemed to work. Except it was still announcing the new ingress service IP from all nodes. Not only that, but I ran into this bug which caused the control-plane IP to now get announced from worker nodes. This meant that external control-plane access went down.

Enter: MetalLB

MetalLB is a project that seeks to solve many of the same problems that kube-vip solves. Seeing that MetalLB fully supports `externalTrafficPolicy: Local`, I decided to give it a try. I figured I'd already mucked up my cluster enough, why not?
So, reading the docs, one slight problem arose. MetalLB does not function as a control-plane VIP the same way kube-vip does. Now what? As it turns out, you can run both of them side by side without issue, provided they're not both managing the same set of resources. So I adjusted kube-vip to ignore services and only scheduled it on control-plane nodes. Relegated to that role, it's working fine.
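Concretely, that meant two tweaks to the kube-vip `DaemonSet`: pin it to control-plane nodes and turn off its Service controller. A sketch of the relevant bits – the environment variables are kube-vip's toggles as I understand them, so verify them against the kube-vip docs for your version:

```yaml
# Excerpt of the kube-vip DaemonSet pod template, not the complete manifest
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/control-plane: ""   # only run on control-plane nodes
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: kube-vip
          env:
            - name: cp_enable    # keep providing the control-plane VIP
              value: "true"
            - name: svc_enable   # stop managing LoadBalancer services; MetalLB owns that now
              value: "false"
```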
From there, I was able to configure MetalLB to pick up the services slack pretty easily. I love that it uses CRDs for its config now – when last I used it a few years back, it was still based on `ConfigMap`s and environment variables. This was super slick to set up and its docs are pretty thorough and well-organized.
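For the record, the whole BGP setup boils down to three small objects. A sketch with made-up addresses and AS numbers – the kinds and fields come from the MetalLB docs, everything numeric is an assumption for the example:

```yaml
# A pool of addresses MetalLB may assign to LoadBalancer services
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.0/27            # made-up range
---
# The BGP session to the router
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: router
  namespace: metallb-system
spec:
  myASN: 64512                # made-up private ASNs
  peerASN: 64513
  peerAddress: 192.0.2.1      # made-up router address
---
# Announce the pool over that session
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: default-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```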
Cleaning up

Now that everything else was happily working, I had one remaining issue. Remember how I mentioned running Gitea? Yeah, that has a load balancer service for ssh repo access. Having the ssh clone address match the https clone address and web interface is a huge boon for convenience. I thought it would be as straightforward as enabling IP Address Sharing for the two services. But those with reading comprehension better than mine probably noticed this little tidbit (emphasis mine):

> Services can share an IP address under the following conditions:
>
> - They both have the same sharing key.
> - They request the use of different ports (e.g. tcp/80 for one and tcp/443 for the other).
> - *They both use the Cluster external traffic policy, or they both point to the exact same set of pods (i.e. the pod selectors are identical).*

You see, dear reader, that was not to be. So now what? Well, it turns out that the cause of, and solution to, all of life's problems is indeed ingress. Traefik is perfectly happy to do straight TCP routing if you tell it to do so. Since I knew I only needed one thing to listen on port 22, I didn't have to deal with TLS termination, it could just straight route it back to Gitea. That took the following config (on the Traefik helm chart) to achieve:

```yaml
ports:
  ssh:
    port: 22
    expose: true
```
And then creating an `IngressRouteTCP` was easy:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  labels:
    app.kubernetes.io/instance: gitea
  name: gitea-ssh
  namespace: gitea
spec:
  entryPoints:
    - ssh
  routes:
    - match: HostSNI(`*`)
      services:
        - name: gitea-ssh
          port: 22
```

Et voila, ssh worked on the same IP/host as ingress!
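A couple of closing details on that: the `ssh` entryPoint the route references is the one created by the `ports.ssh` block in the chart values above, and the `gitea-ssh` Service it points at no longer needs to be a `LoadBalancer` at all – a plain `ClusterIP` is enough, since Traefik reaches it over the cluster network. Something like this sketch (the Gitea chart manages the real one, and the selector labels are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gitea-ssh
  namespace: gitea
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/instance: gitea   # assumed labels from the Gitea chart
    app.kubernetes.io/name: gitea
  ports:
    - name: ssh
      port: 22
      targetPort: ssh                   # assumes the pod's ssh port is named "ssh"
```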
Conclusion

We can take a few things away from this. Firstly, I'm bad at reading docs. Secondly, life is often a string of Hal lightbulb moments. And finally, the stack always goes higher and deeper than you think, and that's where the problem is bound to be.