Using analog beamforming in mmWave frequency bands, we can focus energy towards a receiver to achieve high throughput. However, this requires the network to quickly find the best downlink beam configuration, even when the data is non-IID across users and environments. To address this challenge, we propose a personalized Federated Learning (FL) method that learns a mapping from uplink Sub-6GHz channel estimates to the best downlink beam in heterogeneous scenarios with non-IID characteristics. We also devise FedLion, an FL implementation of the Lion optimization algorithm. Our approach reduces signalling overhead and improves performance, achieving up to 33.6% higher accuracy than a single FL model and up to 6% higher accuracy than a local model.
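The abstract does not specify how FedLion combines Lion with federated training. As a hedged illustration only, the sketch below implements the published Lion update (the sign of an interpolation between momentum and gradient, plus decoupled weight decay) and assumes that each client runs a few local Lion steps from the current global model, after which the server averages the local models weighted by sample count (FedAvg-style), with momentum reset every round. The helper names (`lion_step`, `fedlion_round`, `make_quadratic_grad`) and the toy quadratic losses are hypothetical, and the personalization step of the proposed method is not modeled.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sign(x: float) -> int:
    # Elementwise sign: +1, 0 or -1.
    return (x > 0) - (x < 0)


def lion_step(theta: Sequence[float], m: Sequence[float], grad: Sequence[float],
              lr: float = 1e-4, beta1: float = 0.9, beta2: float = 0.99,
              wd: float = 0.0) -> Tuple[Vector, Vector]:
    """One Lion update on flat parameter and momentum vectors."""
    new_theta: Vector = []
    new_m: Vector = []
    for t, mi, g in zip(theta, m, grad):
        c = beta1 * mi + (1 - beta1) * g    # interpolated update direction (beta1)
        t = t - lr * (sign(c) + wd * t)     # sign step plus decoupled weight decay
        mi = beta2 * mi + (1 - beta2) * g   # momentum tracked with beta2
        new_theta.append(t)
        new_m.append(mi)
    return new_theta, new_m


# Hypothetical client description: (gradient function, number of local samples).
Client = Tuple[Callable[[Vector], Vector], int]


def fedlion_round(global_theta: Vector, clients: Sequence[Client],
                  local_steps: int = 5, lr: float = 0.01) -> Vector:
    """One round: local Lion steps on every client, then sample-weighted averaging."""
    local_models: List[Vector] = []
    weights: List[int] = []
    for grad_fn, n_samples in clients:
        theta = list(global_theta)
        m = [0.0] * len(theta)              # assumption: momentum reset every round
        for _ in range(local_steps):
            theta, m = lion_step(theta, m, grad_fn(theta), lr=lr)
        local_models.append(theta)
        weights.append(n_samples)
    total = sum(weights)
    # FedAvg-style aggregation (assumption, not necessarily the paper's rule).
    return [sum(w * model[i] for w, model in zip(weights, local_models)) / total
            for i in range(len(global_theta))]


def make_quadratic_grad(target: float) -> Callable[[Vector], Vector]:
    # Gradient of the toy loss 0.5 * (theta - target)^2.
    return lambda theta: [t - target for t in theta]


# Toy non-IID setup: two clients whose local minimizers differ (+1 and -1).
clients = [(make_quadratic_grad(1.0), 100), (make_quadratic_grad(-1.0), 100)]
theta_global = [0.0]
for _ in range(10):
    theta_global = fedlion_round(theta_global, clients)
print(theta_global)  # -> [0.0]
```

Because Lion applies only the sign of the interpolated direction, every coordinate moves by exactly `lr` per step (plus weight decay), independent of the gradient magnitude. In this symmetric toy setup the two clients move in opposite directions by equal amounts, so the averaged model stays at 0.0, the minimizer of the combined loss.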