I’m impressed with what Apple has come up with for its most advanced on-device model. From the Machine Learning blog:
Built on cutting-edge Apple research, this 20-billion-parameter model uses a sparse architecture, activating just 1 to 4 billion parameters at a time depending on the request.
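The quote doesn't say how the sparsity works. A common way to get "activate only part of the model per request" is mixture-of-experts routing: a small gating function scores a set of experts, and only the top few actually run. Below is a toy sketch of that general idea, assuming top-k routing. It is not Apple's architecture, and the expert count, sizes, and function names are all hypothetical.

```python
import math
import random

# Toy mixture-of-experts layer. Sizes are hypothetical, chosen for illustration only.
NUM_EXPERTS = 8
DIM = 4

random.seed(0)
# Each "expert" is a DIM x DIM weight matrix.
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# Gating weights: one DIM-length vector per expert that scores its relevance to the input.
gate = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]


def matvec(m, v):
    """Multiply matrix m by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def moe_forward(x, k):
    """Run only the top-k experts for input x; return (output, indices of experts used)."""
    scores = [sum(g * xi for g, xi in zip(gv, x)) for gv in gate]
    top = sorted(range(NUM_EXPERTS), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected experts' scores only (shifted by max for stability).
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        y = matvec(experts[i], x)  # unselected experts never compute anything
        out = [o + (w / total) * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    x = [0.5, -1.0, 0.25, 2.0]
    # Varying k per call stands in (hypothetically) for "depending on the request".
    for k in (1, 4):
        out, used = moe_forward(x, k)
        active = k * DIM * DIM
        total = NUM_EXPERTS * DIM * DIM
        print(f"k={k}: experts {used}, {active}/{total} expert weights active")
```

Applying the same arithmetic to the quoted figures, 1 to 4 billion active parameters out of 20 billion works out to roughly 5% to 20% of the weights doing work on any given request.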