ML Libraries in 2023

Tyler Zhu
Jun 27, 2021

Last update (Dec 11, 2023)

Photo by Gertrūda Valasevičiūtė on Unsplash

When I was in college (finishing my applied math degree), all my math professors taught their courses using MATLAB. To the younger version of myself, it was a very heavy piece of software with a daunting interface. I used it for all kinds of projects: numerically simulating partial differential equations, implementing computational geometry algorithms, building multi-layer neural networks, and so on.

Here is an example of a ReLU-modulated matrix multiplication in MATLAB. A multi-layer “deep” neural network is, roughly speaking, a composition of such operators. (Note: the bias term is folded into the input, following the homogeneous coordinate convention.)
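A sketch of what that snippet could look like (the sizes and variable names here are my own assumptions, not the original code):

```matlab
% One ReLU layer: y = max(0, W * [x; 1]).
% Appending 1 to the input folds the bias into the weight matrix (homogeneous coordinates).
W = randn(4, 3 + 1);     % weights of a layer mapping R^3 -> R^4; the last column is the bias
x = randn(3, 1);         % input column vector
y = max(0, W * [x; 1])   % ReLU-modulated matrix multiplication
```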

That was in 2009, and out of all my MATLAB projects, the one I spent the least time on was the one about multi-layer neural networks. Still, it fascinated me back then. I remember turning the idea over in elevators and on my way to other, unrelated classes. Now I think that was mostly because of its relatively mysterious Chinese name, 神经网络 (“neural network”), and also because of its unusually sparse mathematical treatment in my textbook. Meanwhile, other math courses (differential geometry, topology) were torturing my brain for 3+ hours at a time. As a math student, I had to prove each theorem by hand, and the grade was based on how fast one came up with a correct proof.

During the same period, many of my computer science friends and classmates used C++. However, there was a gap between C++ and the math side of things. The obvious issue is that setting up a development environment takes serious engineering time, especially if one needs graphics libraries (e.g., OpenGL) for visualizations. On top of that, my CS friends were extremely protective of their work and didn’t want to share their “secret sauce” with an elitist math guy like me who didn’t appreciate their engineering “dirty laundry.” Another major issue was that I needed lots of numeric utilities, and C++ has no built-in numerical linear algebra library. The solution, which became obvious to me a few years later, was to use third-party libraries such as Eigen or Armadillo.

Things changed drastically from 2013 to 2023. Interestingly, this coincided with changes in my personal life. I left Shanghai and moved to NYC. I met many warmhearted American friends in NYC in 2013; they helped me practice English daily and introduced me to American culture. I swam roughly three times a week at NYU’s Palladium gym and ate a lot more than I used to. I had access to the full internet, and later finished my degree and joined Google Inc.

Long story short, Python has become a more popular programming language than C++. The “philosophy” of Python is modern: no complications, immediate code execution on the Python virtual machine (and if one needs performance, one implements platform-specific C++ bindings for Python). As a result, many well-packaged, highly optimized numeric libraries have emerged: NumPy, SciPy, PyTorch, JAX, TensorFlow (Keras). All of them are open source and support accelerators (GPUs, TPUs). My engineering skills have improved, and setting up development environments is now a piece of cake. Below is the same MATLAB neural net implemented with different libraries. There is no clear winner among them, since they are quite similar nowadays; as far as I know, the most popular ones in 2023 are PyTorch, JAX, and TensorFlow.

Here is an example of a ReLU-modulated matrix multiplication in NumPy. The Autograd library natively supports differentiating NumPy computations, but the main developers behind Autograd now work on JAX.
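A sketch of the NumPy version, under the same assumptions about sizes and names:

```python
# One ReLU layer in NumPy: y = max(0, W @ [x, 1]).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3 + 1))      # weights; the last column is the bias
x = rng.normal(size=(3,))            # input vector
x_h = np.append(x, 1.0)              # fold the bias into the input (homogeneous coordinates)
y = np.maximum(0.0, W @ x_h)         # ReLU-modulated matrix multiplication
```
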
Here is an example of a ReLU-modulated matrix multiplication in JAX. It provides a convenient transformation, jax.grad, to compute gradients of any JAX function.
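A sketch of the JAX version (shapes and names are my assumptions):

```python
# One ReLU layer in JAX, plus a gradient via jax.grad.
import jax
import jax.numpy as jnp

def layer(W, x):
    x_h = jnp.append(x, 1.0)          # fold the bias into the input
    return jnp.maximum(0.0, W @ x_h)  # ReLU-modulated matrix multiplication

W = jax.random.normal(jax.random.PRNGKey(0), (4, 3 + 1))
x = jnp.ones((3,))
y = layer(W, x)

# jax.grad differentiates a scalar-valued function; here w.r.t. W (the first argument).
grad_W = jax.grad(lambda W, x: layer(W, x).sum())(W, x)
```
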
Here is an example of a ReLU-modulated matrix multiplication in Flax. It provides a convenient neural network module system and can be used with jax.grad to compute gradients of a Flax module’s parameters.
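A sketch of the Flax version (module and variable names are my assumptions):

```python
# One ReLU layer as a flax.linen module, differentiated with jax.grad.
import jax
import jax.numpy as jnp
import flax.linen as nn

class ReluLayer(nn.Module):
    features: int = 4

    @nn.compact
    def __call__(self, x):
        # nn.Dense manages the weight matrix and bias; ReLU modulates the output.
        return nn.relu(nn.Dense(self.features)(x))

model = ReluLayer()
x = jnp.ones((3,))
params = model.init(jax.random.PRNGKey(0), x)    # initialize the module's parameters
y = model.apply(params, x)

# Gradient of a scalar loss with respect to the module's parameters.
grads = jax.grad(lambda p: model.apply(p, x).sum())(params)
```
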
Here is an example of a ReLU-modulated matrix multiplication in PyTorch. The way PyTorch computes differentials is more dedicated to backpropagation-style use cases. Note: PyTorch’s nn.Module imposes a predefined class structure that one needs to follow, and it provides a few useful methods to manage state.
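A sketch of the PyTorch version (class and variable names are my assumptions):

```python
# One ReLU layer as a PyTorch nn.Module, differentiated via backpropagation.
import torch
import torch.nn as nn

class ReluLayer(nn.Module):
    def __init__(self, in_features=3, out_features=4):
        super().__init__()
        # nn.Linear manages the weight matrix and bias as module state.
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        return torch.relu(self.linear(x))   # ReLU-modulated matrix multiplication

model = ReluLayer()
x = torch.ones(3)
y = model(x)

y.sum().backward()                  # backpropagate a scalar loss
print(model.linear.weight.grad)     # gradients accumulate on the parameters
```
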
PyTorch can also use available GPU devices for compute; this is done through CUDA (and cuDNN for many neural network kernels). Here is the above example of a ReLU-modulated matrix multiplication in PyTorch, but configured to run on a GPU.
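A sketch of the GPU variant, assuming a CUDA-capable device is available:

```python
# The same PyTorch layer, configured to run on a GPU when one is present.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU()).to(device)   # move parameters to the GPU
x = torch.ones(3, device=device)                               # allocate the input on the GPU
y = model(x)
print(y.device)
```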

Another library is TensorFlow, which has gone through noticeable iterations. The latest major version (2.x) integrates Keras and a new eager execution engine. The old session-based execution engine has been deprecated, so below I use TensorFlow to refer to version 2.x.

Here is an example of a ReLU-modulated matrix multiplication in TensorFlow. TensorFlow computes differentials by explicitly recording operations inside a GradientTape context, and by default it tracks gradients only for Variables. Note: the Keras subclassing API imposes a predefined class structure that one needs to follow, and it provides a few useful methods to manage state.
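A sketch of the TensorFlow (Keras subclassing) version, with names and shapes assumed:

```python
# One ReLU layer via the Keras subclassing API, differentiated with tf.GradientTape.
import tensorflow as tf

class ReluLayer(tf.keras.Model):
    def __init__(self, units=4):
        super().__init__()
        # Dense manages the weight matrix and bias as tf.Variables.
        self.dense = tf.keras.layers.Dense(units, activation="relu")

    def call(self, x):
        return self.dense(x)    # ReLU-modulated matrix multiplication

model = ReluLayer()
x = tf.ones((1, 3))

# Operations are recorded inside a GradientTape; gradients are taken w.r.t. the Variables.
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(model(x))
grads = tape.gradient(loss, model.trainable_variables)
```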

Apache MXNet is another framework, but not many people outside Microsoft use it, so I won’t spend time discussing it in more detail. Windows is an excellent platform for engineering, with many great tools and languages (e.g., VS Code, TypeScript), and I am sure MXNet will similarly excel on its own.

--

Tyler Zhu

Joined Google in 2015, working in the AI/ML space. Current: ShoppingX on Search. Past: Google Research. Co-author of posenet.js and bodypix.js.