Comparison of Deep Learning Libraries After Years of Use

As we witness the golden age of AI, underpinned by deep learning, new tools and frameworks are being proposed continuously. Sometimes it is hard even to keep up with what is going on. You choose one, then you see a new library and you switch to it. It seems that nobody really knows which choice is the right one.

In my view, libraries are best judged by the trade-off between flexibility and run-time performance. A library that is really easy to use tends to be correspondingly slow. A fast library, on the other hand, tends to offer less flexibility or is specialized to a particular type of model, such as Convolutional NNs.

After all the blood and tears shed through years of working with deep learning, I decided to share my own intuition and opinions about the common deep learning libraries, in the hope that they help you choose the right one for your needs.

Let’s start by defining some evaluation metrics for the comparison. These are the points I consider:

  1. Community support: It is really important, especially for a beginner, to be able to ask questions and get answers while learning the library. This mostly depends on the size and visibility of the community.
  2. Documentation: Even if you are familiar with a library, up-to-date documentation is a vital part of it, because libraries are extensive and keep evolving. A library might be the best, but it also needs decent documentation to prove it to users.
  3. Stability: Many of the libraries are open-source. That is of course good, but open-source can also mean more fragile and buggy implementations. It is also really hard to know in advance whether a library is stable enough for your code; it takes time to investigate. The worst part is discovering such peculiar problems at the end of your project. It is really disruptive; I experienced it once and never again 🙂
  4. Run-time performance: This includes GPU and CPU run-times and use of the hardware's capabilities, distributed training with multiple GPUs on a single machine and across multiple machines, and memory use, which limits the models you can train.
  5. Flexibility: Experimenting with new ideas and developing new custom tools and layers are also a crucial part of the game. If you are a researcher, this is probably the foremost metric to consider.
  6. Development: Some libraries are developed at a great pace, so it is always easy to find new functionality and state-of-the-art layers and functions. That is good, but it sometimes makes the library hard to consolidate, especially if its documentation is deficient.
  7. Pre-trained models and examples: For a hacker who aims to use deep learning in a particular project, this is the most important metric. Many of the successful deep learning models are trained on big computer clusters with very extensive setups, and not everyone can budget for such computation power. Therefore, it is important to have pre-trained models to start from.

Below, I discuss these metrics for each library.

Note: I do not include TensorFlow since I don’t have enough experience with it to comment.

Torch

Torch is the library I have been using most recently (Edit: it is now my favorite). As most of you know, Torch is a Lua-based library used extensively by the Facebook and Twitter research teams for deep learning products and research.

  1. Community Support: It has a relatively small community compared to other libraries, but it is still very responsive to any problem or question you encounter.
  2. Documentation: Good documentation is waiting for you. Still, if you are also new to Lua, it is sometimes not enough and you end up googling more. There are also Gitter and Google groups you can visit to learn more, ask questions and solve problems.
  3. Stability: I have not seen any problems since the beginning. Updating modules is sometimes burdensome, since they are so tightly coupled that a minor change in one module forces you to update others.
  4. Run-time performance: This is Torch's strongest point. It makes use of all the capabilities of whatever hardware you have. You can switch between GPU and CPU with simple function calls. It is also very easy to use multiple GPUs in data-parallel fashion on a single machine, but I have not yet seen any support for distributed training.
  5. Flexibility: Torch is a really flexible framework that lets you develop any kind of deep learning architecture with ease. Even very intricate CNNs or tangled NLP architectures are easily managed. Its modular architecture makes everything smooth, and it is really easy to code custom layers or make changes. You can integrate C++ with Torch when you need it. It does not support auto-differentiation natively, but Twitter's autograd module provides it. All that being said, it is the most accessible library for research and prototyping.
  6. Development: It is maybe the most successful framework at keeping track of what is new in the deep learning literature. Newly proposed layers and functions are always available in Torch, or at least in some third-party module.
  7. Pre-trained models and examples: It has a pretty good collection of pre-trained models. You can also convert Caffe models to Torch with the aid of third-party modules. There are many code examples and community projects on GitHub in different domains; you can pull one and start playing with it in your own project. I really suggest checking the Community Wiki to see such code.
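
Auto-differentiation, mentioned under Flexibility, is what spares you from writing backward passes by hand. To show the idea, here is a tiny reverse-mode implementation in plain Python. It is only an illustrative sketch: the `Var` class is hypothetical and is not the implementation used by Twitter's autograd module or by Theano.

```python
import math


class Var:
    """A scalar node in a computation graph that records how to backpropagate."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent is a (node, local_derivative) pair.
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def tanh(self):
        t = math.tanh(self.value)
        return Var(t, ((self, 1.0 - t * t),))

    def backward(self):
        """Reverse-mode sweep: propagate gradients from the output to the inputs."""
        order, seen = [], set()

        def visit(node):
            # Depth-first search that appends a node only after its parents,
            # so `order` is a topological order (inputs first).
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                # Chain rule: accumulate, since a node may feed several others.
                parent.grad += local * node.grad


# A one-neuron "layer": y = tanh(w * x + b)
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = (w * x + b).tanh()
y.backward()
print(y.value, w.grad, x.grad, b.grad)
```

Here w * x + b = 0, so the tanh derivative is 1 and the gradients reduce to x, w and 1 for w, x and b respectively. A custom layer only has to define its forward computation in terms of such operations; the backward pass comes for free.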


I am fond of using Torch for my deep learning problems due to its flexibility, well-rounded community projects and low memory use with the help of the OptNet library. However, I really wish I could have it in Python instead of Lua. Lua is not that hard, but compared to Python’s vast ecosystem it is still an immature language. Still, there is a great library, Lutorpy, which makes Torch calls possible from Python.
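
Roughly speaking, savings of the OptNet kind come from letting intermediate buffers share storage when they are never needed at the same time. The sketch below illustrates that general idea for a forward-only (inference) pass over a simple chain of layers, using a greedy allocator. It is a simplified assumption of how such sharing works, not OptNet's actual algorithm; the layer sizes are made up, and during training the savings are smaller because activations are usually kept for the backward pass.

```python
def naive_memory(sizes):
    """Every layer output keeps its own buffer, so memory is the sum of all outputs."""
    return sum(sizes)


def shared_memory(sizes):
    """Greedy buffer reuse for a forward-only chain of layers.

    Output i is read only by layer i + 1, so after layer i + 1 has run, the
    buffer holding output i is released and may be reused by a later output.
    Returns the total memory that had to be allocated.
    """
    free = []        # capacities of allocated buffers that are currently unused
    allocated = 0    # total capacity ever allocated
    previous = None  # capacity of the buffer holding the current layer's input
    for size in sizes:
        fits = [cap for cap in free if cap >= size]
        if fits:
            buf = min(fits)          # reuse the smallest free buffer that fits
            free.remove(buf)
        else:
            buf = size               # nothing fits: allocate a new buffer
            allocated += size
        if previous is not None:
            free.append(previous)    # the input has been consumed; release it
        previous = buf
    return allocated


sizes_mb = [64, 64, 32, 32, 16, 16, 8]   # hypothetical activation sizes in MB
print(naive_memory(sizes_mb), shared_memory(sizes_mb))
```

For a plain chain, two alternating buffers are enough no matter how deep the network is, which is why memory sharing lets much larger models fit on the same GPU.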

MxNet

MxNet is a library backed by the Distributed Machine Learning Community, which has already delivered many great projects, such as xgboost, the dream tool of many Kagglers. Although it is not highlighted much on the web or in deep learning communities, it is a really powerful library with interfaces to many different languages: Python, Scala and R. Recently, Amazon announced its support for MxNet and its use as its main deep learning library.

  1. Community Support: Every conversation and question goes through GitHub issues, and most problems are answered directly by the core developers, which is great. However, you need patience to get an answer.
  2. Documentation: Compared to its really fast development progress, its documentation falls slightly behind. It has decent documentation covering fundamentals and basics, with example code and intuitive explanations. However, due to its pace, it is better to follow repo changes and read the raw code to see what is latest.
  3. Stability: The intense development effort causes some instability. I experienced a couple of such issues. For instance, a trained model gave different outputs with different back-end architectures. I guess they have solved it to some extent, but I still see it with some of my models. Another problem is that, with the latest updates, you are not able to use most of the pre-trained models; you need to make some changes to make them compatible again. This is discouraging, especially for a beginner.
  4. Run-time performance: It is a fast library on both GPU and CPU. One caveat is that you need to set all the options well for your machine configuration. Thanks to its optimized computational graph, it has a really efficient memory footprint compared to other libraries. I am able to train many large models that do not fit with the other libraries. It is very easy to use multiple GPUs, and it supports distributed training as well, using a distributed SGD algorithm.
  5. Flexibility: It is based on a third-party tensor computation library called MShadow, so you need to learn that first to develop custom things that use the full potential of the library. You are also welcome to write custom code in interfaced languages like Python, or to combine the implemented blocks into custom functionality. To be honest, it is not easy to do custom things efficiently. I also have not seen many researchers using MxNet.
  6. Development: It exhibits a really good development effort, mainly driven by the core developers of the team. Still, they are open to pull requests and to discussing new ideas.
  7. Pre-trained models and examples: You can convert some pre-trained Caffe models, like VGG, with the provided script. They also released Inception-V3-type ImageNet networks and an Inception-V2-type model trained on the 21K ImageNet collection, which is really great for fine-tuning. I was also waiting for ResNet models, and they have since released a variety of ResNet models for 1K and 11K ImageNet.
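
Both the distributed SGD support above and the data-parallel multi-GPU mode mentioned for Torch rest on the same idea: each worker computes gradients on its own shard of the mini-batch, and the gradients are averaged before a shared update. The plain-Python sketch below simulates the workers sequentially on a one-parameter least-squares model. The model, data and function names are hypothetical; real libraries do the averaging with device-to-device communication (e.g. a parameter server or all-reduce) rather than a Python loop.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous data-parallel SGD step with equally sized shards."""
    shard = len(xs) // num_workers
    assert shard * num_workers == len(xs), "use equal shards for a fair average"
    grads = []
    for k in range(num_workers):  # each iteration stands in for one GPU/machine
        lo, hi = k * shard, (k + 1) * shard
        grads.append(grad_mse(w, xs[lo:hi], ys[lo:hi]))
    avg = sum(grads) / num_workers   # the averaging ("all-reduce") step
    return w - lr * avg              # every worker applies the same update


xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [3.0 * x for x in xs]           # the true weight is 3.0

w = 0.0
for _ in range(100):
    w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.05)
print(round(w, 6))
```

With equal shard sizes, the average of the shard gradients equals the full-batch gradient, so four workers produce exactly the same update as one device processing the whole batch, only with the work split up.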


This was my library of choice for many of my older projects (before Torch), mostly due to its run-time efficiency, really solid Python support and efficient GPU memory use.

Some criticisms: MxNet mostly supports vision problems, and they have only partially started to work on NLP architectures. You need to convert all data to their data format for the best efficiency; this slows down implementation but makes things more efficient in terms of memory and hard-drive use. Still, for small projects it is a pain. You can convert your data to a numpy array and use that, but then you are not able to use the extensive set of data augmentation techniques provided by the library.
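
Converting data into a packed binary format means writing many small samples back to back into one large file that can be read sequentially. The sketch below shows the general idea with a hypothetical length-prefixed record layout built on Python's `struct` module. It is not MxNet's actual RecordIO format, which has its own layout; it only illustrates why such formats are efficient to read.

```python
import io
import struct

# Hypothetical record layout: uint32 label, uint32 payload length, payload bytes.
HEADER = struct.Struct("<II")


def write_records(stream, samples):
    """Pack (label, payload_bytes) pairs back to back into one binary stream."""
    for label, payload in samples:
        stream.write(HEADER.pack(label, len(payload)))
        stream.write(payload)


def read_records(stream):
    """Yield (label, payload_bytes) pairs by reading the stream sequentially."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        label, length = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        yield label, payload


buffer = io.BytesIO()  # stands in for one large file on disk
write_records(buffer, [(0, b"cat-image-bytes"), (1, b"dog-image-bytes")])
buffer.seek(0)
print(list(read_records(buffer)))
```

One large file read front to back avoids opening thousands of small image files, which is where the memory and disk savings come from; the price is the extra conversion step up front.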

MxNet also supports mobile platforms with its compactly packaged version. I experimented with it a couple of times on Android. It gives acceptable run-times as long as you keep the model size small (Torch has similar support, but it is not actively maintained).
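
Whether a model is small enough for a phone is mostly a matter of parameter count, which is simple arithmetic. The sketch below assumes 32-bit floats and counts only convolution weights and biases; the layer list is a hypothetical small network, not one of MxNet's released models.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution: k*k*c_in*c_out weights plus c_out biases."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def model_size(layers, bytes_per_param=4):
    """Return (parameter count, size in MiB), assuming float32 by default."""
    total = sum(conv_params(k, c_in, c_out) for k, c_in, c_out in layers)
    return total, total * bytes_per_param / 2**20


# Hypothetical small network: (kernel size, input channels, output channels)
layers = [(3, 3, 32), (3, 32, 64), (3, 64, 128), (3, 128, 256)]
params, size_mib = model_size(layers)
print(params, round(size_mib, 2))
```

Because the weight count grows with c_in * c_out, the last, widest layer dominates here (295,168 of the 388,416 parameters); halving its width roughly halves its cost, which is the usual lever for shrinking a mobile model.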

I decided to leave it for three main reasons. First, it offered (at that time) insufficient support for RNN models, which are important for NLP. Second, it is a very flexible library, but sometimes this flexibility makes simple things hard. Third, its computation is not consistent across platforms and hardware. The difference is subtle, but in cases such as feature extraction it adds up and degrades performance.
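
One plausible source of such cross-platform differences (an assumption on my part, not a diagnosis of MxNet) is that floating-point addition is not associative: CPU and GPU kernels often sum in different orders, for example in parallel reductions, so the same formula can round differently. A minimal pure-Python illustration:

```python
import math

values = [1e16, 1.0, -1e16]

# Left-to-right order, as a simple sequential loop would add them.
sequential = (values[0] + values[1]) + values[2]
# A different grouping, as a parallel reduction might add them.
regrouped = (values[0] + values[2]) + values[1]
# math.fsum tracks the lost low-order bits and returns the correctly rounded sum.
exact = math.fsum(values)
print(sequential, regrouped, exact)

# A subtler case: the two groupings differ only in the last bit.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b, abs(a - b))
```

In the first case the 1.0 is lost entirely when it is added to 1e16 first. Real differences between back-ends are usually of the second, last-bit kind, but they can accumulate through many layers, which fits the subtle discrepancies described above.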

Theano

Theano is maintained by the Montreal group. As far as I know, it is the first of its kind. It is a Python library that takes your code and compiles it to C++ and CUDA. It targets machine learning applications in general, not just deep learning. Like MxNet, it converts the code into a computational graph and then optimizes memory use and execution. However, all these optimizations take time, which is the real problem of the library. Since Theano is a general-purpose machine learning library, the following points are based on the higher-level deep learning libraries Lasagne and Keras.
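
To make "convert the code into a computational graph and then optimize it" concrete, here is a toy sketch of one such graph optimization, common subexpression elimination, done by hash-consing: building the same operation on the same inputs twice returns the existing node instead of creating a new one. The `Graph` class is hypothetical and far simpler than Theano's optimizer, which applies many rewrites before generating C++/CUDA code; that extra work is where the compile time goes.

```python
class Graph:
    """Toy expression graph with common subexpression elimination (CSE)."""

    def __init__(self):
        self.nodes = []    # each node: (op, input node ids or a variable name)
        self._index = {}   # (op, inputs) -> node id, used for hash-consing

    def _node(self, op, inputs):
        key = (op, inputs)
        if key not in self._index:           # only create genuinely new nodes
            self._index[key] = len(self.nodes)
            self.nodes.append(key)
        return self._index[key]

    def var(self, name):
        return self._node("var", name)

    def add(self, a, b):
        return self._node("add", (a, b))

    def mul(self, a, b):
        return self._node("mul", (a, b))

    def evaluate(self, out, env):
        """Evaluate node `out`; nodes are stored in creation (topological) order,
        so each one is computed exactly once after its inputs."""
        values = {}
        for i, (op, inputs) in enumerate(self.nodes[: out + 1]):
            if op == "var":
                values[i] = env[inputs]
            elif op == "add":
                values[i] = values[inputs[0]] + values[inputs[1]]
            elif op == "mul":
                values[i] = values[inputs[0]] * values[inputs[1]]
        return values[out]


g = Graph()
x, y = g.var("x"), g.var("y")
# (x * y) + (x * y): the product is written twice but stored and computed once.
z = g.add(g.mul(x, y), g.mul(x, y))
print(len(g.nodes), g.evaluate(z, {"x": 3.0, "y": 4.0}))
```

The graph ends up with four nodes (x, y, the shared product, and the sum) instead of five, so the multiplication runs only once at execution time.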

  1. Community Support: Both have big communities, with Google user groups and GitHub issue pages. I’d say Keras has more support than Lasagne. You can get any question answered quickly.
  2. Documentation: Both have simple but powerful documentation. Once you get the logic behind these libraries, developing your own models and applications is very fluid. Each important subject is explained with an example, a style I also really like in scikit-learn.
  3. Stability: They are really fast-paced libraries. Because Theano makes it easy to develop new things, they keep up with what is new, but that is also dangerous in terms of stability. As long as you do not rely on the latest features, they are stable.
  4. Run-time performance: They are bounded by the abilities of Theano; beyond that, Theano-based libraries differ only in their programming techniques and in how they use the Theano backend. The real problem for these libraries is the compile time you wait through before model execution. It is sometimes too much to bear, especially for large models. Once compilation succeeds, training on GPU is really fast. I have not used CPU execution much. Memory use is not as efficient as MxNet's but still comparable with Torch's. AFAIK, they started to support multi-GPU execution after the last version of Theano, but distributed training is still out of scope. (Keras has started to support a TensorFlow backend, so you might use that to avoid the compile time.)
  5. Flexibility: Thanks to Theano's auto-differentiation and Python's syntactic sugar, it is really easy to develop something new. You only need to take an already implemented layer or function and modify it to your custom idea. Keras in particular has a very modular structure that shortens your development time. Lasagne gives you building blocks to come up with something of your own: you can easily design your model, but you need to write your own training code.
  6. Development: These libraries are really community-driven open-source projects. They are very fast at capturing what is new. Because development is so easy, one thing might have lots of alternative implementations, and it sometimes takes time to find the best among them. That makes them a bit unstable for early-stage ideas.
  7. Pre-trained models and examples: They provide VGG networks, and there are scripts to convert Caffe models. However, I have not experimented with converted Caffe models in these libraries.


If we compare Keras and Lasagne, Keras is more modular and hides all the details from the developer, which is reminiscent of scikit-learn. Lasagne is more like a toolbox you use to come up with more custom things.
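
The difference can be sketched in a few lines of plain Python. `train` below is the kind of explicit loop you write yourself with Lasagne, while `LinearModel.fit`/`predict` hide that loop behind a scikit-learn-style interface, roughly the way Keras does. The model (a one-feature linear regression trained with full-batch gradient descent) and all names are hypothetical stand-ins, not the API of either library.

```python
def train(params, xs, ys, lr=0.5, epochs=1000):
    """Explicit training loop for y_hat = w * x + b under mean squared error."""
    w, b = params
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


class LinearModel:
    """Keras/scikit-learn-style wrapper: the loop is hidden behind fit()."""

    def __init__(self, lr=0.5, epochs=1000):
        self.lr, self.epochs = lr, epochs
        self.params = (0.0, 0.0)

    def fit(self, xs, ys):
        self.params = train(self.params, xs, ys, self.lr, self.epochs)
        return self

    def predict(self, xs):
        w, b = self.params
        return [w * x + b for x in xs]


xs = [i / 10 for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]         # the true parameters are w = 2, b = 1

w, b = train((0.0, 0.0), xs, ys)          # "Lasagne style": you own the loop
model = LinearModel().fit(xs, ys)         # "Keras style": one call
print(round(w, 4), round(b, 4), [round(p, 4) for p in model.predict([2.0])])
```

Owning the loop makes it easy to change the update rule, logging or batching; the wrapper trades that control for less code at the call site.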

I believe these libraries are perfect for quick prototyping. Anything can be implemented in a flash while keeping the details out of your view.

Caffe

Caffe is the flagship of deep learning libraries for both industry and research. It is the first successful open-source implementation, with a very solid but simple foundation. You do not need to know how to code to use Caffe: you define your network in a description file and train it.

  1. Community Support: It has maybe the largest community. I believe anyone interested in deep learning has some experience with it. It has a large, long-standing Google users group and GitHub issue pages that are full of information.
  2. Documentation: I always find the documentation a bit behind the current state of the library. Even though it does not have an extensive documentation page comparable to other libraries, you can always find tutorials and examples on the web to learn more. It is a well-known library, so a Google search usually turns up answers.
  3. Stability: It is a really solid library. It uses well-known backend libraries for matrix operations and CUDA calls. I have not seen any problems yet. It is picky about integrating new things (lots of developer discussion and tests), which makes it more stable but slower-paced.
  4. Run-time performance: It is not the best but always acceptable. It uses well-founded libraries for run-time-critical operations like convolution, so it is bounded by these libraries. Custom solutions may give better run-times, but they degrade stability in exchange. You can switch between the CPU and GPU backends with a simple call, without changing your code. It does well in terms of memory consumption, but still uses too much compared to MxNet and Torch (with OptNet), especially for Inception-type models. One problem is that it does not support GPUs other than Nvidia's (I recently saw some branches supporting other platforms). It supports multi-GPU training on a single machine but not distributed training.
  5. Flexibility: Learning to code with Caffe is not that hard, but the documentation is not helpful enough. You need to look at the source code to understand what is happening and use existing implementations as templates for your own code. After you understand the basics, it is easy to use and bend the library as you need. It has a good Python interface and supports new layers written in Python. It is a good library that hides the GPU and CPU integration from the developer. Caffe is widely accepted by the research community.
  6. Development: It has very broad developer support and many forks that target different applications, but the master branch is very picky about anything new. This keeps the library stable but also causes the many forks. For instance, Batch Normalization was merged into the master branch only after years of discussion.
  7. Pre-trained models and examples: The Caffe Model Zoo is heaven for pre-trained models from a variety of domains, and the collection keeps growing. It has a good set of example code that can kick-start your own project.


Caffe is the first successful deep learning library. It is stable, efficient and ready for deployment in any kind of project. It is also the fastest library on CPU, so if you want to deploy your model on a CPU machine, it is a run-time-efficient choice. It also integrates well with Python when you need it, which makes things easy when you deploy to a Python-based server or just want to play with the trained model. You can even train the model from Python.

One downside of Caffe, unfortunately, is that it mainly targets CNN-type models. Recently it has shown some inclination towards RNN models, but that support is still at a really early stage.

Last Words

For fast development and getting results, Torch is my current choice, with a very good project base. It is easy to do complex things with its modular structure, but you need to get past the learning curve of Lua. If you don’t want to bother with that, you might prefer Theano or simply a higher-level framework like Keras. Keras is really easy, especially for a beginner. It has many examples, blog posts and users, so it is easy to learn and to try new things with. It can also use either a Theano or a TensorFlow backend, which makes it very flexible.

MxNet is a rather different framework. It stands out from the others with its extensive interfacing to many programming languages (R, Scala, Python, etc.). However, this extensive coverage comes at a price: things are sometimes more complicated than in its counterparts. It is still a good choice for very large problems where you prefer to use distributed architectures over different machines.

With Caffe, it is sometimes a huge bother to define large models in a model description file. It makes things very fragile and error-prone. For example, you can miss or mistype a number, and then your model crashes. Finding such small problems across hundreds of lines is a real pain. In such cases, the Python interface is the wiser choice, since you can define functions that create common layers.
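
As a sketch of that approach, the helper below generates repetitive Caffe-style layer definitions from a Python function instead of typing them by hand, so each channel width is written exactly once. It emits prototxt text directly and is only an illustration; the function names and the network are hypothetical. pycaffe also ships a `NetSpec` helper for building network definitions from Python, which is the more robust route in practice.

```python
def conv_relu(name, bottom, num_output, kernel_size=3, pad=1):
    """Return prototxt text for a Convolution layer followed by an in-place ReLU."""
    return f"""layer {{
  name: "{name}"
  type: "Convolution"
  bottom: "{bottom}"
  top: "{name}"
  convolution_param {{
    num_output: {num_output}
    kernel_size: {kernel_size}
    pad: {pad}
  }}
}}
layer {{
  name: "{name}_relu"
  type: "ReLU"
  bottom: "{name}"
  top: "{name}"
}}
"""


def conv_stack(widths, bottom="data"):
    """Chain one conv+ReLU block per entry in `widths`; returns (text, top blob)."""
    blocks = []
    for i, width in enumerate(widths, start=1):
        name = f"conv{i}"
        blocks.append(conv_relu(name, bottom, width))
        bottom = name  # the next block reads this block's output
    return "".join(blocks), bottom


prototxt, top = conv_stack([64, 64, 128, 128])
print(top)
print(prototxt)
```

Changing the depth or widths now means editing one list instead of hunting through hundreds of hand-written lines, and the blob names (`bottom`/`top`) can no longer drift out of sync.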

NOTE: This is all my own experience with these libraries. Please correct me if you see anything wrong or misleading. I hope this helps. BEST 🙂