Tech Beats-1

machine-learning, mlops, newsletter, news, research

(This is the lazier version of my substack.)

*Cover image: a group of robots working together in a robot factory, French Rococo style, oil on canvas.*

Hi everyone. This is the first newsletter send-out.

**What’s this:** This newsletter shares my notes about ML in a (hopefully) more organized manner.

**Why yet another newsletter:** I read and think about ML, AI, and tech, and I take notes for myself. ML is also my job: I train models for many different tasks and domains. I also used to have a blog (nuked by Digital Ocean) and I miss it, so I decided to start writing again. But I’m also lazy, so it was easier to reformat my notes and publish them. Maybe there is room for one more newsletter. Who knows.

**The content:** Mostly ML research, a bit of open source, the occasional piece of tech news, and no “This is a Game Changer” buzz (well, maybe some).

**How regular:** I don’t trust myself to be very consistent, but I’ll try to prepare one every other week.

**Contact me:** You can find me on Twitter (or X, whatever), LinkedIn, and GitHub, and see more content on my home page.

So let’s dive in

Bits & Pieces #

No-GIL Python. Maybe? #
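Since the heading is here anyway, a quick stdlib sketch of what the GIL costs. The split-across-threads version below computes the right answer on any CPython, but only a free-threaded (PEP 703, no-GIL) build can actually run the chunks in parallel; on a stock interpreter the threads take turns. The prime-counting workload is my own toy example.

```python
import threading

def count_primes(lo, hi):
    """Naive prime count: pure-Python CPU work that the GIL serializes."""
    found = 0
    for n in range(lo, hi):
        if n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1)):
            found += 1
    return found

def threaded_count(n, workers=4):
    """Split [0, n) across threads. Correct on any build, but only a
    free-threaded (no-GIL) CPython runs the chunks truly in parallel."""
    results = [0] * workers
    step = n // workers

    def work(i):
        lo = i * step
        hi = n if i == workers - 1 else lo + step
        results[i] = count_primes(lo, hi)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

If you time `threaded_count` against a single `count_primes(0, n)` call on today’s CPython, the threaded version is no faster; that is the whole point of the no-GIL effort.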

AI2 Dolma: 3 Trillion Token Open Corpus for Language Model Pretraining #

👉 Data
👉 Blogpost
👉 Datasheet
👉 Code

The Allen Institute for AI has released a new dataset for pre-training language models. They also open-sourced the code they used to build the dataset. The data is released under the AI2 ImPACT license. I don’t know if it is a coincidence, but dolma is also one of my favorite dishes.

Research #

RRHF: Align LLMs with Human Feedback Using Ranking #

👉 Paper
👉 Code

Bayesian Flow Networks #

👉 Paper

This is the new paper from Alex Graves (an early pioneer of attention mechanisms in neural networks) after about five years of silence. The paper proposes a new class of generative models that works with both discrete and continuous data.

We’ll see if it will take another 5 years for someone to create the next level of AI based on his work 😄.

Instruction Back Translation #

👉 Paper

SpeechX - One model for many audio tasks. #

👉 Project page
👉 Paper

SpeechX is a new model from Microsoft that can perform multiple speech tasks. This is my field, so I’ll go a bit deeper on this one.

What’s new: SpeechX is an audio language model that is trained to do noise suppression, speaker removal, target speech extraction, zero-shot text-to-speech, and speech editing.

How it works: It is a codec language model in the VALL-E family: the input stream combines text and/or acoustic tokens with a task-dependent prompt that tells the model which task to perform, and the model predicts the output speech as codec tokens.

Key insights: It is interesting to see that we can perform a variety of tasks with the same model by changing only small bits of the input. Of course, given enough data.
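Those “small bits” are task-dependent prompt tokens. Here is a hypothetical sketch of the idea; the task names, special tokens, and layout below are made up for illustration and are not SpeechX’s actual vocabulary:

```python
# Task-token multiplexing, SpeechX-style: one sequence model, with the
# task selected by a special prefix token. All token names are made up.

TASK_TOKENS = {
    "noise_suppression": "<ns>",
    "speaker_removal": "<sr>",
    "target_extraction": "<tse>",
    "zero_shot_tts": "<tts>",
    "speech_editing": "<edit>",
}

def build_prompt(task, text_tokens, audio_tokens):
    """Concatenate the task token, optional text tokens, and audio codec
    tokens into the single input stream the shared model consumes."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task}")
    return [TASK_TOKENS[task], *text_tokens, "<sep>", *audio_tokens]

# Same model, different task: only the prefix changes.
tts_prompt = build_prompt("zero_shot_tts", ["hel", "lo"], ["a1", "a2"])
ns_prompt = build_prompt("noise_suppression", [], ["a1", "a2"])
```

The training data, not the architecture, is what makes each prefix meaningful: the model sees each task token paired with examples of that task.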

Results: Looking at the results, SpeechX outperforms or is competitive with the expert models. However, listening to the TTS samples, I’d say there is more to be desired in terms of audio quality, which could be addressed by a better audio encoding model. (EnCodec is not the best for speech, in my experience.)

We also see the effect of transfer learning: the model performs better when it is initialized from a pre-trained VALL-E model originally trained for TTS.

More reads #

Open Source #

Interspeech 2023 papers #

👉 Repo

Interspeech 2023, one of the most important conferences in audio and speech tech, is happening right around now. The repo organizes the conference papers by topic and provides links to papers and code. Shout out to Dmitry Ryumin and the contributors.

HuggingFace’s Torch Replacement in Rust (Candle) #

👉 Code

Hugging Face released a new library called Candle, a minimalist ML framework for Rust with a PyTorch-like API. It is quite early in development but has gotten a lot of attention in a short time.

Rust is awesome, but it lacks a solid ML framework. Either Python gets rid of the GIL or Rust gets a solid ML framework. I’m betting on Rust.

🐸 CoquiTTS - Text-to-Speech in >1100 languages. #

👉 Code
👉 Docs

🐸TTS is a library I spent years developing. It started when I was at Mozilla, and I forked it when I co-founded Coqui. It is like Transformers, but for TTS: many different model implementations, utilities to train new models, and pre-trained models that are downloaded more than 1M times monthly. So if you need a voice for your thing, give it a try.

Diffiner - Sony #

👉 Code
👉 Paper

This is a speech enhancement model based on a diffusion model. They provide a pre-trained model and training code.

Quivr - Second Brain with GenAI #

👉 Code

“Quivr, your second brain, utilizes the power of GenerativeAI to store and retrieve unstructured information. Think of it as Obsidian, but turbocharged with AI capabilities.”

Basically, Quivr lets you store almost any media content and build a database you can chat with. It is a very well-thought-out project under the Apache 2.0 license.
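Setting Quivr’s actual stack aside (presumably embeddings plus a vector store), the retrieval half of “chat with your data” can be sketched with the stdlib: vectorize notes, find the one nearest to the query, and hand that to an LLM as context. Everything below is an illustrative toy, not Quivr’s code.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts as a sparse vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs):
    """Return the stored note most similar to the query."""
    qv = vectorize(query)
    return max(docs, key=lambda d: cosine(qv, vectorize(d)))

notes = [
    "candle is a rust ml framework from huggingface",
    "dolma is a 3 trillion token pretraining corpus",
    "speechx handles many speech tasks with one model",
]
```

Real systems swap the bag-of-words vectors for learned embeddings and the `max` scan for an approximate nearest-neighbor index, but the shape of the pipeline is the same.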

Extras #

Debate on AI - Bengio, Tegmark vs. Mitchell, LeCun #

👉 Youtube

A debate about the “Existential Risk of AI”. It’s a long one, but worth it if you are into the discussion.

A Talk about Unsupervised Learning #

👉 Youtube

An interesting talk from Ilya Sutskever (OpenAI), one of the creators of ChatGPT. He gives his own perspective on unsupervised learning, drawing parallels with data compression and Kolmogorov complexity as a theory of unsupervised learning.
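The compression angle is easy to play with using stdlib `zlib` (my own toy, not from the talk): texts that share structure compress better together than unrelated texts, which is the intuition behind compression-based similarity measures like normalized compression distance.

```python
import zlib

def clen(s):
    """Compressed length of a string, in bytes."""
    return len(zlib.compress(s.encode(), 9))

def ncd(x, y):
    """Normalized compression distance: small when x and y share structure,
    because a compressor that has 'seen' x encodes a related y cheaply."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

english = "the quick brown fox jumps over the lazy dog " * 4
english2 = "the lazy dog sleeps while the quick brown fox jumps " * 4
digits = "3141592653589793238462643383279502884197169399375105" * 4

# Related texts compress well together; unrelated ones do not.
```

Replace `zlib` with a model that predicts (and therefore compresses) its training distribution well, and you get a hand-wavy version of the argument in the talk.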

Baldur’s Gate 3 🎮 #

I play games, and probably you do too. This is big news if you are into role-playing games. I’m probably playing it as you read these lines.

Just imagine the next-gen game with NPCs connected to ChatGPT…