We recently launched XTTS, and the response from the community has been overwhelmingly positive. We have received enthusiastic feedback online, with people expressing their excitement for the model. 🐸TTS remained the highest-ranked repository on GitHub for nearly five days, and we received numerous stars on both GitHub and HF Spaces. I would like to extend my gratitude to the HF team for their assistance with the release.
If you have not had a chance to try XTTS yet, I highly recommend giving it a go. It is the most advanced open-source text-to-speech model we have released to date.
Deepfakes of Chinese influencers are livestreaming 24/7
Agility Robotics aims to establish a factory capable of manufacturing 10,000 humanoid robots annually.
“edX Survey Finds Nearly Half (49%) of CEOs Believe Most or All of Their Role Should be Automated or Replaced by AI”
World’s first analog computer is recreated with Lego
OpenAI released Dall-E 3
Stability AI released Stable Audio, a text-to-audio model that can generate music or sound.
Fast Feedforward Networks
The fast feedforward (FFF) architecture decouples layer size from inference cost in neural networks, offering a logarithmic-time alternative to conventional feedforward layers. The authors report that FFFs are up to 220 times faster than feedforward networks and up to 6 times faster than mixture-of-experts networks. FFFs also train better than mixtures of experts, thanks to their noiseless conditional execution. In vision transformers, FFFs can use as little as 1% of layer neurons at inference while preserving 94.2% of predictive performance.
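To make the logarithmic-time idea concrete, here is a minimal sketch of tree-routed inference. It is a simplification under my own assumptions, not the authors' implementation: the class name `FastFeedForward`, the heap-style node indexing, the single routing neuron per node, and the random weights are all hypothetical. The point is only that one input touches `depth` routing neurons plus one small leaf block, rather than every neuron in the layer.

```python
import numpy as np


class FastFeedForward:
    """Illustrative hard-routing inference through a balanced binary tree."""

    def __init__(self, in_dim, leaf_width, out_dim, depth, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        self.leaf_width = leaf_width
        self.n_nodes = 2 ** depth - 1   # internal routing neurons
        self.n_leaves = 2 ** depth      # small feedforward blocks
        # One routing neuron per internal node (hypothetical random weights).
        self.node_w = rng.normal(size=(self.n_nodes, in_dim))
        self.node_b = np.zeros(self.n_nodes)
        # Each leaf is a tiny two-layer feedforward block.
        self.leaf_w1 = rng.normal(size=(self.n_leaves, in_dim, leaf_width))
        self.leaf_w2 = rng.normal(size=(self.n_leaves, leaf_width, out_dim))

    def forward(self, x):
        """Route a single input vector x down the tree, then apply one leaf."""
        node = 0
        for _ in range(self.depth):
            go_right = (self.node_w[node] @ x + self.node_b[node]) > 0
            # Heap indexing: children of node i are 2i+1 (left) and 2i+2 (right).
            node = 2 * node + 1 + int(go_right)
        leaf = node - self.n_nodes      # convert heap index to leaf index
        hidden = np.maximum(x @ self.leaf_w1[leaf], 0.0)  # ReLU
        return hidden @ self.leaf_w2[leaf], leaf

    def neurons_evaluated(self):
        """Neurons touched for one input: depth routing neurons + one leaf."""
        return self.depth + self.leaf_width

    def neurons_total(self):
        """All neurons in the layer: every routing neuron + every leaf neuron."""
        return self.n_nodes + self.n_leaves * self.leaf_width


fff = FastFeedForward(in_dim=8, leaf_width=4, out_dim=3, depth=3)
y, leaf = fff.forward(np.ones(8))
print(y.shape, fff.neurons_evaluated(), fff.neurons_total())
```

Doubling the number of leaves adds only one more routing step per input, which is where the logarithmic cost comes from in this sketch.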
The Reversal Curse: LLMs trained on A=B fail to learn B=A
In this work, researchers have uncovered a surprising limitation in auto-regressive large language models (LLMs) when it comes to generalization. They found that if a model is trained on the sentence structure “A is B”, it does not automatically extend its understanding to the reverse structure “B is A”. They refer to this limitation as the Reversal Curse. For example, even if a model is trained on the statement “Olaf Scholz was the ninth Chancellor of Germany”, it will not be able to correctly answer the question “Who was the ninth Chancellor of Germany?” without additional training. Strikingly, the model’s likelihood of providing the correct answer (“Olaf Scholz”) is not higher than that of a random name, indicating a failure of logical deduction. The researchers observed this failure across different model sizes and families and found that data augmentation did not alleviate the issue. The study also evaluated models like GPT-3.5 and GPT-4 using
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
👉 Paper 👉 Code
LongLoRA is a fine-tuning approach that extends the context size of pre-trained large language models (LLMs) at limited computational cost. Training LLMs with long contexts is normally expensive in both training hours and GPU resources. For example, increasing the context length from 2048 to 8192 requires 16 times the computational cost in self-attention layers, because that cost grows quadratically with sequence length.
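The 16x figure follows from simple arithmetic, assuming self-attention cost scales with the square of the sequence length. The helper name `attention_cost_ratio` is mine, used only for illustration.

```python
def attention_cost_ratio(old_len: int, new_len: int) -> float:
    """Relative self-attention cost, assuming cost grows with length squared."""
    return (new_len / old_len) ** 2


# 8192 / 2048 = 4, and 4 squared = 16
print(attention_cost_ratio(2048, 8192))  # 16.0
```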
LongLoRA speeds up context extension in two ways. The first is to use sparse local attention instead of dense global attention during fine-tuning. Shifting the attention pattern in this way extends the context while saving computation, with performance comparable to fine-tuning with vanilla attention. It takes only two lines of code in training and is optional at inference, which gives some flexibility.
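As a rough illustration of shifted local attention, the NumPy sketch below splits the sequence into groups, runs dense attention inside each group, and shifts the tokens by half a group for half of the heads so that information can cross group boundaries. This is my own simplified reading, not the paper's code: the function names are hypothetical, there is no batch dimension, and the causal mask used in real language models is left out.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def group_attention(q, k, v, group):
    """Dense attention inside each consecutive block of `group` tokens.

    q, k, v have shape (N, H, D); N must be divisible by `group`.
    """
    n, h, d = q.shape
    # (N, H, D) -> (N/G, G, H, D) -> (N/G, H, G, D)
    qg, kg, vg = [
        t.reshape(n // group, group, h, d).transpose(0, 2, 1, 3)
        for t in (q, k, v)
    ]
    scores = qg @ kg.transpose(0, 1, 3, 2) / np.sqrt(d)  # (N/G, H, G, G)
    out = softmax(scores) @ vg                            # (N/G, H, G, D)
    return out.transpose(0, 2, 1, 3).reshape(n, h, d)


def shifted_group_attention(q, k, v, group):
    """Group attention where the second half of the heads sees shifted groups."""
    half = q.shape[1] // 2
    shift = group // 2

    def shift_second_half(t, s):
        # Roll tokens along the sequence axis for the second half of the heads.
        return np.concatenate(
            [t[:, :half], np.roll(t[:, half:], s, axis=0)], axis=1
        )

    # Step 1: shift tokens by half a group in half of the heads.
    q, k, v = [shift_second_half(t, -shift) for t in (q, k, v)]
    out = group_attention(q, k, v, group)
    # Step 2: roll the outputs of those heads back to their original positions.
    return shift_second_half(out, shift)


rng = np.random.default_rng(0)
q, k, v = [rng.normal(size=(8, 4, 2)) for _ in range(3)]
print(shifted_group_attention(q, k, v, group=4).shape)  # (8, 4, 2)
```

With 8 tokens and a group size of 4, tokens 3 and 4 sit in different groups for the unshifted heads but share a group in the shifted heads, which is how neighbouring groups exchange information in this sketch.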
The second is a parameter-efficient fine-tuning regime for context expansion, in which making the embedding and normalization layers trainable turns out to be important. The authors report strong results across various tasks.
vLLM: An Open-Source Machine Learning Library for Fast LLM Inference and Serving
To improve LLM serving performance, vLLM uses PagedAttention, which draws inspiration from virtual memory and paging in operating systems. PagedAttention manages the key-value (KV) cache memory of requests efficiently, reducing fragmentation and redundancy. KV cache waste is virtually eliminated and requests can share memory, yielding up to 24 times higher throughput than existing solutions.
PagedAttention splits the KV cache into fixed-size blocks that do not need to be contiguous in memory. As a result, memory waste stays under 4%. It also greatly reduces the memory overhead of sampling methods such as parallel sampling and beam search, cutting their memory usage by up to 55% and improving throughput by up to 2.2 times. vLLM shows significant throughput gains for popular LLMs, especially with larger models and more complex decoding algorithms.
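The bookkeeping behind this idea can be sketched in a few lines. The class below is a toy model under my own assumptions, not vLLM's API: `PagedKVCache`, its methods, and the reference-counting scheme are hypothetical, and it tracks only block ids, not the actual key and value tensors. It shows how a per-sequence block table maps logical positions to arbitrary physical blocks, why waste is limited to the last block, and how forked sequences can share blocks until one of them writes.

```python
class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache (no tensors stored)."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))   # ids of free physical blocks
        self.ref_count = [0] * num_blocks     # sequences using each block
        self.block_tables = {}                # seq_id -> physical block ids
        self.lengths = {}                     # seq_id -> tokens stored

    def _allocate(self):
        if not self.free:
            raise MemoryError("no free KV cache blocks")
        block = self.free.pop()
        self.ref_count[block] = 1
        return block

    def add_sequence(self, seq_id):
        self.block_tables[seq_id] = []
        self.lengths[seq_id] = 0

    def append_token(self, seq_id):
        table = self.block_tables[seq_id]
        if self.lengths[seq_id] % self.block_size == 0:
            # Last block is full (or there is none yet): take any free block.
            table.append(self._allocate())
        elif self.ref_count[table[-1]] > 1:
            # Last block is shared: copy-on-write into a private block.
            # A real system would also copy the block's contents here.
            self.ref_count[table[-1]] -= 1
            table[-1] = self._allocate()
        self.lengths[seq_id] += 1

    def fork(self, parent_id, child_id):
        """Share all of the parent's blocks with a new sequence."""
        table = self.block_tables[parent_id]
        for block in table:
            self.ref_count[block] += 1
        self.block_tables[child_id] = list(table)
        self.lengths[child_id] = self.lengths[parent_id]

    def wasted_slots(self, seq_id):
        """Unused token slots, which can only occur in the last block."""
        used_blocks = len(self.block_tables[seq_id])
        return used_blocks * self.block_size - self.lengths[seq_id]


cache = PagedKVCache(num_blocks=8, block_size=4)
cache.add_sequence("a")
for _ in range(6):
    cache.append_token("a")
cache.fork("a", "b")        # e.g. a second sample from the same prompt
cache.append_token("b")     # triggers copy-on-write of the last block
print(cache.block_tables["a"], cache.block_tables["b"])
```

After the fork, both sequences point at the same first block and differ only in the last one, which is the kind of sharing that keeps parallel sampling and beam search cheap in memory.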
DSPy is a framework that shares similarities with Langchain. It provides a means to create LM pipelines for problem-solving. Many individuals who find Langchain to be complex have found DSPy to be a more favorable alternative. “DSPy is the framework for solving advanced tasks with language models (LMs) and retrieval models (RMs). DSPy unifies techniques for prompting and fine-tuning LMs — and approaches for reasoning and augmentation with retrieval and tools. All of these are expressed through modules that compose and learn.”
“ChatDev stands as a virtual software company that operates through various intelligent agents holding different roles, including Chief Executive Officer, Chief Product Officer, Chief Technology Officer, programmer, reviewer, tester, art designer. These agents form a multi-agent organizational structure and are united by a mission to “revolutionize the digital world through programming.” The agents within ChatDev collaborate by participating in specialized functional seminars, including tasks such as designing, coding, testing, and documenting.”