Mojo is out. #
👉 Blog post
Mojo, a programming language designed specifically for AI developers, is expected to evolve into a superset of Python. Currently, it seamlessly integrates with any Python code and offers a scalable programming model that caters to performance-sensitive systems, including widely used accelerators like GPUs in the field of AI.
Mojo was in private beta for some time, but now it has finally been released to the public. I’m really looking forward to seeing if it lives up to its hype of being 68,000 times faster than Python.
“China’s chip breakthrough” #
Huawei made 7nm chips for its recently released phones. The Mate 60 Pro is powered by the newly developed Kirin 9000s processor, which is manufactured in China by Semiconductor Manufacturing International Corp (SMIC).
This development marks a significant milestone for China. While these chips may not be as advanced as Qualcomm’s recent chips, they represent a promising start in chip manufacturing. There are rumors that Huawei intends to compete directly with Qualcomm in the Chinese market by selling its own chips to other manufacturers, which has reportedly prompted Qualcomm to lower its prices in order to discourage the competition.
Cars pose significant privacy concerns. #
As per a recent report by the Mozilla Foundation, cars have been labeled as the “official worst” when it comes to privacy among all product categories reviewed. The nonprofit organization discovered that a staggering 92 percent of the automakers analyzed offer drivers minimal to no control over their personal data, and 84 percent of them share user data with external entities.
Theory of Mind for LLMs #
This paper explores Theory of Mind (ToM) in large language models (LLMs) and reports a surprising resemblance between the activity of dorsal medial prefrontal cortex neurons in humans and hidden embeddings in LLMs, suggesting that LLMs can represent another’s perspective. The study used 76 trials, including true-belief and false-belief trials, and two categories of questions to test the ToM capability of LLMs: a fact question and an other-belief question. The findings have implications for the development of artificial intelligence and our understanding of human cognition.
One Wide Feedforward is All You Need #
In the Transformer model, the FFN (Feed-Forward Network) parameters are shown to be highly redundant, which leaves room to make the model more efficient. The FFN in the decoder layers can be removed, reducing the computational overhead, and the encoder layers can share a single FFN to save even more parameters. This causes a slight drop in accuracy, but the shared FFN can then be widened to bring the model back to its original size, which regains the lost accuracy and improves both accuracy and latency compared to the original model.
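To get a feel for the parameter budget involved, here is a rough back-of-the-envelope sketch. The layer counts and dimensions are illustrative values I picked, not numbers from the paper, and the helper names are made up for this example. It counts FFN parameters only and asks how wide a single shared FFN could be while staying within the original budget.

```python
def ffn_params(d_model, d_ff):
    """Parameters of one FFN: two linear layers with biases."""
    return (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)


def baseline_ffn_params(n_enc, n_dec, d_model, d_ff):
    """Standard Transformer: every encoder and decoder layer owns an FFN."""
    return (n_enc + n_dec) * ffn_params(d_model, d_ff)


def widest_shared_d_ff(n_enc, n_dec, d_model, d_ff):
    """Largest hidden width for ONE shared encoder FFN (no decoder FFNs)
    that stays within the baseline FFN parameter budget."""
    budget = baseline_ffn_params(n_enc, n_dec, d_model, d_ff)
    # Solve w * (2 * d_model + 1) + d_model <= budget for integer w.
    return (budget - d_model) // (2 * d_model + 1)


# Illustrative sizes (assumed, not taken from the paper).
n_enc, n_dec, d_model, d_ff = 6, 6, 512, 2048
baseline = baseline_ffn_params(n_enc, n_dec, d_model, d_ff)
shared_narrow = ffn_params(d_model, d_ff)  # one shared FFN at the original width
wide = widest_shared_d_ff(n_enc, n_dec, d_model, d_ff)

print(f"baseline FFN params:              {baseline:,}")
print(f"one shared FFN, original width:   {shared_narrow:,}")
print(f"shared FFN width at equal budget: {wide}")
```

With these assumed sizes, a single shared FFN at the original width keeps only one twelfth of the FFN parameters, which is the room the widening step can spend.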
Large Language Models as Optimizers #
This paper introduces a method called Optimization by PROmpting (OPRO) for problems where gradients are unavailable and derivative-based algorithms cannot be applied. OPRO uses large language models (LLMs) as optimizers. The task is described in natural language, and the LLM is prompted to generate new solutions. These solutions are then evaluated and added to the prompt for the next optimization step. The authors’ experiments demonstrate the effectiveness of OPRO on linear regression, traveling salesman problems, and prompt optimization tasks. Notably, across various LLMs, the prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K and up to 50% on Big-Bench Hard tasks.
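The describe, generate, evaluate, and re-prompt loop can be sketched on a toy linear regression problem. This is only an illustration under heavy assumptions: `mock_llm` is a hypothetical stand-in that reads the best solution out of the prompt and nudges it at random, where a real setup would call an actual LLM. The prompt wording and function names are mine, not the paper's.

```python
import random
import re


def score(w, b, data):
    """Negative squared error, so higher is better."""
    return -sum((w * x + b - y) ** 2 for x, y in data)


def build_meta_prompt(task, history, top_k=5):
    """Describe the task and list the best past solutions, worst to best."""
    best = sorted(history, key=lambda item: item[1])[-top_k:]
    lines = [task, "Past solutions (higher score is better):"]
    lines += [f"w={w}, b={b}, score={s}" for (w, b), s in best]
    lines.append("Propose a new (w, b) with a higher score.")
    return "\n".join(lines)


def mock_llm(prompt, rng):
    """HYPOTHETICAL stand-in for an LLM call: reads the best solution from
    the prompt (the last one listed) and nudges each coordinate by at most 1."""
    w, b = map(int, re.findall(r"w=(-?\d+), b=(-?\d+)", prompt)[-1])
    return w + rng.choice([-1, 0, 1]), b + rng.choice([-1, 0, 1])


def opro(data, steps=200, seed=0):
    rng = random.Random(seed)
    history = [((0, 0), score(0, 0, data))]  # starting solution
    for _ in range(steps):
        prompt = build_meta_prompt("Fit y = w*x + b to the data.", history)
        w, b = mock_llm(prompt, rng)                 # generate a new solution
        history.append(((w, b), score(w, b, data)))  # evaluate and record it
    return max(history, key=lambda item: item[1])


# Toy data generated from y = 3x + 2.
data = [(x, 3 * x + 2) for x in range(5)]
best_solution, best_score = opro(data)
print(best_solution, best_score)
```

The point of the sketch is the shape of the loop: the optimizer never sees a gradient, only a text prompt containing past solutions and their scores.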
A Survey on Large Language Model based Autonomous Agents #
This paper is a comprehensive survey on Large Language Model (LLM) based Autonomous Agents. It presents a unified framework for constructing LLM-based autonomous agents and explores their diverse applications in social science, natural science, and engineering. The paper also summarizes existing studies on LLM-based autonomous agents and provides a curated list of studies with diverse agent categories. The authors discuss the remarkable potential of LLMs in achieving human-level intelligence and highlight the common evaluation strategies used for LLM-based autonomous agents. Overall, this paper provides a holistic perspective on the field of LLM-based autonomous agents and is a valuable resource for researchers and practitioners in the field.
The Medusa framework simplifies accelerating LLM generation by using multiple decoding heads. It adds extra heads to the LLM that predict several future tokens simultaneously. The core model itself remains unaltered; only the new heads are adjusted during training. During text generation, each head produces multiple candidate tokens for its position, and these candidates are combined and processed using a tree-based attention mechanism. Finally, a traditional selection method is applied to choose the most probable prefix from the candidates, which is then used for further decoding.
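Here is a much-simplified sketch of the guess-then-verify idea. It is not Medusa's actual code. The "base model" and the "heads" are toy functions I made up, each head offers a single guess, and guesses are checked one by one with plain greedy matching instead of tree-based attention over many candidates.

```python
def base_next(seq):
    """Toy stand-in for the frozen base LLM's greedy next token."""
    return (seq[-1] * 3 + 1) % 7


def propose(seq, n_heads=3):
    """Toy stand-in for the extra heads: guess the next n_heads tokens.
    The last head is deliberately wrong whenever its token would be even."""
    guesses, ctx = [], list(seq)
    for k in range(n_heads):
        tok = base_next(ctx)
        if k == n_heads - 1 and tok % 2 == 0:
            tok = (tok + 1) % 7
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def verify(seq, guesses):
    """Keep the longest prefix of guesses the base model agrees with, then
    add one token from the base model so every step makes progress."""
    accepted = []
    for tok in guesses:
        if base_next(seq + accepted) != tok:
            break
        accepted.append(tok)
    accepted.append(base_next(seq + accepted))
    return accepted


def greedy_decode(prompt, n_new):
    """Plain decoding: one token per step."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(base_next(seq))
    return seq


def medusa_decode(prompt, n_new, n_heads=3):
    """Guess-and-verify decoding: several tokens per step when guesses hold."""
    seq, steps, target = list(prompt), 0, len(prompt) + n_new
    while len(seq) < target:
        seq.extend(verify(seq, propose(seq, n_heads)))
        steps += 1
    return seq[:target], steps


out, steps = medusa_decode([1], 12)
print(out == greedy_decode([1], 12), steps)
```

Because every accepted token is one the base model would have produced anyway, the output matches plain greedy decoding while taking fewer decoding steps. In a real system the saving comes from verifying all candidates in a single forward pass, which this sequential toy does not model.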
The “tinygrad” framework is a compact and straightforward deep-learning tool that aims to be minimalistic. Its main purpose is to offer a user-friendly platform for integrating new accelerators, supporting both inference and training. While XLA can be seen as a CISC approach, tinygrad takes on a RISC approach.