Stanford HAI Releases 2024 AI Index Report (Website)
The Stanford Institute for Human-Centered AI has released its seventh annual AI Index report. This year's report covers the rise of multimodal foundation models, major cash investments into generative AI, new performance benchmarks, shifting global opinions, and new major regulations.
Apple iOS 18 Will Be On-Device (1 minute read)
Apple's upcoming AI features in iOS 18 are rumored to focus on privacy. The initial set of enhancements will reportedly run entirely on-device, with no internet connection or cloud-based processing required, powered by the company's in-house large language model known internally as "Ajax."
🧠
Research & Innovation
Compression represents intelligence linearly (18 minute read)
Most modern AI is built around the idea of compressing a training dataset into a model: the better the compression, the better the model. This paper makes that relationship rigorous, showing that benchmark scores correlate strongly with a model's ability to compress novel text.
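The paper uses language models as the compressor, but the core intuition (text you can predict is text you can compress) can be sketched with an ordinary general-purpose compressor such as zlib standing in for a model:

```python
import random
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size over raw size; lower means better compression."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level=9)) / len(raw)

random.seed(0)
predictable = "the cat sat on the mat. " * 50
# Random characters of the same length: little structure to exploit.
noisy = "".join(random.choices("abcdefghijklmnopqrstuvwxyz ", k=len(predictable)))

# Predictable text compresses to a small fraction of its raw size;
# noisy text barely compresses at all.
print(compression_ratio(predictable))
print(compression_ratio(noisy))
```

The paper's claim is that this same ratio, computed with a language model as the compressor, tracks benchmark performance.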
Feedback in Transformers (24 minute read)
TransformerFAM provides a feedback mechanism that allows Transformers to attend to their own latent representations. In theory, this introduces recurrence into the model, letting it process extremely long inputs in context.
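A toy sketch of the idea in NumPy: each block of tokens attends over a small carried-over memory plus itself, and the memory is then refreshed from the block's outputs. The update rules here are simplified assumptions for illustration, not the paper's exact FAM formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def fam_layer(x, block_size=4, n_mem=2):
    """Process x block by block; each block attends to its own tokens
    plus a small feedback memory carried over from earlier blocks."""
    d = x.shape[-1]
    memory = np.zeros((n_mem, d))               # feedback activations
    outputs = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        ctx = np.concatenate([memory, block])   # attend to memory + block
        out = attend(block, ctx, ctx)
        # Feedback update: memory slots attend to the block's outputs,
        # carrying a compressed summary forward to later blocks.
        memory = attend(memory, out, out)
        outputs.append(out)
    return np.concatenate(outputs)

x = rng.normal(size=(12, 8))
y = fam_layer(x)
print(y.shape)  # (12, 8)
```

Because the memory is fixed-size, per-block cost stays constant no matter how long the sequence grows.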
👨‍💻
Engineering & Resources
Enhanced Vision-Language Model (GitHub Repo)
Vision-language models (VLMs) often struggle with processing multiple queries per image and identifying when objects are absent. This study introduces a new query format to tackle these issues and incorporates semantic segmentation into the training process.
Road Line Segmentation for Autonomous Driving (16 minute read)
Accurately segmenting road lines and markings is crucial for autonomous driving but challenging due to occlusions caused by vehicles, shadows, and glare. The Homography Guided Fusion (HomoFusion) module uses video frames to identify and classify obscured road lines by leveraging a novel surface normal estimator and a pixel-to-pixel attention mechanism.
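A homography is a 3x3 projective map between image planes. The plain-NumPy sketch below (an illustration, not the HomoFusion implementation) shows how pixel coordinates in the current frame can be mapped into an earlier frame, so a road line occluded now can be looked up where a past frame saw it:

```python
import numpy as np

def apply_homography(H, pts):
    """Map Nx2 pixel coordinates through a 3x3 homography H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # divide out w

# Toy homography: pure translation between frames (real homographies also
# encode rotation and perspective change of the road plane).
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[10.0, 20.0], [100.0, 50.0]])
print(apply_homography(H, pts))  # [[15. 17.] [105. 47.]]
```

HomoFusion's contribution is estimating this mapping from a surface normal and fusing the looked-up evidence across frames with attention; the coordinate transform itself is the standard step above.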
Qwen Coder (12 minute read)
CodeQwen1.5 is a new set of 7B models trained on 3T tokens of code-related data. It performs well on HumanEval, with a non-zero score on SWE-bench. The chat variant in particular shows promise on long-context retrieval tasks up to 64k tokens.
1-bit Quantization (7 minute read)
Extreme low-bit quantization of small pre-trained models like Llama2-7B is challenging, but fine-tuning just 0.65% of parameters significantly improves performance. Newly fine-tuned 1-bit models outperform 2-bit QuIP# models, while 2-bit models fine-tuned on specialized data can exceed their full-precision counterparts. This research suggests that proper fine-tuning and quantization may enhance efficiency without compromising model quality, potentially shifting focus from training smaller models to optimizing larger, quantized ones.
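As a minimal illustration of what 1-bit quantization means, here is a generic sign-plus-scale scheme in NumPy. The per-tensor mean-absolute-value scale is an assumption chosen for illustration, not the paper's method:

```python
import numpy as np

def quantize_1bit(w):
    """Collapse each weight to its sign, keeping one shared scale
    so dequantized magnitudes stay in the right range."""
    scale = np.abs(w).mean()
    return np.sign(w), scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(256, 256))   # stand-in weight matrix
q, s = quantize_1bit(w)
w_hat = dequantize(q, s)

# Each weight now costs 1 bit (plus one shared float), at the price of
# reconstruction error -- which the fine-tuning in the paper recovers.
err = np.abs(w - w_hat).mean()
print(err)
```

The point of the paper is that the accuracy lost in this collapse can largely be recovered by briefly fine-tuning a small slice of the parameters afterward.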
Accelerating AI: Harnessing Intel(R) Gaudi(R) 3 with Ray 2.10 (5 minute read)
Anyscale's latest release of Ray, Ray 2.10, adds support for Intel Gaudi 3. Developers can now spin up and manage their own Ray clusters, provisioning Ray Core Tasks and Actors on a Gaudi fleet directly through the Ray Core APIs, tap into Ray Serve on Gaudi through the Ray Serve APIs for a higher-level experience, and configure Intel Gaudi accelerator infrastructure for use at the Ray Train layer.