TLDR DevOps 2024-05-03

Git 2.45 📜, Supercharged Developer Portals 🦸, Monitoring ML Models 🪄

📱
News & Trends

Supercharged Developer Portals (5 minute read)

Spotify has introduced new products and services for companies adopting Backstage, its open-source framework for building internal developer portals, with the aim of enhancing developer experience and productivity.

Highlights from Git 2.45 (5 minute read)

Git 2.45 has just been released. Some features include experimental support for reftables and SHA-256 interoperability.
🚀
Opinions & Tutorials

Automating the Klarna Card Ownership Fees System using AWS Step Functions (8 minute read)

In early 2023, Klarna introduced monthly fees for Klarna Cards in the US. The fees were managed by a manual, labor-intensive system built by two teams, and over several months that system saw little improvement or automation. Recognizing that the approach was unsustainable, Klarna launched an initiative to automate the process with AWS Step Functions. The new system orchestrates multiple AWS services as serverless workflows, which significantly reduced the maintenance burden and improved reliability.

Best practices for monitoring ML models in production (7 minute read)

This article outlines strategies for monitoring machine learning model performance, including evaluating prediction accuracy, detecting drift, and addressing data processing pipeline issues. It emphasizes continuous monitoring as the way to keep ML-powered services stable and performant.
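
As a rough illustration of the drift detection mentioned above, here is a minimal sketch. The summary does not name a drift metric, so the choice of the Population Stability Index (PSI), the function name, the equal-width binning, and the 0.2 alert threshold (a common rule of thumb) are all assumptions rather than the article's method.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Return the PSI of one numeric feature between two samples.

    `reference` is the baseline (for example, training data) and
    `current` is a recent production window. Hypothetical helper,
    not taken from the article.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference sample has no spread")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped to the edge bins.
            idx = max(0, min(bins - 1, int((v - lo) / width)))
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


# Assumed rule-of-thumb threshold; tune it per feature in practice.
DRIFT_THRESHOLD = 0.2

baseline = [float(x) for x in range(100)]
shifted = [x + 50.0 for x in baseline]
drifted = population_stability_index(baseline, shifted) > DRIFT_THRESHOLD
```

In a monitoring job, a check like this would typically run on a schedule for each input feature and for the model's predictions, and raise an alert when the score crosses the chosen threshold.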

Grafana Cloud Synthetic Monitoring: How to simulate user journeys to ensure the best possible end-user experience (11 minute read)

Grafana Labs has unveiled new Synthetic Monitoring capabilities in Grafana Cloud, providing tools to simulate complex transactions and user journeys and enhance the user experience.
🎁
Miscellaneous

From Blocky to Brilliant: Improving Video Quality on Discord Go Live on AMD GPUs (5 minute read)

This article details Discord's journey in improving the visual quality and performance of the Go Live feature. It covers how the team tackled issues with key frames, including low-quality key frames, and frame dropping.

Fine-tuning AWS ASGs with Attribute Based Instance Selection (6 minute read)

The latest post in a series on autoscaling infrastructure discusses the transition from Clusterman to Karpenter and highlights the adoption of attribute-based instance selection in AWS Auto Scaling Groups. Selecting EC2 instances dynamically by attribute has significantly improved infrastructure reliability and cost-effectiveness, reduced operational overhead, and smoothed the migration to Karpenter.
⚡️
Quick Links

"run0" as a sudo replacement (1 minute read)

run0, a sudo replacement that will be introduced in systemd 256, runs privileged commands without relying on setuid permissions, aiming for a security model better suited to 2024.

Early explorations and practices of Xline, a stateful application managed by Karmada (8 minute read)

This document discusses the challenges and opportunities of managing stateful applications across multiple clusters using Karmada and considers how to improve multi-cloud and multi-cluster management.

Introducing the Kubernetes agent for Octopus (4 minute read)

Octopus has introduced a new deployment target, the Kubernetes agent, to make deployments to Kubernetes simpler, faster, and safer.