What’s Powering the Next Wave of Secure and Speedy Data Systems

Rubrik

[quick read] Here’s how to fix that...

30% of GenAI projects stall due to data quality, cost, and compliance challenges

Tired of watching promising GenAI projects stall in proof-of-concept limbo? Almost 1 out of every 3 projects will stay there. Let’s change that.

Reminder: Save May 25th on your calendar for an exclusive session about Rubrik Annapurna, built on Rubrik Security Cloud and integrated with Amazon Bedrock. This is your chance to push your AI from pilot to full production, securely and at scale.

Here’s why you should register:
Overcome architectural pitfalls that slow down GenAI deployments
Achieve zero-copy, real-time, permission-aware data access
See how to use DSPM capabilities for secure, compliant data handling

Save Your Spot

Sponsored

Welcome to DataPro #136, your briefing on the latest tools, trends, and breakthroughs driving smarter, safer, and more sustainable data systems.

Data is evolving: faster, smarter, and under more scrutiny. From secure access for AI agents to real-time semantic search and carbon-aware AI design, this edition explores the tools redefining data use and protection.

Across security, performance, and scale, these stories highlight how next-gen models and infrastructure are pushing boundaries in privacy, control, and responsible AI.

What’s shaping the new data frontier:
Aembit introduces secretless access control for AI agents and apps
ACE-Step delivers fast, full-length music generation from text
INTELLECT-2 shows decentralized RL training at scale
Together AI streamlines semantic search with embedded RAG pipelines
OpenAI’s HealthBench sets new standards for safe, clinical-grade LLMs
Google brings raster analytics to SQL with Earth Engine in BigQuery
Meta’s CATransformers cut model emissions by co-designing with hardware

🔐 Aembit Workload IAM Platform
Secure AI agents and app workloads without secrets.
Identity-based, just-in-time access across AWS, Azure, GCP
No custom auth code required
"MFA for machines" with Zero Trust built in
Backed by Snowflake, Aembit makes identity-first security practical for today’s multi-cloud, AI-powered environments.
Learn more about Aembit

Sponsored

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

⭕ nvidia/parakeet-tdt-0.6b-v2 · Transcribe speech accurately, generate word-level timestamps, and add punctuation and capitalization using parakeet-tdt-0.6b-v2, a 600M-parameter ASR model built on FastConformer-TDT, optimized for NVIDIA GPUs, and capable of processing up to 24-minute audio segments.

⭕ ACE-Step/ACE-Step-v1-3.5B · Generate music from text, remix songs, and edit lyrics using ACE-Step, a fast, open-source music generation model. Combining diffusion with DCAE and a linear transformer, it delivers coherent, controllable, full-song outputs 15× faster than LLM-based methods.

⭕ PrimeIntellect/INTELLECT-2 · Train with decentralized GPUs, solve complex math and code tasks, and reason over long contexts using INTELLECT-2, a 32B-parameter model built with reinforcement learning via verifiable rewards and designed for Qwen2-compatible inference.

⭕ DMindAI/DMind_Benchmark · Evaluate AI models on blockchain topics including DeFi, NFTs, DAOs, and smart contracts using a flexible testing framework.
It supports multiple question types, automated scoring, subjective response evaluation, and performance comparison across models, with easy configuration for third-party APIs and language model integration.

Machine Learning Summit 2025
JULY 16–18 | LIVE (VIRTUAL)
20+ ML Experts | 25+ Sessions | 3 Days of Practical Machine Learning and 40% OFF
BOOK NOW AND SAVE 40%
Use code EARLY40 at checkout
Day 1: LLMs & Agentic AI
From autonomous agents to agentic graph RAG and democratizing AI.
Day 2: Applied AI
Real-world use cases from tabular AI to time series GPTs and causal models.
Day 3: GenAI in Production
Deploy, monitor, and personalize GenAI with data-centric tools.
Learn live from Sebastian Raschka, Luca Massaron, Thomas Nield, and many more.
40% OFF ends soon – this is the lowest price you’ll ever see.

Topics Catching Fire in Data Circles 🔥💬

⭕ Essential Data Loss Prevention Strategies for 2025: Protect sensitive data from loss, misuse, or breaches by implementing a strong Data Loss Prevention (DLP) framework. This blog explains essential strategies and best practices, including risk assessments, employee training, access controls, monitoring tools, and incident response, to help organizations strengthen data security and maintain compliance.

⭕ A Data Scientist’s Guide to Data Streaming: Data scientists increasingly face the challenge of working with real-time data instead of static datasets. This blog explores how data streaming enables timely insights and decisions. It introduces key tools like Apache Kafka, Flink, and PyFlink, and shows how to build real-time pipelines for monitoring, prediction, and anomaly detection.

⭕ What is Data Lake Security? Benefits & Challenges: As data volumes grow, data lakes offer scalable storage for structured and unstructured data.
This blog explores why securing them is essential, introduces the concept of security data lakes, and outlines best practices like encryption, access control, monitoring, and compliance to protect against modern cyber threats.

⭕ Top Ethical Hacking Tips to Safeguard Sensitive Data: Cyberattacks target sensitive data daily, making proactive protection essential. This blog explores how ethical hacking helps prevent data exposure by identifying system vulnerabilities before criminals can exploit them. Learn key methods, tools, and best practices to integrate ethical hacking into your security strategy and safeguard critical information effectively.

New Case Studies from the Tech Titans 🚀💡

⭕ Cost-effective AI image generation with PixArt-Σ inference on AWS Trainium and AWS Inferentia: PixArt-Sigma is a high-resolution diffusion transformer for image generation. This blog explains how to deploy it on AWS Trainium and Inferentia instances using Neuron tools. Learn to compile model components, configure tensor parallelism, and run inference efficiently to generate 4K images with optimized performance and cost.

⭕ A closer look at Earth Engine in BigQuery: Google Cloud now brings Earth Engine raster analytics to BigQuery, combining raster and vector geospatial analysis in SQL. This blog explains how to use the new ST_RegionStats() function, access shared datasets, and apply powerful raster-based insights to real-world use cases like climate risk, agriculture, emissions, and disaster response.

⭕ A Step-by-Step Guide to Build a Fast Semantic Search and RAG QA Engine on Web-Scraped Data Using Together AI Embeddings, FAISS Retrieval, and LangChain: This blog shows how to build a fast semantic search and retrieval-augmented question answering system using Together AI, FAISS, and LangChain.
You will scrape web data, embed it using Together’s model, index with FAISS, and generate source-cited answers using a lightweight language model, all with a unified API and minimal setup.

⭕ Rethinking Toxic Data in LLM Pretraining: A Co-Design Approach for Improved Steerability and Detoxification: This blog explores how including toxic data during LLM pretraining can improve model control in post-training. Using OLMo-1B models, researchers show that moderate exposure enhances toxicity detection, improves detoxification outcomes, and boosts robustness, challenging the assumption that filtering all toxic content leads to better language model quality and safety.

⭕ Meta AI Introduces CATransformers: A Carbon-Aware Machine Learning Framework to Co-Optimize AI Models and Hardware for Sustainable Edge Deployment: This blog introduces CATransformers, a framework that co-optimizes AI models and hardware by factoring in both operational and embodied carbon emissions. Developed by researchers at Meta and Georgia Tech, it enables carbon-aware model design and delivers lower-emission CLIP variants without sacrificing performance, offering a more sustainable path for deploying machine learning systems.

Blog Pulse: What’s Moving Minds 🧠✨

⭕ Strength in Numbers: Ensembling Models with Bagging and Boosting: This blog explains bagging and boosting, two key ensemble techniques in machine learning. It walks through how each method works, when to use them, and how they reduce variance or bias. With practical code examples and visualizations, readers gain a hands-on understanding of building stable, accurate models using these powerful approaches.

⭕ Efficient Graph Storage for Entity Resolution Using Clique-Based Compression: This blog introduces clique-based graph compression as a strategy to reduce storage and improve performance in entity resolution systems.
By representing dense clusters of matched records as cliques, it minimizes edge redundancy, lowers computational overhead, and accelerates tasks like deletion and recalculation, offering a scalable solution for managing complex, connected data graphs.

⭕ The Geospatial Capabilities of Microsoft Fabric and ESRI GeoAnalytics, demonstrated: This blog demonstrates how to process and analyze large-scale geospatial data using Microsoft Fabric with integrated ESRI GeoAnalytics. By working with point cloud elevation data and building footprints in the Loppersum region, it shows how to perform spatial selection, aggregation, and regression modeling, highlighting Fabric’s ability to handle complex vector-based geospatial workflows efficiently.

⭕ OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and Safety of Large Language Models in Healthcare: This blog introduces HealthBench, an open-source benchmark by OpenAI to evaluate language models in real-world healthcare scenarios.
Built with global physician input, it uses multi-turn conversations, detailed rubrics, and expert validation to assess clinical accuracy, safety, and communication, offering a scalable tool for advancing responsible AI in healthcare.