
Snorkel AI Takes a Closer Look at Data Labeling with Generative AI




Snorkel AI extends data curation beyond labeling for Generative AI

Snorkel AI has announced new capabilities to help organizations curate and prepare data for generative AI, moving beyond its primary function of providing data labeling for machine learning (ML) and artificial intelligence (AI). VentureBeat describes Snorkel AI as a data platform that assists organizations with the data side of AI. Data labeling remains important for predictive AI tasks, and Alex Ratner, CEO and co-founder of Snorkel AI, said that in the long run he expects much of the enterprise value from AI to come from more traditional predictive AI. Generative AI still needs feedback, however, and Snorkel AI's new tools are designed to help organizations assemble, curate, and develop that feedback programmatically, so that the process is faster and better managed.

The role of data labeling

Data labeling has long been a critical component in helping data scientists prepare data for ML and AI. In November 2022, Snorkel Flow was updated with features that let organizations accelerate the data labeling process by using large language models (LLMs) to get started. The new GenFlow service goes one step further and targets the building of generative AI applications, while Snorkel Foundry helps organizations build customized LLMs.

Ensuring good data for Generative AI

“How you curate, sample, filter, and clean data ends up having a tremendous impact on the resulting foundation model that you get out,” Ratner said in an exclusive interview with VentureBeat. One issue that general-purpose generative AI tools face is the risk of hallucination, where the model returns responses that are not accurate. To address this, multiple vendors are exploring Retrieval Augmented Generation (RAG), in which the sources used to generate a result are retrieved and cited. If there are no sources to draw on, however, hallucination becomes a data problem, one that Snorkel Foundry aims to solve with its data curation capabilities. With Snorkel Foundry, organizations can point the service at a data repository to get the right mix of data to meet business objectives and to reduce both bias and the risk of hallucination.
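The retrieve-and-cite idea behind RAG, and the "no sources" case described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Snorkel Foundry's or any vendor's implementation: the corpus, the document ids, the token-overlap scoring, and the function names `retrieve` and `build_prompt` are all hypothetical, and a real system would typically use a vector index and an actual LLM call.

```python
from collections import Counter

# Hypothetical in-memory corpus; ids and texts are illustrative only.
CORPUS = {
    "doc-1": "Snorkel Flow accelerates data labeling for machine learning.",
    "doc-2": "Retrieval augmented generation grounds answers in retrieved sources.",
    "doc-3": "Quarterly revenue figures for the retail division.",
}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    return [w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")]


def retrieve(query, corpus, min_overlap=2, top_k=2):
    """Return ids of documents sharing at least `min_overlap` tokens with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc_id, text in corpus.items():
        # Counter intersection keeps the minimum count of each shared token.
        overlap = sum((query_counts & Counter(tokenize(text))).values())
        if overlap >= min_overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def build_prompt(query, corpus):
    """Build a prompt that cites its sources, or return (None, []) if none are found."""
    sources = retrieve(query, corpus)
    if not sources:
        # No supporting documents: a data gap, not something prompting can fix.
        return None, []
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in sources)
    prompt = (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\nQuestion: {query}"
    )
    return prompt, sources
```

Calling `build_prompt("How does retrieval augmented generation work?", CORPUS)` returns a prompt that cites `doc-2`, while a query with no matching documents returns `(None, [])`, which is the point at which the gap has to be closed by curating more data.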

Beyond labeling with GenFlow

After an LLM is pre-trained, the next step is to fine-tune it so that it generates optimal output. For non-generative AI, the kind of work Snorkel Flow supports, labeling means classifying data with tags. For generative AI outputs, however, traditional labeling is not what is needed. This is where the GenFlow service comes in: it provides tooling and management capabilities for giving feedback and filtering out poor-quality data points, which helps generative AI produce better output.
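As a rough illustration of filtering fine-tuning data by feedback, the sketch below keeps only prompt/response pairs that enough reviewers rated highly. It is not GenFlow's API: the `Feedback` record, the 1-to-5 rating scale, and the thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Feedback:
    """One candidate fine-tuning example with reviewer ratings (hypothetical schema)."""
    prompt: str
    response: str
    ratings: list = field(default_factory=list)  # assumed scale: 1 (poor) to 5 (good)


def filter_by_feedback(items, min_mean=3.5, min_votes=2):
    """Split items into (kept, dropped) by number of ratings and mean rating."""
    kept, dropped = [], []
    for item in items:
        enough_votes = len(item.ratings) >= min_votes
        if enough_votes and mean(item.ratings) >= min_mean:
            kept.append(item)
        else:
            dropped.append(item)
    return kept, dropped
```

Requiring a minimum number of ratings is a design choice in this sketch: a single enthusiastic reviewer is not treated as enough evidence to keep an example.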

Advice for enterprises

Ratner believes that most of the data organizations hold is likely to be unstructured. Snorkel Foundry includes data sampling functions that let users heuristically identify which data is relevant and compose the right balance of content to feed into an ML training routine. Ratner explained that "most enterprises don't have perfectly curated data," and Snorkel AI is helping them organize, curate, and optimize their mixture of data programmatically.
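A heuristic relevance filter followed by sampling to a target mix might look like the sketch below. This is only an illustration of the general idea and not Snorkel Foundry's sampling functions: the keyword-ratio score, the `source` field, the target shares, and the names `relevance` and `compose_mix` are all assumptions.

```python
import random


def relevance(doc, keywords):
    """Fraction of words in the document that are domain keywords (a crude heuristic)."""
    words = doc["text"].lower().split()
    return sum(w.strip(".,") in keywords for w in words) / len(words) if words else 0.0


def compose_mix(docs, keywords, targets, total, min_relevance=0.1, seed=0):
    """Drop low-relevance docs, then sample each source up to its target share of `total`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_source = {}
    for doc in docs:
        if relevance(doc, keywords) >= min_relevance:
            by_source.setdefault(doc["source"], []).append(doc)
    sample = []
    for source, share in targets.items():
        pool = by_source.get(source, [])
        # Never ask for more documents than the source actually has.
        k = min(len(pool), round(total * share))
        sample.extend(rng.sample(pool, k))
    return sample
```

For example, `compose_mix(docs, {"invoice"}, {"support": 0.5, "wiki": 0.5}, total=4)` would return up to two relevant documents from each of the two hypothetical sources.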

Final thoughts

Generative AI has brought a new challenge to data curation with the risk of hallucination, and Snorkel AI's new tools aim to tackle it by gathering feedback in different forms. Enterprises must ensure that they have good data for generative AI and an optimal mix of data for ML training routines. By using Snorkel AI's tools, organizations can assemble, curate, and develop feedback programmatically. In other words, they can integrate and optimize their AI investments for success.

Artificial Intelligence-SnorkelAI, generative AI, data labeling, deep learning, natural language processing, machine learning

程宇肖


Reporter

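Hello everyone! I'm 程宇肖. I have a strong interest in how technology develops and is applied, and I am committed to bringing you the latest technology trends and innovations. The tech field changes at an astonishing pace, with exciting new discoveries and breakthroughs every day. As a blog author, I will guide you in exploring the mysteries of technology and the endless possibilities of its applications.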