علماء الذكاء الاصطناعي القابلون للتدريب: NVIDIA NeMo Gym وNeMo RL لأتمتة الاكتشاف العلمي

في مجال البحث العلمي، يمكن للطبيعة المملة والمتكررة للمهام مثل مراجعة الأدبيات وإدارة التجارب والتعامل مع البيانات أن تعيق التقدم بشكل كبير. تتعمق منشور المدونة هذا في كيفية قيام أطري عمل NVIDIA NeMo Gym وNeMo RL، جنبًا إلى جنب مع عمل Edison Scientific وبيئة Aviary الخاصة بهم، بإحداث ثورة في هذا المجال من خلال تمكين إنشاء وتدريب وكلاء علميين مدعومين بالذكاء الاصطناعي. يمكن لهؤلاء الوكلاء أتمتة العديد من المهام المستهلكة للوقت، مما يحرر الباحثين للتركيز على حل المشكلات الإبداعي والاكتشاف العلمي.

التعلم المعزز: توسيع قدرات LLM للعلوم

تتفوق نماذج تعلم اللغة التقليدية (LLMs) في التنبؤ بالرمز المميز التالي، مما يؤدي إلى معرفة واسعة ولكنها غالبًا ما تفتقر إلى المهارات الخاصة بمجال معين. يعمل التعلم المعزز (RL) على سد هذه الفجوة من خلال السماح للنماذج بالتفكير والتصرف خارج نطاق البيانات الخاضعة للإشراف. تسلط المشاركة الضوء على الجوانب الرئيسية التالية:

التدريب المسبق: يوفر فهمًا أساسيًا ولكنه يفتقر إلى الخبرة الخاصة بالمجال.
الضبط الدقيق الخاضع للإشراف (SFT): يتعلم من أزواج التعليمات والاستجابة ولكنه مقيد بتغطية مجموعة البيانات ويكافئ فقط إعادة إنتاج الإجابة المرجعية.
التعلم المعزز (RL): يستخدم دالة المكافأة لتسجيل مخرجات، مما يمكّن النماذج من التحسين لتحقيق أهداف محددة.
- RLHF (التعلم المعزز من ملاحظات الإنسان): يعتمد على تفضيلات الإنسان للحصول على إشارات المكافأة.
- RLAIF (التعلم المعزز من ملاحظات الذكاء الاصطناعي): يستخدم LLMs كقضاة.
- RLVR (التعلم المعزز مع المكافآت القابلة للتحقق): يستخدم الفحوصات الحسابية للحصول على إشارات مكافأة موضوعية، وهي ضرورية للمهام العلمية.
RL العلمي: يمكّن الوكلاء من تصميم وإجراء التجارب وتقييم النتائج والتحسين لتحقيق المقاييس العلمية.

NeMo Gym وNeMo RL: تحسين التدريب Agentic

يوفر تكامل NeMo Gym وNeMo RL إطارًا قويًا لبناء وتقييم وكلاء LLM للبحث العلمي.

NeMo RL: يوفر خوارزميات التدريب ويدير موارد الحساب وينظم تحديثات النموذج. يدعم أحدث إصدار ميزات مثل تقطير السياسة على السياسة، وasyncRL، وخوارزميات RL المتقدمة، وتدريب FP8 RL من طرف إلى طرف.
NeMo Gym: إطار عمل مفتوح المصدر لبناء بيئات تدريب RL على نطاق واسع، والتعامل مع التبعيات المتنوعة والمتطلبات الخاصة بالمجال.
- يقدم ثلاثة تجريدات خادم أساسية:
  - النموذج: يلتف حول نقاط نهاية متوافقة مع OpenAI لتقديم الدعم في مجال التفكير واستدعاء الأدوات.
  - الموارد: يوفر تطبيقات الأدوات ومنطق التحقق.
  - الوكلاء: ينظم التفاعلات بين النماذج والموارد.
الوظائف الرئيسية: يقوم NeMo Gym بإنشاء عمليات طرح ومكافآت، مما يسمح بالتدريب القابل للتطوير والتكامل مع الأنظمة الحالية.

Edison Scientific وAviary: مثال عملي

تستخدم Edison Scientific NeMo Gym وNeMo RL مع إطار Aviary الخاص بها لأتمتة الاكتشاف العلمي في مجالات علم الأحياء والكيمياء والمجالات ذات الصلة. توفر Aviary بيئات تدريب RL لمهام مختلفة، بما في ذلك البحث في الأدبيات وتحليل بيانات المعلوماتية الحيوية والاستنساخ الجزيئي.

طرق Aviary الأساسية:
- reset(): يقوم بتهيئة البيئة وإرجاع الملاحظة الأولى.
- step(): يقوم بتنفيذ إجراء وإرجاع ملاحظات جديدة ومكافآت وإشارات إنهاء.
حالة الاستخدام على سبيل المثال: تقوم Edison Scientific بتدريب وكيل تحليل بيانات دفتر ملاحظات Jupyter لمهام المعلوماتية الحيوية. لقد قاموا بتنفيذ ميزات إدارة السياق للتعامل مع دفاتر الملاحظات الكبيرة، بما في ذلك إسقاط محفوظات التفاعل وتجميع GRPO على خطوات فردية.
BixBench: معيار قياس للأسئلة القابلة للتحقق في مجال المعلوماتية الحيوية تم إنشاؤه بواسطة Edison Scientific لاختبار نظامهم والتحقق من صحته.

بناء بيئات Agentic باستخدام NeMo Gym

تقدم مدونة النشر دليلًا تفصيليًا لبناء بيئات agentic في NeMo Gym. تتضمن الخطوات الرئيسية ما يلي:

تثبيت NeMo Gym: استنساخ المستودع وإعداد بيئة افتراضية.
تكوين النموذج: استخدام نموذج مستضاف أو نشره محليًا باستخدام vLLM، وتمكين استدعاء الأدوات.
اختبار بيئة Aviary: تشغيل وكيل بسيط من خلال بيئة GSM8K، وعرض الوظائف الأساسية.
بناء بيئة جديدة: إضافة بيئة Aviary HotPotQA إلى NeMo Gym، وعرض قابلية التوسع.

أفضل الممارسات لبناء وكلاء علميين

يقدم المؤلفون نصائح قيمة لبناء وكلاء علميين فعالين:

ابدأ بسيطًا: ابدأ بوكيل أساسي وقم بزيادة التعقيد تدريجيًا.
توصيف المكافأة: قياس إحصائيات المكافأة لإنشاء بيئة تدريب فعالة.
مراقبة مقاييس التدريب: تتبع المقاييس لتحديد المشكلات مثل مشاكل أخذ العينات أو انهيار النموذج.
التدريب لفترة أطول: قد تتطلب طرق RLVR فترات تدريب طويلة لتحقيق تعلم كبير.

في الختام، يوفر إطارا عمل NeMo Gym وNeMo RL، جنبًا إلى جنب مع مبادرات مثل Aviary الخاصة بـ Edison Scientific، نظامًا أساسيًا قويًا وقابلاً للتطوير لتطوير وكلاء الذكاء الاصطناعي القادرين على أتمتة الاكتشاف العلمي. من خلال اتباع أفضل الممارسات والاستفادة من هذه الأدوات، يمكن للباحثين إطلاق إمكانات جديدة وتسريع التقدم العلمي.

المصدر: DEVELOPER

In the realm of scientific research, the tedious and repetitive nature of tasks such as literature review, experiment management, and data wrangling can significantly hinder progress. This blog post dives into how NVIDIA’s NeMo Gym and NeMo RL frameworks, coupled with the work of Edison Scientific and their Aviary environment, are revolutionizing the field by enabling the creation and training of AI-powered scientific agents. These agents can automate many of the time-consuming tasks, freeing up researchers to focus on creative problem-solving and scientific discovery.

Reinforcement Learning: Extending LLM Capabilities for Science

Traditional Language Learning Models (LLMs) excel at predicting the next token, leading to broad knowledge but often lacking in specific domain skills. Reinforcement Learning (RL) bridges this gap by allowing models to reason and act beyond supervised data. The post highlights the following key aspects:

Pre-training: Provides a foundational understanding but lacks domain-specific expertise.
Supervised Fine-Tuning (SFT): Learns from instruction-response pairs but is limited by dataset coverage and only rewards reproducing the reference answer.
Reinforcement Learning (RL): Uses a reward function to score outputs, enabling models to optimize for specific goals.
- RLHF (Reinforcement Learning from Human Feedback): Relies on human preferences for reward signals.
- RLAIF (Reinforcement Learning from AI Feedback): Uses LLMs as judges.
- RLVR (Reinforcement Learning with Verifiable Rewards): Uses computational checks for objective reward signals, crucial for scientific tasks.
Scientific RL: Enables agents to design and run experiments, evaluate outcomes, and optimize toward scientific metrics.

NeMo Gym and NeMo RL: Improving Agentic Training

The integration of NeMo Gym and NeMo RL offers a powerful framework for building and evaluating LLM agents for scientific research.

NeMo RL: Provides training algorithms, manages compute resources, and orchestrates model updates. The latest release supports features like on-policy distillation, asyncRL, advanced RL algorithms, and end-to-end FP8 RL training.
NeMo Gym: An open-source framework for building RL training environments at scale, handling diverse dependencies and domain-specific requirements.
- Offers three core server abstractions:
  - Model: Wraps OpenAI-compatible endpoints for reasoning and tool-calling support.
  - Resources: Provides tool implementations and verification logic.
  - Agents: Orchestrates interactions between models and resources.
Key Functionality: NeMo Gym generates rollouts and rewards, allowing for scalable training and integration with existing systems.

Edison Scientific and Aviary: A Practical Example

Edison Scientific utilizes NeMo Gym and NeMo RL with their Aviary framework to automate scientific discovery across biology, chemistry, and related fields. Aviary provides RL training environments for various tasks, including literature research, bioinformatics data analysis, and molecular cloning.

Aviary’s Core Methods:
- reset(): Initializes the environment and returns the first observation.
- step(): Executes an action and returns new observations, rewards, and termination signals.
Example Use Case: Edison Scientific trains a Jupyter-notebook data-analysis agent for bioinformatics tasks. They’ve implemented context management features to handle large notebooks, including interaction history dropping and GRPO grouping on individual steps.
BixBench: A benchmark of verifiable bioinformatics questions built by Edison Scientific to test and validate their system.

Building Agentic Environments with NeMo Gym

The blog post provides a step-by-step guide to building agentic environments in NeMo Gym. The key steps include:

Installing NeMo Gym: Cloning the repository and setting up a virtual environment.
Configuring the Model: Using a hosted model or deploying one locally with vLLM, enabling tool-calling.
Testing an Aviary Environment: Running a simple agent through the GSM8K environment, demonstrating basic functionality.
Building a New Environment: Adding the Aviary HotPotQA environment to NeMo Gym, showcasing extensibility.

Best Practices for Building Scientific Agents

The authors offer valuable advice for building effective scientific agents:

Start Simple: Begin with a basic agent and incrementally add complexity.
Reward Profiling: Measure reward statistics to create an efficient training environment.
Monitor Training Metrics: Track metrics to identify issues such as sampling problems or model collapse.
Train Longer: RLVR-based methods may require extended training periods to achieve significant learning.

In conclusion, the NeMo Gym and NeMo RL frameworks, combined with initiatives like Edison Scientific’s Aviary, provide a powerful and scalable platform for developing AI agents capable of automating scientific discovery. By following best practices and leveraging these tools, researchers can unlock new possibilities and accelerate scientific progress.

Source: DEVELOPER