Deploying Small Language Models at the Edge with AWS IoT Greengrass and Strands Agents

Manufacturers and other industries are increasingly looking for ways to use AI-powered solutions for real-time data processing and intelligent decision-making at the edge. This blog post explains how to deploy Small Language Models (SLMs) at scale using AWS IoT Greengrass and Strands Agents. This enables context-aware insights directly on edge devices while maintaining security and performance. Unlike larger models, SLMs are designed to fit within the resource constraints of industrial environments, which makes them well suited to scenarios where reliability and low latency are critical.

Key Benefits and Use Cases

Real-time Insights: Enables operators to query equipment status, interpret telemetry, and access documentation instantly, even without cloud connectivity.
Hybrid Approach: Combines local inference for immediate responses with cloud resources for complex analytics, multi-site optimization, and model retraining.
Versatile Applications: Applicable across industries, including automotive (voice commands), energy (SCADA data processing), gaming (companion AI), and higher education (personalized learning).

Solution Overview

The architecture uses AWS IoT Greengrass to deploy and manage SLMs on edge devices, with Strands Agents providing the local agent capabilities. The key services and components are:

AWS IoT Greengrass: Deploys, manages, and monitors device software at the edge.
AWS IoT Core: Connects IoT devices securely to the AWS cloud.
Amazon S3: Stores and retrieves SLM model files in GGUF format.
Strands Agents: Provides a Python framework for running multi-agent systems, orchestrating local and cloud-based inference.

The solution uses an OPC-UA simulator of a factory with an oven and a conveyor belt, together with maintenance runbooks, as its sources of industrial data. The workflow proceeds as follows:

A model file in GGUF format is uploaded to an Amazon S3 bucket accessible to AWS IoT Greengrass devices.
Devices receive a file download job handled by the S3FileDownloader component, which efficiently manages large file transfers.
When the Strands Agents component first calls the model, it is loaded into Ollama, a runtime that runs inference on GGUF model files.
A user query is sent to the local agent via a device-specific MQTT topic in AWS IoT Core.
An Orchestrator Agent built with the Strands Agents SDK interprets the query, determines which information sources are relevant, and delegates to specialized agents such as the Documentation Agent and the OPC-UA Agent.
The Documentation Agent retrieves information from the maintenance runbooks, and the OPC-UA Agent queries the OPC-UA server for real-time or historical machine data.
The Orchestrator Agent aggregates the information and publishes a response to a device-specific response topic.
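The orchestrator-and-specialists flow above can be illustrated with a small, model-free sketch. In the real solution, an LLM served by Ollama decides which specialized agent to call. In Strands, this is commonly done by exposing each specialized agent to the orchestrator as a tool. Here, a hypothetical keyword table stands in for that decision so the control flow runs without a model. The agent functions, the routing keywords, and the `final_response`/`sub_agent_responses` keys are illustrative assumptions, not the solution's actual code or response schema.

```python
"""Hypothetical, model-free sketch of orchestrator delegation.

A keyword table stands in for the LLM's routing decision; the agents
are stubs. None of these names come from the actual solution.
"""
from typing import Callable, Dict, List


def documentation_agent(query: str) -> str:
    # Stand-in for the Documentation Agent (would search maintenance runbooks).
    return f"runbook excerpt for: {query}"


def opcua_agent(query: str) -> str:
    # Stand-in for the OPC-UA Agent (would read live or historical machine data).
    return f"live machine data for: {query}"


# Hypothetical routing table: keyword -> specialized agent.
ROUTES: Dict[str, Callable[[str], str]] = {
    "status": opcua_agent,
    "temperature": opcua_agent,
    "maintenance": documentation_agent,
    "procedure": documentation_agent,
}


def orchestrate(query: str) -> Dict[str, object]:
    """Delegate to every matching agent once, then aggregate the answers."""
    lowered = query.lower()
    selected: List[Callable[[str], str]] = []
    for keyword, agent in ROUTES.items():
        if keyword in lowered and agent not in selected:
            selected.append(agent)
    if not selected:
        selected = [documentation_agent]  # fall back to the runbooks
    sub_responses = {agent.__name__: agent(query) for agent in selected}
    final = " | ".join(sub_responses.values())
    return {"final_response": final, "sub_agent_responses": sub_responses}
```

For example, `orchestrate("What is the status of the conveyor belt?")` routes only to the OPC-UA stand-in. A query that mentions both a maintenance procedure and the oven temperature fans out to both agents before their answers are aggregated.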

Strands Agents SDK: Facilitates interaction with locally deployed models (Ollama) and offers the flexibility to switch to cloud-based models like those in Amazon Bedrock when available.
IAM role and IoT certificate: The IAM role grants the device access to the model files in S3, and the IoT certificate authenticates the device for MQTT communication with AWS IoT Core.
Logging: Component operation is logged locally, with optional integration with Amazon CloudWatch for cloud-based monitoring.
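The ability to switch between the local Ollama model and a cloud model can be sketched as a small selection policy. This is a hypothetical illustration, not the Strands Agents API. The `ModelChoice` type, the `choose_model` policy, and the Bedrock region are assumptions; only Ollama's default port (11434) reflects a real default. In the actual component, the chosen provider would be handed to the SDK's corresponding model provider.

```python
"""Hypothetical policy: answer on the edge by default, use the cloud
model only when it is needed and reachable."""
import socket
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelChoice:
    provider: str  # "ollama" (edge) or "bedrock" (cloud)
    endpoint: str


# Illustrative values: 11434 is Ollama's default port; the region is assumed.
LOCAL = ModelChoice("ollama", "http://localhost:11434")
CLOUD = ModelChoice("bedrock", "bedrock-runtime.us-east-1.amazonaws.com")


def cloud_reachable(host: str = CLOUD.endpoint, port: int = 443,
                    timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the cloud endpoint succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, timeout, refused connection, ...
        return False


def choose_model(needs_complex_reasoning: bool,
                 probe: Callable[[], bool] = cloud_reachable) -> ModelChoice:
    """Stay local for routine queries; use the cloud model only when the
    query needs it and the probe says the cloud is reachable."""
    if needs_complex_reasoning and probe():
        return CLOUD
    return LOCAL
```

Injecting `probe` keeps the policy testable offline and lets a deployment substitute its own connectivity check.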

Deployment Walkthrough

The blog post provides a detailed walkthrough including the following steps:

Clone the Strands Agents repository from GitHub.
Use the Greengrass Development Kit (GDK) CLI to build and publish the component. Before building, set the region and bucket in gdk-config.json.
Deploy the component to the edge device via the AWS IoT Greengrass console.
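The gdk-config.json requirement can be scripted. This sketch assumes the standard GDK layout, in which each entry under `component` has a `publish` object holding `bucket` and `region`. The component name used in the test data (com.strands.agent.greengrass) is inferred from the log file name. The bucket name and region in the usage example are placeholders.

```python
"""Sketch: set the publish bucket and region for every component in a
gdk-config.json file before running the GDK CLI."""
import copy
import json
from pathlib import Path


def update_publish_target(config: dict, bucket: str, region: str) -> dict:
    """Return a copy of a gdk-config.json dict with publish.bucket and
    publish.region set for every component entry."""
    updated = copy.deepcopy(config)
    for settings in updated["component"].values():
        publish = settings.setdefault("publish", {})
        publish["bucket"] = bucket
        publish["region"] = region
    return updated


def apply_to_file(path: Path, bucket: str, region: str) -> None:
    """Rewrite gdk-config.json in place with the new publish target."""
    config = json.loads(path.read_text(encoding="utf-8"))
    updated = update_publish_target(config, bucket, region)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

For example, run `apply_to_file(Path("gdk-config.json"), "my-artifact-bucket", "us-east-1")`, then `gdk component build` and `gdk component publish`.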

For testing on an Amazon EC2 instance, a CloudFormation template is provided. It deploys a GPU instance with the required software preinstalled, along with the necessary Greengrass resources.

Model File Management

Downloading the SLM file is a critical step:

The model file, in GGUF format, must be placed under /tmp/destination/ on the edge device and named model.gguf.
Alternatively, the S3FileDownloader component can download the model file from an S3 bucket, retrying and resuming automatically when connectivity is unreliable. The blog post includes a sample payload that triggers the download through this component.
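Before Ollama loads the model, it can help to confirm that /tmp/destination/model.gguf exists and really is a GGUF file. The sketch below assumes the standard GGUF header layout used by llama.cpp: the four ASCII bytes `GGUF` followed by a little-endian uint32 format version. The function names are hypothetical and not part of the solution.

```python
"""Sketch: sanity-check the model file before handing it to Ollama."""
import struct
from pathlib import Path

MODEL_PATH = Path("/tmp/destination/model.gguf")
GGUF_MAGIC = b"GGUF"


def parse_gguf_header(header: bytes) -> int:
    """Validate the first 8 bytes of a GGUF file; return its format version."""
    if len(header) < 8 or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (missing 'GGUF' magic bytes)")
    (version,) = struct.unpack("<I", header[4:8])  # little-endian uint32
    return version


def check_model_file(path: Path = MODEL_PATH) -> int:
    """Raise if the model is missing or malformed; return the GGUF version."""
    if not path.is_file():
        raise FileNotFoundError(f"model not found at {path}")
    with path.open("rb") as f:
        return parse_gguf_header(f.read(8))
```

A truncated download, such as an interrupted transfer, typically fails the magic-byte check or leaves the file missing. Either way, the problem surfaces before inference is attempted.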

Testing and Monitoring

To test the deployed agent, subscribe to the device-specific response topic and publish a query to the device-specific request topic. The blog post includes an example query that checks the status of the conveyor belt. It also shows the expected response format, which contains both the final response and the individual sub-agent responses. To monitor the component, check its log at /greengrass/v2/logs/com.strands.agent.greengrass.log.
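A small helper can surface problems in the component log without scrolling through it by hand. It assumes that Greengrass log lines carry level tags such as `[ERROR]` and `[WARN]`. The function names and the 200-line default are illustrative choices, not part of the solution.

```python
"""Sketch: pull recent error and warning lines from the component log."""
from collections import deque
from pathlib import Path
from typing import Iterable, List, Sequence

LOG_PATH = Path("/greengrass/v2/logs/com.strands.agent.greengrass.log")


def problem_lines(lines: Iterable[str],
                  levels: Sequence[str] = ("[ERROR]", "[WARN]")) -> List[str]:
    """Keep only lines tagged with one of the given log levels."""
    return [line.rstrip("\n") for line in lines
            if any(tag in line for tag in levels)]


def scan_log(path: Path = LOG_PATH, tail: int = 200) -> List[str]:
    """Return ERROR/WARN lines among the last `tail` lines of the log."""
    with path.open(encoding="utf-8", errors="replace") as f:
        recent = deque(f, maxlen=tail)  # keeps only the newest lines
    return problem_lines(recent)
```

On the device, `scan_log()` returns the recent error and warning lines. Reading files under /greengrass/v2/logs may require elevated privileges.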

Conclusion

This post demonstrated how to deploy an SLM locally with AWS IoT Greengrass, showcasing the potential of edge AI in manufacturing and other sectors. The integration of SLMs through Strands Agents on constrained hardware allows for real-time decision-making and improved efficiency. The post concludes by envisioning a hybrid cloud-edge AI system where edge agents handle real-time processing and cloud agents manage complex reasoning, offering a scalable and adaptable solution for intelligent automation.

Source: AWS - Amazon Web Services