TacoSkill LAB

The full-lifecycle AI skills platform.

© 2026 TacoSkill LAB. All rights reserved.

nemo-evaluator-sdk

by zechenzhangAGI

94 Favorites
170 Upvotes
0 Downvotes

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. It is NVIDIA's enterprise-grade platform, with a container-first architecture for reproducible benchmarking.

evaluation

Rating: 7.6

Installs: 0

Category: AI & LLM

Quick Review

Excellent skill with comprehensive workflows for enterprise LLM evaluation. The description clearly covers multi-backend execution across 100+ benchmarks, and a CLI agent can confidently invoke this skill for evaluation tasks. Task knowledge is outstanding with 4 detailed workflows covering standard benchmarks, HPC deployment, model comparison, and safety/VLM evaluation—complete with config examples, CLI commands, and Python API usage. Structure is very clean with a concise main document and references for advanced topics. Novelty is high: orchestrating containerized evaluation across multiple backends (Docker/Slurm/cloud) with 18+ harnesses would require significant tokens and expertise for a CLI agent to accomplish independently. Minor improvement possible: could slightly expand the description to mention safety/VLM capabilities explicitly for better discoverability.

LLM Signals

Description coverage: 9
Task knowledge: 10
Structure: 9
Novelty: 9

GitHub Signals

891
74
19
2
Last commit 0 days ago

Publisher

zechenzhangAGI

Skill Author

Related Skills

  • prompt-engineer (Jeffallan): 7.0
  • mcp-developer (Jeffallan): 6.4
  • rag-architect (Jeffallan): 7.0
  • fine-tuning-expert (Jeffallan): 6.4