TacoSkill LAB

The full-lifecycle AI skills platform.


Platforms

  • Claude Code
  • Cursor
  • Codex CLI
  • Gemini CLI
  • OpenCode

© 2026 TacoSkill LAB. All rights reserved.


nemo-curator


by davila7

0 views
179 favorites
291 upvotes
0 downvotes

GPU-accelerated data curation for LLM training. Supports text, image, video, and audio data. Features include fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, and NSFW detection. Scales across multiple GPUs with RAPIDS. Use it to prepare high-quality training datasets, clean web data, or deduplicate large corpora.

data curation

Rating: 8.1

Installs: 0

Category: AI & LLM

Quick Review

Excellent skill documentation for GPU-accelerated LLM data curation. The description clearly communicates when to use NeMo Curator instead of alternatives, and SKILL.md provides comprehensive code examples covering all major operations (filtering, deduplication, PII redaction, and multi-modal processing). Task knowledge is strong, with concrete pipelines, performance benchmarks, and real-world cost comparisons (demonstrating 89% savings). The structure progresses logically from basics to advanced patterns, although the single file is somewhat long. Novelty is solid: the skill addresses a computationally expensive problem (data curation at terabyte scale) where GPU acceleration provides 10 to 16× speedups, meaningfully reducing both time and cost compared with CPU-based approaches or manual CLI operations. Possible improvements include splitting the file into smaller modules and separating the quickstart material more clearly from the advanced patterns.

LLM Signals

Description coverage: 9
Task knowledge: 9
Structure: 8
Novelty: 7

GitHub Signals

18,073
1,635
132
71
Last commit 0 days ago

Publisher

davila7

Skill Author


Related Skills

  • prompt-engineer by Jeffallan (7.0)
  • mcp-developer by Jeffallan (6.4)
  • rag-architect by Jeffallan (7.0)
  • fine-tuning-expert by Jeffallan (6.4)