TacoSkill LAB

The full-lifecycle AI skills platform.


serving-llms-vllm

by zechenzhangAGI

Rating 2.2 · 130 Favorites · 148 Upvotes · 0 Downvotes

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
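As a rough sketch of the workflow this skill covers (not taken from its SKILL.md, which did not load on this page): launch an OpenAI-compatible vLLM server, then query it with the standard openai Python client. The model name, flags, and port below are illustrative assumptions.

    # Launch an OpenAI-compatible vLLM server (shell). Model and flags are
    # placeholder assumptions, not values from this skill:
    #   vllm serve meta-llama/Llama-3.1-8B-Instruct \
    #       --tensor-parallel-size 2 \
    #       --gpu-memory-utilization 0.90

    # Query it with the standard OpenAI client; vLLM listens on port 8000 by
    # default, and the API key can be any string unless --api-key is set.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user",
                   "content": "Summarize PagedAttention in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)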

vllm

Rating: 2.2
Installs: 0
Category: AI & LLM

Quick Review

No summary available.

LLM Signals

  • Description coverage: -
  • Task knowledge: -
  • Structure: -
  • Novelty: -

GitHub Signals

957 · 83 · 19 · 2
Last commit: 2 days ago

Publisher

zechenzhangAGI (Skill Author)


Related Skills

  • prompt-engineer by Jeffallan (7.0)
  • mcp-developer by Jeffallan (6.4)
  • rag-architect by Jeffallan (7.0)
  • fine-tuning-expert by Jeffallan (6.4)