TacoSkill LAB

The full-lifecycle AI skills platform.


serving-llms-vllm

by zechenzhangAGI

Rating 2.2 · 130 Favorites · 148 Upvotes · 0 Downvotes

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.
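As a rough sketch of the workflow this skill covers (not taken from its SKILL.md, which did not load on this page): launch an OpenAI-compatible vLLM server, then query it with the standard openai Python client. The model name, flags, and port below are illustrative assumptions.

    # Launch an OpenAI-compatible vLLM server (shell). Model and flags are
    # placeholder assumptions, not values from this skill:
    #   vllm serve meta-llama/Llama-3.1-8B-Instruct \
    #       --tensor-parallel-size 2 \
    #       --gpu-memory-utilization 0.90

    # Query it with the standard OpenAI client; vLLM listens on port 8000 by
    # default, and the API key can be any string unless --api-key is set.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user",
                   "content": "Summarize PagedAttention in one sentence."}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)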

vllm

Rating: 2.2
Installs: 0
Category: AI & LLM

Quick Review

No summary available.

LLM Signals

  • Description coverage: -
  • Task knowledge: -
  • Structure: -
  • Novelty: -

GitHub Signals

957 · 83 · 19 · 2
Last commit: 2 days ago

Publisher

zechenzhangAGI (Skill Author)


Related Skills

  • prompt-engineer by Jeffallan (7.0)
  • mcp-developer by Jeffallan (6.4)
  • rag-architect by Jeffallan (7.0)
  • fine-tuning-expert by Jeffallan (6.4)