vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs
vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is fast with:

- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Optimized CUDA kernels

vLLM is flexible and easy to use with:

- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including parallel sampling and beam search
- Tensor parallelism support for distributed inference
- Streaming outputs
- An OpenAI-compatible API server
vLLM seamlessly supports many Hugging Face models, including the following architectures:
- Aquila & Aquila2 (BAAI/AquilaChat2-7B, BAAI/AquilaChat2-34B, BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.)
- Baichuan & Baichuan2 (baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.)
- BLOOM (bigscience/bloom, bigscience/bloomz, etc.)
- ChatGLM (THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.)
- Command-R (CohereForAI/c4ai-command-r-v01, etc.)
- DBRX (databricks/dbrx-base, databricks/dbrx-instruct, etc.)
- DeciLM (Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.)
- Falcon (tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.)
- Gemma (google/gemma-2b, google/gemma-7b, etc.)
- GPT-2 (gpt2, gpt2-xl, etc.)
- GPT BigCode (bigcode/starcoder, bigcode/gpt_bigcode-santacoder, etc.)
- GPT-J (EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.)
- GPT-NeoX (EleutherAI/gpt-neox-20b, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.)
- InternLM (internlm/internlm-7b, internlm/internlm-chat-7b, etc.)
- InternLM2 (internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.)
- Jais (core42/jais-13b, core42/jais-13b-chat, core42/jais-30b-v3, core42/jais-30b-chat-v3, etc.)
- LLaMA & LLaMA-2 (meta-llama/Llama-2-70b-hf, lmsys/vicuna-13b-v1.3, young-geng/koala, openlm-research/open_llama_13b, etc.)
- MiniCPM (openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, etc.)
- Mistral (mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.)
- Mixtral (mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, etc.)
- MPT (mosaicml/mpt-7b, mosaicml/mpt-30b, etc.)
- OLMo (allenai/OLMo-1B, allenai/OLMo-7B, etc.)
- OPT (facebook/opt-66b, facebook/opt-iml-max-30b, etc.)
- Orion (OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.)
- Phi (microsoft/phi-1_5, microsoft/phi-2, etc.)
- Qwen (Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.)
- Qwen1.5 (Qwen/Qwen1.5-7B, Qwen/Qwen1.5-7B-Chat, etc.)
- Qwen1.5-MoE (Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.)
- StableLM (stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.)
- StarCoder2 (bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.)
- XVERSE (xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.)
- Yi (01-ai/Yi-6B, 01-ai/Yi-34B, etc.)

Install vLLM with pip or from source:
```shell
pip install vllm
```
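Once installed, any of the Hugging Face checkpoints listed above can be loaded through vLLM's offline-inference API. A minimal sketch, using the small facebook/opt-125m checkpoint as an illustrative choice (running it requires a supported GPU and downloads the model weights on first use):

```python
from vllm import LLM, SamplingParams

# A batch of prompts; vLLM batches them automatically for throughput.
prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Decoding settings for all prompts in the batch.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load any supported Hugging Face model by its repo name.
llm = LLM(model="facebook/opt-125m")

# Generate completions; one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

The same `model=` argument accepts any of the architectures in the list above, e.g. `LLM(model="mistralai/Mistral-7B-v0.1")`.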