Set up a conda virtual environment to keep the different test environments isolated from each other.

Prepare PyTorch

https://pytorch.org/get-started/locally/

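Install with conda. The command pins pytorch-cuda=12.1, while the CUDA Toolkit installed for this setup is 12.3. Are the two compatible? In general yes: the conda package ships its own CUDA 12.1 runtime and only needs an NVIDIA driver new enough for it, and the driver bundled with the 12.3 installer (546.12) meets that requirement.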

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
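The 12.1 versus 12.3 concern comes down to a rule of thumb: the CUDA runtime bundled with PyTorch needs a driver whose supported CUDA version is at least as high. The sketch below encodes only that rule. The function names are hypothetical, and the check is a simplification that ignores NVIDIA's minor-version compatibility mode.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '12.3' into a comparable tuple (12, 3)."""
    return tuple(int(part) for part in text.split("."))


def driver_supports_runtime(driver_cuda: str, runtime_cuda: str) -> bool:
    """Return True if the driver's highest supported CUDA version covers the runtime."""
    return parse_version(driver_cuda) >= parse_version(runtime_cuda)


# "12.3": CUDA version supported by the installed driver (assumed from the 12.3 installer)
# "12.1": the pytorch-cuda version pinned in the conda command
print(driver_supports_runtime("12.3", "12.1"))
print(driver_supports_runtime("11.8", "12.1"))
```

The driver's supported CUDA version is the "CUDA Version" field shown in the header of nvidia-smi output.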

Prepare the CUDA environment

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform. It provides APIs that let developers use the GPU to accelerate general-purpose computation (GPGPU), not just graphics.

Download the installer from the official NVIDIA download page. The installer used here: cuda_12.3.2_546.12_windows.exe

Prepare the code and model

Clone the ChatGLM3-6B repository from GitHub.

git clone https://github.com/THUDM/ChatGLM3 
cd ChatGLM3
 
# From the project directory, install the dependencies the model needs.
pip install -r requirements.txt
 
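# Download the model files: create a THUDM directory and use ModelScope to download the model into it.
# Per the GitHub README the model is released first on Hugging Face, but access from mainland China is slow, so ModelScope is faster.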
mkdir THUDM
cd THUDM
git lfs install
git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
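If Git LFS is not active during the clone, the large weight files arrive as small text stubs ("pointer files") instead of real data. The sketch below is one way to detect that, assuming the model landed in THUDM/chatglm3-6b as in the steps above. The helper names are hypothetical.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file is a Git LFS pointer stub rather than real content."""
    with open(path, "rb") as handle:
        return handle.read(len(LFS_HEADER)) == LFS_HEADER


def find_lfs_pointers(model_dir: Path) -> list[Path]:
    """List files under model_dir that are still LFS pointer stubs."""
    return [p for p in sorted(model_dir.rglob("*")) if p.is_file() and is_lfs_pointer(p)]


model_dir = Path("THUDM/chatglm3-6b")  # assumed location from the clone above
if model_dir.is_dir():
    stubs = find_lfs_pointers(model_dir)
    print("Undownloaded LFS files:", [str(p) for p in stubs] or "none")
else:
    print(f"{model_dir} not found; run this from the ChatGLM3 directory")
```

If any stubs are listed, running git lfs pull inside the model directory should fetch the real files.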

Install flash-attn

https://github.com/Dao-AILab/flash-attention

The build uses git to fetch files (Git-2.43.0-64-bit.exe, https://git-scm.com/download/win). Compiling also needs the MSVC toolchain; without it the build stops with: "Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/"

pip install ninja
ninja --version
 
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
python .\setup.py install

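If the build fails with Command '['ninja', '-v']' returned non-zero exit status 1, a commonly cited workaround is to edit site-packages/torch/utils/cpp_extension.py in the environment and change the ninja invocation there from ninja -v to ninja --version. This tends to hide the underlying compiler error rather than fix it, so the build can still fail afterwards.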
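Before patching anything, it can help to confirm that ninja itself is installed and runnable from the active environment. A minimal sketch, using only the standard library; the helper name is hypothetical:

```python
import shutil
import subprocess


def ninja_version():
    """Return ninja's version string, or None if ninja is missing or fails to run."""
    exe = shutil.which("ninja")
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = ninja_version()
print("ninja version:", version if version else "not found on PATH")
```

If ninja is reported as present, the non-zero exit status most likely comes from the compile step itself, and the full build log is the place to look.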

! The build above failed, so for now use a wheel that someone else has already compiled. https://github.com/bdashore3/flash-attention/releases

Here we choose: flash_attn-2.5.2+cu122torch2.2.0cxx11abiFALSE-cp311-cp311-win_amd64.whl

pip install flash_attn-2.5.2+cu122torch2.2.0cxx11abiFALSE-cp311-cp311-win_amd64.whl
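The wheel filename encodes the CUDA version, PyTorch version, Python version and platform it was built for, and these should line up with the local environment. The sketch below pulls those tags out with a regular expression. The pattern is an assumption based on this one filename, not an official naming specification.

```python
import re
import sys

# Pattern inferred from the filename used above; other releases may differ.
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^+-]+)\+"
    r"cu(?P<cuda>\d+)torch(?P<torch>[\d.]+)cxx11abi(?P<cxx11abi>TRUE|FALSE)"
    r"-(?P<python>cp\d+)-(?P<abi>cp\d+)-(?P<platform>.+)\.whl$"
)


def parse_flash_attn_wheel(filename: str) -> dict:
    """Split a prebuilt flash-attn wheel filename into its build tags."""
    match = WHEEL_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized wheel name: {filename}")
    return match.groupdict()


info = parse_flash_attn_wheel(
    "flash_attn-2.5.2+cu122torch2.2.0cxx11abiFALSE-cp311-cp311-win_amd64.whl"
)
local_python = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(info)
print("Python tag matches this interpreter:", info["python"] == local_python)
```

For this wheel the tags read as CUDA 12.2, PyTorch 2.2.0, CPython 3.11 and 64-bit Windows, so the conda environment needs Python 3.11 and a matching PyTorch build.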

Verify the environment

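import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
if torch.cuda.is_available():
    # These two calls raise an error when no CUDA device is present
    print("Current GPU index:", torch.cuda.current_device())
    print("GPU name (index 0):", torch.cuda.get_device_name(0))
print("CUDA version PyTorch was built with:", torch.version.cuda)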
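Besides the GPU checks, it may be worth confirming that the packages installed in the earlier steps are visible to the interpreter. The sketch below only looks the packages up without importing them. The package list is an assumption drawn from the steps in this note.

```python
import importlib.util


def installed(packages):
    """Map each top-level package name to whether Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Assumed list: packages this setup installs or relies on
status = installed(["torch", "flash_attn", "transformers", "streamlit"])
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```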

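# Modify the demo to load the model with 4-bit quantization, then run it
pip install streamlit
streamlit run web_demo_streamlit.py
 
# In web_demo_streamlit.py, change the model-loading line to:
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()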
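To see why quantize(4) matters on a consumer GPU, the weight memory can be estimated as parameter count times bits per weight. The sketch below assumes a nominal 6 billion parameters and counts weights only. Activations, the KV cache and any layers left unquantized are excluded, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 6e9  # assumption: nominal "6B" parameter count

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{label}: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```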

Acceleration options: https://github.com/THUDM/ChatGLM3/blob/main/tensorrt_llm_demo/README.md https://www.bilibili.com/read/cv28888208/

About CUDA: https://zhuanlan.zhihu.com/p/635515308?utm_id=0
