Bendi新闻
>
仅用250美元,Hugging Face技术主管手把手教你微调Llama 3
仅用250美元,Hugging Face技术主管手把手教你微调Llama 3
8月前
大语言模型的微调一直是说起来容易做起来难的事儿。近日 Hugging Face 技术主管 Philipp Schmid 发表了一篇博客,详细讲解了如何利用 Hugging Face 上的库和 fsdp 以及 Q-Lora 对大模型进行微调。
设置开发环境 创建并加载数据集 使用 PyTorch FSDP、Q-Lora 和 SDPA 微调大语言模型 测试模型并进行推理
# Install Pytorch for FSDP and FA/SDPA
%pip install "torch==2.2.2" tensorboard
# Install Hugging Face libraries
%pip install --upgrade "transformers==4.40.0" "datasets==2.18.0" "accelerate==0.29.3" "evaluate==0.4.1" "bitsandbytes==0.43.1" "huggingface_hub==0.22.2" "trl==0.8.6" "peft==0.10.0"
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
{"messages": [{"role": "system", "content": "You are..."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
from datasets import load_dataset
# Convert dataset to OAI messages
system_message = """You are Llama, an AI assistant created by Philipp to be helpful and honest. Your knowledge spans a wide range of topics, allowing you to engage in substantive conversations and provide analysis on complex subjects."""
def create_conversation(sample):
if sample["messages"][0]["role"] == "system":
return sample
else:
sample["messages"] = [{"role": "system", "content": system_message}] + sample["messages"]
return sample
# Load dataset from the hub
dataset = load_dataset("HuggingFaceH4/no_robots")
# Add system message to each conversation
columns_to_remove = list(dataset["train"].features)
columns_to_remove.remove("messages")
dataset = dataset.map(create_conversation, remove_columns=columns_to_remove,batched=False)
# Filter out conversations which are corrupted with wrong turns, keep which have even number of turns after adding system message
dataset["train"] = dataset["train"].filter(lambda x: len(x["messages"][1:]) % 2 == 0)
dataset["test"] = dataset["test"].filter(lambda x: len(x["messages"][1:]) % 2 == 0)
# save datasets to disk
dataset["train"].to_json("train_dataset.json", orient="records", force_ascii=False)
dataset["test"].to_json("test_dataset.json", orient="records", force_ascii=False)
格式化的数据集,包括格式化的多轮会话和指令(已使用) 只对完整的内容进行训练,忽略只有 prompts 的情况(未使用) 打包数据集,提高训练效率(已使用) 支持参数高效微调技术,包括 Q-LoRA(已使用) 为会话级任务微调初始化模型和分词器(未使用,见下文)
%%writefile llama_3_70b_fsdp_qlora.yaml
# script parameters
model_id: "meta-llama/Meta-Llama-3-70b" # Hugging Face model id
dataset_path: "." # path to dataset
max_seq_len: 3072 # 2048 # max sequence length for model and packing of the dataset
# training parameters
output_dir: "./llama-3-70b-hf-no-robot" # Temporary output directory for model checkpoints
report_to: "tensorboard" # report metrics to tensorboard
learning_rate: 0.0002 # learning rate 2e-4
lr_scheduler_type: "constant" # learning rate scheduler
num_train_epochs: 3 # number of training epochs
per_device_train_batch_size: 1 # batch size per device during training
per_device_eval_batch_size: 1 # batch size for evaluation
gradient_accumulation_steps: 2 # number of steps before performing a backward/update pass
optim: adamw_torch # use torch adamw optimizer
logging_steps: 10 # log every 10 steps
save_strategy: epoch # save checkpoint every epoch
evaluation_strategy: epoch # evaluate every epoch
max_grad_norm: 0.3 # max gradient norm
warmup_ratio: 0.03 # warmup ratio
bf16: true # use bfloat16 precision
tf32: true # use tf32 precision
gradient_checkpointing: true # use gradient checkpointing to save memory
# FSDP parameters: https://huggingface.co/docs/transformers/main/en/fsdp
fsdp: "full_shard auto_wrap offload" # remove offload if enough GPU memory
fsdp_config:
backward_prefetch: "backward_pre"
forward_prefetch: "false"
use_orig_params: "false"
!ACCELERATE_USE_FSDP=1 FSDP_CPU_RAM_EFFICIENT_LOADING=1 torchrun --nproc_per_node=4 ./scripts/run_fsdp_qlora.py --config llama_3_70b_fsdp_qlora.yaml
使用 FSDP 进行全微调需要约 16 块 80GB 内存的 GPU FSDP+LoRA 需要约 8 块 80GB 内存的 GPU FSDP+Q-Lora 需要约 2 块 40GB 内存的 GPU FSDP+Q-Lora+CPU offloading 技术需要 4 块 24GB 内存的 GPU,以及一块具备 22 GB 内存的 GPU 和 127 GB 的 CPU RAM,序列长度为 3072、batch 大小为 1。
### COMMENT IN TO MERGE PEFT AND BASE MODEL ####
# from peft import AutoPeftModelForCausalLM
# # Load PEFT model on CPU
# model = AutoPeftModelForCausalLM.from_pretrained(
# args.output_dir,
# torch_dtype=torch.float16,
# low_cpu_mem_usage=True,
# )
# # Merge LoRA and base model and save
# merged_model = model.merge_and_unload()
# merged_model.save_pretrained(args.output_dir,safe_serialization=True, max_shard_size="2GB")
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
peft_model_id = "./llama-3-70b-hf-no-robot"
# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
peft_model_id,
torch_dtype=torch.float16,
quantization_config= {"load_in_4bit": True},
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
from datasets import load_dataset
from random import randint
# Load our test dataset
eval_dataset = load_dataset("json", data_files="test_dataset.json", split="train")
rand_idx = randint(0, len(eval_dataset))
messages = eval_dataset[rand_idx]["messages"][:2]
# Test on sample
input_ids = tokenizer.apply_chat_template(messages,add_generation_prompt=True,return_tensors="pt").to(model.device)
outputs = model.generate(
input_ids,
max_new_tokens=512,
eos_token_id= tokenizer.eos_token_id,
do_sample=True,
temperature=0.6,
top_p=0.9,
)
response = outputs[0][input_ids.shape[-1]:]
print(f"**Query:**\n{eval_dataset[rand_idx]['messages'][1]['content']}\n")
print(f"**Original Answer:**\n{eval_dataset[rand_idx]['messages'][2]['content']}\n")
print(f"**Generated Answer:**\n{tokenizer.decode(response,skip_special_tokens=True)}")
# **Query:**
# How long was the Revolutionary War?
# **Original Answer:**
# The American Revolutionary War lasted just over seven years. The war started on April 19, 1775, and ended on September 3, 1783.
# **Generated Answer:**
# The Revolutionary War, also known as the American Revolution, was an 18th-century war fought between the Kingdom of Great Britain and the Thirteen Colonies. The war lasted from 1775 to 1783.
© THE END
转载请联系本公众号获得授权
投稿或寻求报道:[email protected]
微信扫码关注该文公众号作者
来源:机器之心
相关新闻
英伟达最新技术分享:手把手教你用Llama 3.1合成数据改进模型!附代码手把手教你用 Lemfi 国际汇款,注册奖励 $30,无任何手续费!支持支付宝、微信实时到账Karpathy点赞,这份报告教你如何用 LLaMa 3创建高质量网络数据集24GB单卡全量微调Llama 3-8B,仅需添加一行代码全网首发!Llama 3技术剖析、微调、部署以及多模态训练【讲座】Llama 3技术剖析、微调、部署以及多模态训练伊州人到手的钱更多!全美平均3,213美元,你收到了吗?更多钱到手!平均$3145美元!4300万份已发出,你收到了吗?仅用 30 分钟!开发者做“山寨版” VSCode 扩展,攻破 4830 亿美元巨头,甚至登上了官方热趋榜?620000美元/股的伯克希尔185.1美元就买到手!纽交所上演“惊魂”事件,盘点美股史上3次类似技术性停牌及影响墨尔本房价太高,男子携家带眷搬到日本,仅用$3万购置房产,但修缮花了$25万RAG微调Llama 3竟超越GPT-4!英伟达GaTech华人学者提出RankRAG框架国产黑马砸来百万算力福利,Llama 3微调快去冲!H800点击就送,1.99元玩转409058同城孙启明:生活服务垂类大模型怎么搭?自研+开源两手抓,火速微调上线Llama 3|GenAICon2024教程来了!3分钟教你搭建:AI大模型前端界面尾盘突然暴拉超15%,仅用904手!加州高档公寓遭强拍!百万豪宅花3万美元就到手?百万豪宅花3万美元就到手?加州高档公寓遭强拍!洗钱250万美元!3华男获罪!背后牵出庞大诈骗集团!只要3万美元!你绝对想不到,这个巨大的快递里装的是一所房子性能对标Llama 3,算力消耗仅1/19!源2.0-M32大幅提升模算效率只需单卡RTX 3090,低比特量化训练就能实现LLaMA-3 8B全参微调别再说国产大模型技术突破要靠 Llama 3 开源了250行代码从头搭建Llama 3,GitHub一天4.6k星!Karpathy大赞