DeepMind research costs laid bare: one ICML paper burned through US$12.9 million
Reported by 新智元
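[新智元 digest] A DeepMind paper recently accepted at ICML 2024 shows just how lavishly the lab can spend with Google behind it. A blog post estimated the compute and cost the research required: roughly 15% of the compute used for Llama 3 pretraining, at a price of up to US$12.9M.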
Paper: https://arxiv.org/abs/2407.05872
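Transformer architecture details
def M(d: int, L=8, l_seq=512, V=32101) -> int:
    # Estimated training FLOPs per token at width d (L layers, sequence length l_seq, vocab size V)
    return 6*d * (L*(12*d + l_seq) + V)
# Tokens per experiment; read here as 50,000 steps x batch size 256 x sequence length 512
TPE = 50000 * 256 * 512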
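The formula for M can be read as the sum of three terms. The sketch below is one interpretation and is not taken from the source: `flops_breakdown` and the labels `dense`, `attention`, and `unembed` are hypothetical names, and the attribution of each term (12·d² weights per layer, an attention term proportional to sequence length, and a d×V output projection, each at 6 FLOPs per token for forward plus backward) follows the usual accounting, assumed here.

```python
def M(d: int, L=8, l_seq=512, V=32101) -> int:
    # Same formula as in the article: training FLOPs per token
    return 6*d * (L*(12*d + l_seq) + V)

def flops_breakdown(d: int, L=8, l_seq=512, V=32101) -> dict:
    """Split M(d) into three additive terms (labels are assumptions)."""
    return {
        "dense": 6 * L * 12 * d * d,     # per-layer weight matrices
        "attention": 6 * L * d * l_seq,  # term that scales with sequence length
        "unembed": 6 * d * V,            # projection onto the vocabulary
    }

for d in (128, 1024, 16384):
    parts = flops_breakdown(d)
    # The three terms must add up to the article's M(d) exactly
    assert sum(parts.values()) == M(d)
    shares = {name: f"{value / M(d):.1%}" for name, value in parts.items()}
    print(d, shares)
```

At small widths the vocabulary term accounts for most of the cost, while at large widths the per-layer weights dominate.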
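Alignment experiments
def alignment() -> int:
    # 4 x TPE tokens at each of three widths
    return 4 * TPE * sum(M(d) for d in [1024,2048,4096])
# cost_of_run is not defined in this article; see the blog post under References
# >>> f'{alignment():.3E}'
# '3.733E+20'
# >>> cost_of_run(alignment())[0]
# 888.81395400704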
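`cost_of_run` is called above but never defined in this article. A minimal sketch that is consistent with the figures printed here follows. The constants are assumptions back-solved from those figures, not quoted from the source: 3.5e14 sustained FLOP/s per H100, 8 GPUs per node, and US$3 per GPU-hour. Under those assumptions the second return value reads as H100 node-hours.

```python
def cost_of_run(flops: float,
                gpu_flops_per_s: float = 3.5e14,  # assumed sustained throughput per H100
                gpus_per_node: int = 8,            # assumed node size
                usd_per_gpu_hour: float = 3.0):    # assumed rental rate
    """Return (rental cost in USD, H100 node-hours) for a FLOP budget."""
    node_hours = flops / (gpu_flops_per_s * gpus_per_node * 3600)
    usd = node_hours * gpus_per_node * usd_per_gpu_hour
    return usd, node_hours

# alignment() rounded to four significant digits, as printed in the article
usd, node_hours = cost_of_run(3.733e20)
print(f"{usd:.2f} USD, {node_hours:.2f} node-hours")
```

With the rounded input this lands close to the 888.81 printed above; the small difference comes from the rounding of the FLOP count.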
Learning rate
Sub-problem: best eval loss experiments
H = [1,2,4,6,8,12,16,20,24,32,48,64,96,128]
D = [h * 128 for h in H]  # model widths, from 128 to 16384
def table_e1() -> int:
    sets_x_optims = 5 + 7 + 7
    # Only the six largest widths, D[-6:], are counted
    return 4 * sets_x_optims * TPE * sum(M(d) for d in D[-6:])
# >>> f'{table_e1():.3E}';cost_of_run(table_e1())
# '1.634E+23'
# (388955.9991064986, 16206.499962770775)
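Most estimates in this article share one pattern: a run count multiplied by TPE and by the sum of M(d) over the widths involved. As a sketch, that pattern can be factored into a helper; `sweep_flops` is a hypothetical name introduced here for illustration and does not appear in the source.

```python
def M(d: int, L=8, l_seq=512, V=32101) -> int:
    # Training FLOPs per token, as defined in the article
    return 6*d * (L*(12*d + l_seq) + V)

TPE = 50000 * 256 * 512  # tokens per experiment
H = [1, 2, 4, 6, 8, 12, 16, 20, 24, 32, 48, 64, 96, 128]
D = [h * 128 for h in H]  # model widths

def sweep_flops(n_runs: int, widths) -> int:
    """FLOPs for n_runs full-length runs at each width (hypothetical helper)."""
    return n_runs * TPE * sum(M(d) for d in widths)

# Same products as alignment() and table_e1() in the article
alignment_flops = sweep_flops(4, [1024, 2048, 4096])
table_e1_flops = sweep_flops(4 * (5 + 7 + 7), D[-6:])
print(f"{alignment_flops:.3E}", f"{table_e1_flops:.3E}")
```

Written this way, each experiment reduces to two choices: how many runs were swept and which widths were included.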
β parameter
def beta_only() -> int:
    # PpL is defined in the Adam epsilon section below
    return 3*4*2*PpL * TPE * sum(M(d) for d in D)
# 7.988E+23 (1902022.3291813303, 79250.93038255542)
γ parameter
def gamma_expts() -> int:
    # PpL is defined in the Adam epsilon section below
    return 36*TPE * (800*M(1024) + PpL*sum(M(d) for d in D))
# gamma_expts 1.354E+24 (3224397.534237257, 134349.8972598857)
Adam epsilon parameter
PpL = 15 # unprincipled estimate
def eps_variants() -> int:
    return 4 * 6 * PpL * TPE * sum(M(d) for d in D)
'''
>>> f'{eps_variants():.3E}';cost_of_run(eps_variants())
'7.988E+23'
(1902022.3291813303, 79250.93038255542)
'''
def eps_heatmaps() -> int:
    # eps-type * eps-val * parameterizations * LR range * ...
    return 2 * 6 * 4 * 13 * TPE * sum(M(d) for d in D[-6:])
'''
>>> f'{eps_heatmaps():.3E}';cost_of_run(eps_heatmaps())
'1.341E+24'
(3193533.466348094, 133063.89443117057)
'''
Weight decay
def weight_decay() -> int:
    return 4 * PpL * TPE * sum(M(d) for d in D)
'''
>>> f'{weight_decay():.3E}'; cost_of_run(weight_decay())
'1.331E+23'
(317003.7215302217, 13208.488397092571)
'''
Adafactor optimizer
def adafactor() -> int:
    # Only the 11 smallest widths, D[:11], are counted
    return 2*2*4*PpL*TPE*sum(M(d) for d in D[:11])
'''
>>> f'{adafactor():.3E}'; cost_of_run(adafactor())
'7.918E+22'
(188532.80765144504, 7855.533652143543)
'''
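Compute-optimal experiments
def P(d: int, L=8, V=32101) -> int:
    # Parameter count at width d
    return 2 * d * (6*L*d + V)
def compute_optimal() -> int:
    indices_50k = (14, 14, 12)
    return 4*PpL*sum([
        # Fixed-length runs: TPE tokens at every width in D[:i]
        TPE * sum(sum( M(d) for d in D[:i] ) for i in indices_50k),
        # Runs at 20 tokens per parameter: 20*P(d) tokens at M(d) FLOPs per token
        20 * sum(P(d)*M(d) for d in D[:11]) *3,
    ])
# compute_optim 7.518E+23 (1790104.1799513847, 74587.67416464102)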
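P(d) above gives the parameter count at width d. As a sketch, the sanity-check lines printed in the summary (the model sizes and the M/6P ratio) can be reproduced from P and M. The format strings `.3g` and `.1%` are assumptions chosen to match the printed output, not taken from the source.

```python
def M(d: int, L=8, l_seq=512, V=32101) -> int:
    # Training FLOPs per token, as defined in the article
    return 6*d * (L*(12*d + l_seq) + V)

def P(d: int, L=8, V=32101) -> int:
    # Parameter count, as defined in the article
    return 2 * d * (6*L*d + V)

H = [1, 2, 4, 6, 8, 12, 16, 20, 24, 32, 48, 64, 96, 128]
D = [h * 128 for h in H]  # model widths

# Parameter counts in billions, three significant digits
sizes = [f"{P(d) / 1e9:.3g}B" for d in D]
# FLOPs per token relative to the 6 * parameters rule of thumb
ratios = [f"{M(d) / (6 * P(d)):.1%}" for d in D]

print("model sizes:", sizes)
print("M/6P:", ratios)
```

The ratio climbs toward 100% as width grows, so the common 6·P FLOPs-per-token approximation overstates the cost mainly for the smallest models.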
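Summary
Each row lists the experiment, its estimated FLOPs, and the pair returned by cost_of_run: the rental cost in US dollars and, judging by the totals below, the number of H100 node-hours.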
alignment 3.733E+20 (888.81395400704, 37.033914750293334)
table_e1 1.634E+23 (388955.9991064986, 16206.499962770775)
eps_variants 7.988E+23 (1902022.3291813303, 79250.93038255542)
eps_heatmaps 1.341E+24 (3193533.466348094, 133063.89443117057)
beta_only 7.988E+23 (1902022.3291813303, 79250.93038255542)
gamma_expts 1.354E+24 (3224397.534237257, 134349.8972598857)
weight_decay 1.331E+23 (317003.7215302217, 13208.488397092571)
adafactor 7.918E+22 (188532.80765144504, 7855.533652143543)
compute_optim 7.518E+23 (1790104.1799513847, 74587.67416464102)
total_flops=5.421E+24
rental price: US$12.9M
h100 node months required: 746.9595590938408
(sanity check) D=[128, 256, 512, 768, 1024, 1536, 2048, 2560, 3072, 4096, 6144, 8192, 12288, 16384]
(sanity check) model sizes: ['0.00979B', '0.0227B', '0.058B', '0.106B', '0.166B', '0.325B', '0.534B', '0.794B', '1.1B', '1.87B', '4.02B', '6.97B', '15.3B', '26.8B']
(sanity check) M/6P: ['63.4%', '68.5%', '75.3%', '79.7%', '82.8%', '86.8%', '89.3%', '91.0%', '92.2%', '93.9%', '95.7%', '96.7%', '97.7%', '98.3%']
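The totals above can be re-derived from the per-experiment FLOP figures. The sketch below sums the rounded values printed in the table, so it agrees with the printed totals only to about three significant digits. The throughput and price constants (3.5e14 FLOP/s per H100, 8 GPUs per node, US$3 per GPU-hour, 30-day months) are assumptions back-solved from the printed numbers and are not stated in this article.

```python
# Rounded per-experiment FLOP estimates, copied from the summary table
flops = {
    "alignment": 3.733e20,
    "table_e1": 1.634e23,
    "eps_variants": 7.988e23,
    "eps_heatmaps": 1.341e24,
    "beta_only": 7.988e23,
    "gamma_expts": 1.354e24,
    "weight_decay": 1.331e23,
    "adafactor": 7.918e22,
    "compute_optim": 7.518e23,
}

GPU_FLOPS_PER_S = 3.5e14   # assumed sustained throughput per H100
GPUS_PER_NODE = 8          # assumed node size
USD_PER_GPU_HOUR = 3.0     # assumed rental rate

total_flops = sum(flops.values())
node_hours = total_flops / (GPU_FLOPS_PER_S * GPUS_PER_NODE * 3600)
node_months = node_hours / (24 * 30)  # assumes 30-day months
rental_usd = node_hours * GPUS_PER_NODE * USD_PER_GPU_HOUR

print(f"total_flops={total_flops:.3E}")
print(f"h100 node months: {node_months:.1f}")
print(f"rental price: US${rental_usd / 1e6:.1f}M")
```

The two hyperparameter sweeps with the largest run counts, eps_heatmaps and gamma_expts, together account for roughly half of the total.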
References:
https://152334h.github.io/blog/scaling-exponents/
Source: 新智元