Topology-Aware Job Scheduling for Machine Learning Cluster
Lu, Jingyuan; Li, Peng; Wang, Kun; Feng, Huibin; Guo, Enting; Wang, Xiaoyan; Guo, Song
Corresponding author: Lu, JY, Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China.
Conference name: IEEE Global Communications Conference (GLOBECOM)
Conference date: DEC 09-13, 2019
Conference location: Waikoloa, HI
Abstract: Parameter Server (PS) has been widely used to train on large amounts of data across multiple machines in parallel. In a parameter server, a critical problem is how to schedule multiple training jobs effectively so as to minimize the job completion time. Some existing work has proposed methods for setting the number of concurrent workers. However, these methods do not effectively consider the topology of GPU placement, which affects communication efficiency. This paper proposes a novel resource-to-time model based on the number of workers and the topology of GPU placement. Based on this model, we propose an algorithm called TOPO-PS, designed specifically for the topology problem in parameter servers. The algorithm derives its placement strategy using a graph mapping algorithm. Evaluation against various algorithms demonstrates the superiority of our algorithm. TOPO-PS yields a shorter job completion time, by up to 53.48% of that of FIFO and 88.77% of that of OASIS.
Keywords: Parameter Server; Scheduling Algorithms; Cloud Computing; Machine Learning
Source publication: 2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM)
ISSN: 2334-0983
Publication year: 2019
ISBN: 978-1-7281-0962-6
Publisher: IEEE
Type: Proceedings Paper
Language: English
Indexed in: CPCI-S
WOS accession number: WOS:000552238601101
WOS keywords: ALLOCATION
WOS categories: Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications
WOS research areas: Computer Science; Engineering; Telecommunications
Resource type: Conference paper
Item identifier: http://119.78.100.177/qdio/handle/2XILL650/370059
Author affiliations: [Lu, Jingyuan; Guo, Enting] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China; [Li, Peng] Univ Aizu, Aizu Wakamatsu, Fukushima, Japan; [Feng, Huibin] Minjiang Univ, Fuzhou, Peoples R China; [Wang, Kun] Univ Calif Los Angeles, Los Angeles, CA USA; [Wang, Xiaoyan] Ibaraki Univ, Ibaraki, Japan; [Guo, Song] Hong Kong Polytech Univ, Hong Kong, Peoples R China
Recommended citation
GB/T 7714
Lu, Jingyuan,Li, Peng,Wang, Kun,et al. Topology-Aware Job Scheduling for Machine Learning Cluster[C]:IEEE,2019.