Knowledge Resource Center for Ecological Environment in Arid Area
Topology-Aware Job Scheduling for Machine Learning Cluster
Lu, Jingyuan; Li, Peng; Wang, Kun; Feng, Huibin; Guo, Enting; Wang, Xiaoyan; Guo, Song
Corresponding author | Lu, JY (corresponding author), Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China. |
Conference name | IEEE Global Communications Conference (GLOBECOM) |
Conference date | DEC 09-13, 2019 |
Conference location | Waikoloa, HI |
Abstract | Parameter Server (PS) has been widely used to train a large amount of data on multiple machines in parallel. In parameter server, a critical problem is how to effectively schedule multiple training jobs to minimize the job completion time. Some existing work has proposed methods of setting the number of concurrent workers. However, they do not effectively consider the topology of GPU placement which affects the efficiency of communication. This paper proposes a novel resource-to-time model based on the number of workers and the topology of GPU placement. According to the model, we propose an algorithm called TOPO-PS particularly for topology problem in parameter servers. The algorithm achieves the placement strategy based on graph mapping algorithm. Evaluation under various algorithms evidences the superiority of our algorithm. TOPO-PS yields shorter job completion, by up to 53.48% of that of FIFO and 88.77% of OASIS. |
Keywords | Parameter Server; Scheduling Algorithms; Cloud Computing; Machine Learning |
Source publication | 2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM) |
ISSN | 2334-0983 |
Publication year | 2019 |
ISBN | 978-1-7281-0962-6 |
Publisher | IEEE |
Type | Proceedings Paper |
Language | English |
Indexed in | CPCI-S |
WOS accession number | WOS:000552238601101 |
WOS keywords | ALLOCATION |
WOS categories | Computer Science, Information Systems ; Engineering, Electrical & Electronic ; Telecommunications |
WOS research areas | Computer Science ; Engineering ; Telecommunications |
Resource type | Conference paper |
Item identifier | http://119.78.100.177/qdio/handle/2XILL650/370059 |
Author affiliations | [Lu, Jingyuan; Guo, Enting] Nanjing Univ Posts & Telecommun, Nanjing, Peoples R China; [Li, Peng] Univ Aizu, Aizu Wakamatsu, Fukushima, Japan; [Feng, Huibin] Minjiang Univ, Fuzhou, Peoples R China; [Wang, Kun] Univ Calif Los Angeles, Los Angeles, CA USA; [Wang, Xiaoyan] Ibaraki Univ, Ibaraki, Japan; [Guo, Song] Hong Kong Polytech Univ, Hong Kong, Peoples R China |
Recommended citation (GB/T 7714) | Lu, Jingyuan,Li, Peng,Wang, Kun,et al. Topology-Aware Job Scheduling for Machine Learning Cluster[C]:IEEE,2019. |
Files in this item | No files are associated with this item. |