Arid
DOI10.1145/mod0251
Continuously Adaptive Similarity Search
Zhang, Huayi; Cao, Lei; Yan, Yizhou; Madden, Samuel; Rundensteiner, Elke A.
通讯作者Zhang, HY (corresponding author), Worcester Polytech Inst, Worcester, MA 01609 USA.
会议名称ACM SIGMOD International Conference on Management of Data (SIGMOD)
会议日期JUN 14-19, 2020
会议地点ELECTR NETWORK
英文摘要Similarity search is the basis for many data analytics techniques, including k-nearest neighbor classification and outlier detection. Similarity search over large data sets relies on i) a distance metric learned from input examples and ii) an index to speed up search based on the learned distance metric. In interactive systems, input to guide the learning of the distance metric may be provided over time. As this new input changes the learned distance metric, a naive approach would adopt the costly process of re-indexing all items after each metric change. In this paper, we propose the first solution, called OASIS, to instantaneously adapt the index to conform to a changing distance metric without this prohibitive re-indexing process. To achieve this, we prove that locality-sensitive hashing (LSH) provides an invariance property, meaning that an LSH index built on the original distance metric is equally effective at supporting similarity search using an updated distance metric as long as the transform matrix learned for the new distance metric satisfies certain properties. This observation allows OASIS to avoid recomputing the index from scratch in most cases. Further, for the rare cases when an adaption of the LSH index is shown to be necessary, we design an efficient incremental LSH update strategy that re-hashes only a small subset of the items in the index. In addition, we develop an efficient distance metric learning strategy that incrementally learns the new metric as inputs are received. Our experimental study using real world public datasets confirms the effectiveness of OASIS at improving the accuracy of various similarity search-based data analytics tasks by instantaneously adapting the distance metric and its associated index in tandem, while achieving an up to 3 orders of magnitude speedup over the state-of-art techniques.
英文关键词Distance metric learning Nearest neighbor search LSH
来源出版物SIGMOD'20: PROCEEDINGS OF THE 2020 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
出版年2020
页码2601-2616
ISBN978-1-4503-6735-6
出版者ASSOC COMPUTING MACHINERY
类型Proceedings Paper
语种英语
收录类别CPCI-S
WOS记录号WOS:000644433700171
WOS类目Computer Science, Information Systems
WOS研究方向Computer Science
资源类型会议论文
条目标识符http://119.78.100.177/qdio/handle/2XILL650/353180
作者单位[Zhang, Huayi; Yan, Yizhou; Rundensteiner, Elke A.] Worcester Polytech Inst, Worcester, MA 01609 USA; [Cao, Lei; Madden, Samuel] MIT, CSAIL, Cambridge, MA 02139 USA
推荐引用方式
GB/T 7714
Zhang, Huayi,Cao, Lei,Yan, Yizhou,et al. Continuously Adaptive Similarity Search[C]:ASSOC COMPUTING MACHINERY,2020:2601-2616.
条目包含的文件
条目无相关文件。
个性服务
推荐该条目
保存到收藏夹
导出为Endnote文件
谷歌学术
谷歌学术中相似的文章
[Zhang, Huayi]的文章
[Cao, Lei]的文章
[Yan, Yizhou]的文章
百度学术
百度学术中相似的文章
[Zhang, Huayi]的文章
[Cao, Lei]的文章
[Yan, Yizhou]的文章
必应学术
必应学术中相似的文章
[Zhang, Huayi]的文章
[Cao, Lei]的文章
[Yan, Yizhou]的文章
相关权益政策
暂无数据
收藏/分享

除非特别说明,本系统中所有内容都受版权保护,并保留所有权利。