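Talk title: Parallel Trajectory Similarity Join for Data Volumes Above Ten Million
Speaker: Shuo Shang (商烁)
Time: March 22, 2020, 10:00-12:00
Venue: Room A3009, Interdisciplinary Building No. 2 (交叉二号楼), Jiangwan Campus, Fudan University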
Abstract: The matching of similar pairs of objects, called similarity join, is a fundamental functionality in data management. We consider the case of trajectory similarity join (TS-Join), where the objects are trajectories of vehicles moving in road networks. Thus, given two sets of trajectories and a threshold θ, the TS-Join returns all pairs of trajectories from the two sets with similarity above θ. This join targets applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide a purposeful definition of similarity. To enable efficient TS-Join processing on large sets of trajectories, we develop search space pruning techniques and take into account the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer algorithm. For each trajectory, the algorithm first finds similar trajectories. Then it merges the results to achieve a final result. The algorithm exploits an upper bound on the spatiotemporal similarity and a heuristic scheduling strategy for search space pruning. The algorithm's per-trajectory searches are independent of each other and can be performed in parallel, and the merging has constant cost. An empirical study with real data offers insight into the performance of the algorithm and demonstrates that it is capable of outperforming a well-designed baseline algorithm by an order of magnitude.
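For readers who would like a concrete picture of the two-phase structure before the talk, the following is a minimal sketch and not the speaker's algorithm. It assumes a toy similarity (Jaccard overlap of visited road-network vertex IDs) in place of the spatiotemporal similarity defined in the work, and a simple size-ratio upper bound in place of the actual pruning bound. The names `jaccard`, `upper_bound`, `search_one`, and `ts_join` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def jaccard(a, b):
    """Toy similarity (an assumption): overlap of visited vertex IDs."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def upper_bound(a, b):
    """Cheap bound: Jaccard(a, b) <= min(|a|, |b|) / max(|a|, |b|)."""
    big = max(len(a), len(b))
    return min(len(a), len(b)) / big if big else 0.0


def search_one(item, others, theta):
    """Phase 1: find every trajectory in `others` similar to one trajectory."""
    pid, p = item
    matches = []
    for qid, q in others.items():
        if upper_bound(p, q) <= theta:
            continue  # pruned: the exact similarity cannot exceed theta
        if jaccard(p, q) > theta:
            matches.append((pid, qid))
    return matches


def ts_join(P, Q, theta, workers=4):
    """Run the independent per-trajectory searches in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda item: search_one(item, Q, theta), P.items())
        result = []
        for matches in partial:  # Phase 2: merge the partial results
            result.extend(matches)
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical trajectories, each a set of road-network vertex IDs.
    P = {"p1": {1, 2, 3, 4}, "p2": {7, 8}}
    Q = {"q1": {1, 2, 3, 5}, "q2": {7, 8}, "q3": {9}}
    print(ts_join(P, Q, theta=0.5))
```

Because each call to `search_one` reads only shared, unmodified data, the searches need no coordination. In CPython, threads will not speed up CPU-bound work like this; a process pool (`concurrent.futures.ProcessPoolExecutor`) with a module-level worker function would be the usual substitute.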
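About the speaker: Shuo Shang is a professor and doctoral supervisor at the University of Electronic Science and Technology of China. He is a National Young Distinguished Expert (国家青年特聘专家), a Sichuan Province Distinguished Expert, a Beijing Nova Program awardee (北京市科技新星), and a Beijing Outstanding Talent. He previously served as Senior Scientist and Director of Data Mining Research at the UAE artificial intelligence research institute (阿联酋人工智能研究院). He is a member of the Technical Committee on Databases of the China Computer Federation (CCF), a member of the Theory and Methods Committee of the China Geographic Information Industry Association (中国地理信息产业协会), and a member of the review expert panel of the Ministry of Education-China Mobile Joint Laboratory. He received his bachelor's degree from Peking University and his Ph.D. from The University of Queensland, Australia. His research interests include big data, artificial intelligence, intelligent spatiotemporal computing, intelligent public opinion analysis, and social computing. He has published more than 80 papers in these areas, including more than 40 CCF A-ranked papers and 2 ESI highly cited/hot papers, with more than 800 SCI citations by others, more than 2,000 Google Scholar citations, and more than 300 citations for his most cited paper. He received the sole Best Paper Award at WISE 2017. He has served as lead guest editor for 8 international SCI-indexed journals, is a long-standing program committee member of the CCF A-ranked conferences SIGMOD, VLDB, ICDE, KDD, AAAI, and IJCAI, and is a long-standing invited reviewer for the CCF A-ranked journals ACM TODS, ACM TOIS, VLDB Journal, and IEEE TKDE. He is principal investigator of one Key Program project of the National Natural Science Foundation of China (intelligent public opinion analysis) and two Key Program sub-projects (large graph data management, memory system architecture).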