# PODS Alberto O. Mendelzon Test-of-Time Award

In 2007, the PODS Executive Committee established a Test-of-Time Award, named after the late Alberto O. Mendelzon, in recognition of his scientific legacy and his service and dedication to the database community. Mendelzon was an international leader in database theory, whose pioneering and fundamental work has inspired and influenced both database theoreticians and practitioners, and continues to be applied in a variety of advanced settings. He served the database community in many ways: he served as both the Program and the General Chair of the PODS conference, and was instrumental in bringing SIGMOD and PODS together. He was an outstanding educator, who guided the research of numerous doctoral students and postdoctoral fellows. The Award is to be given each year to a paper or a small number of papers published in the PODS proceedings ten years prior, that had the most impact (in terms of research, methodology, or transfer of practice) over the intervening decade. The decision was approved by SIGMOD and ACM. The funds for the Award were contributed by IBM Toronto.

The PODS Executive Committee has appointed us to serve as the Award Committee for 2018. After careful consideration and having solicited external nominations and advice, we have selected the following paper as the award winner for 2018:

## Talk

### The Chase Revisited

**Authors: Alin Deutsch, Alan Nash, and Jeff Remmel**

**Speaker: Alin Deutsch (UC San Diego)**

### Abstract

The chase procedure, introduced in the '70s, is a famous technique in the field and has been proved to be important and effective in providing solutions to several problems related to reasoning on data. The paper revisits the standard chase procedure, studying its properties and applicability to classical database problems. Beside settling the open problem of decidability of termination of the standard chase, it investigates the adequacy of the standard chase for a number of dataoriented tasks. The conceptual insight provided by the paper and the technical results presented go much deeper than the modest title of the paper may suggest. They have had a huge impact on the research work carried out in several topics of data management and knowledge bases, including checking query containment under constraints, constraint implication, computing certain answers in data exchange and data integration, query answering in Datalog and its extensions, and ontology-based data access.

### Bio

**Alin Deutsch** is a professor of Computer Science and Engineering at UC San Diego. His research is motivated by the data management challenges
raised by database-powered applications. Alin's projects include query language design and optimization for various data models ranging from text
to the relational and post-relational models (with particular emphasis on semistructured data). He also works on cross-model data integration and
on automatic verification of business processes. Alin earned his PhD in Computer Science in 2002 from the University of Pennsylvania working with advisor Val Tannen who taught him how to
chase his dreams. Prior to that, Alin received an MSc degree from the Technical University of Darmstadt (Germany) and a BSc degree from the
Polytechnic University Bucharest (Romania). Alin is the recipient of a Jean D'Alembert Fellowship, an Alfred P. Sloan Fellowship, and an NSF
CAREER award.

**Alan Nash (1965-2009)** earned his PhD at UC San Diego in 2006, under the direction of co-advisors Jeff Remmel, Russell Impagliazzo, and
Victor Vianu. This PhD was awarded by both the department of Mathematics and the department of Computer Science and Engineering. This was
the first-ever multi-departmental UCSD PhD degree.
A scientist, entrepreneur, surfer, and yoga teacher, Alan had enormous energy and a brilliant mind. His trajectory was highly unconventional. Born
in Argentina, he moved to the US straight after high school and worked in industry, rising to the position of Assistant VP at Merrill Lynch, then
founder and CEO of the educational software startup Tesuji, and Chief Scientist at Systran Software. All the while, Alan pursued on his own a
private passion for mathematics, especially set theory and logic. This led him to take graduate classes in mathematics through UCSD Extension in
2000, and meet Jeff Remmel. Jeff was instrumental in getting Alan, who had never attended college, to initially join the Math, and later also the CSE
PhD program.
After graduation, Alan joined IBM Research at Almaden. In 2008, he decided to return to his entrepeneurial roots and joined Tradeworx, a quantitative hedge fund, as
VP of investment research. Soon after, Alan founded jointly with Tradeworx the hedge fund Aleph One LLC, based in La Jolla. Alan is remembered not only as a brilliant
researcher, but also as a fascinating person who always sought his own path.

**Jeffrey ("Jeff") Remmel (1948-2017)** was a prolific researcher in mathematics and computability, deeply committed to education at all levels.
Jeff earned his Ph.D. in 1974 at Cornell University working with Anil Nerode, and he immediately joined the UC San Diego Department of
Mathematics, where he spent his entire career. His early work with Nerode initiated the study of the computational complexity of mathematical
structures. Jeff's research covered many topics in algebraic combinatorics, especially on enumerative combinatorics, tableaux theory, shuffle
conjectures and permutation theory. Jeff accomplished foundational work in knowledge representation on the complexity of nonmonotonic logics
and answer set programming. Jeff had well over 300 publications in conferences and journals with over 90 co-authors, and he was the co-editor of
the "Handbook of Recursive Mathematics" and co-author of a graduate text on symmetric functions.
Jeff was a highly dedicated teacher and he advised more than 30 Ph.D. students. Jeff served as the Chair of the Department of Mathematics (1998-
2002) and as Associate Dean for the Division of Physical Sciences (2000-2016). His legacy continues to touch the lives of many, beyond research
and academia: from 2005 onwards, Jeff served as the founding director of the Science and Math Teacher Initiative (CalTeach), which prepares undergraduates to teach
mathematics and science at the middle and high school levels. Hundreds of UC San Diego students completed the CalTeach program and received California credentials
for science and mathematics education.

**Alberto O. Mendelzon Test-of-Time Award Committee for 2018:**

- Maurizio Lenzerini (Chair, Universita di Roma La Sapienza)
- Wim Martens (University of Bayreuth)
- Nicole Schweikardt (Humboldt-Universitat zu Berlin)

# PODS Best Paper

### Entity Matching with Active Monotone Classification

**Speaker: Yufei Tao (Chinese University of Hong Kong)**

### Abstract

Given two sets of entities X and Y, entity matching aims to decide whether x and y represent the same entity for each pair (x, y) ∈ X × Y. As the last resort, human experts can be called upon to inspect every (x, y), but this is expensive because the correct verdict could not be determined without investigation efforts dedicated specifically to the two entities x and y involved. It is therefore important to design an algorithm that asks humans to look at only some pairs, and renders the verdicts on the other pairs automatically with good accuracy. At the core of most (if not all) existing approaches is the following classification problem. The input is a set P of points in R^{d}, each of which carries a binary label: 0 or 1. A classifier F is a function from R^{d} to {0,1}. The objective is to find a classifier that captures the labels of a large number of points in P.

In this paper, we cast the problem as an instance of active learning where the goal is to learn a monotone classifier F, namely, F(p) ≥ F(q) holds whenever the coordinate of p is at least that of q on all dimensions. In our formulation, the labels of all points in P are hidden at the beginning. An algorithm A can invoke an oracle, which discloses the label of a point p ∈ P chosen by A. The algorithm may do so repetitively, until it has garnered enough information to produce F. The cost of A is the number of times that the oracle is called. The challenge is to strike a good balance between the cost and the accuracy of the classifier produced. We describe algorithms with non trivial guarantees on the cost and accuracy simultaneously. We also prove lower bounds that establish the asymptotic optimality of our solutions for a wide range of parameters.

### Bio

**Yufei Tao** is a Professor in the Department of Computer Science and Engineering, Chinese University of Hong Kong. He served as an associate
editor of ACM Transactions on Database Systems (TODS) from 2008 to 2015, and of IEEE Transactions on Knowledge and Data Engineering
(TKDE) from 2012 to 2014. He served as a PC co-chair of International Conference on Data Engineering (ICDE) 2014. He gave a keynote speech
at International Conference on Database Theory (ICDT) 2016. He received the best-paper awards at SIGMOD in 2013 and 2015, respectively, and
a Google Faculty Research Award in 2017. He is an ACM distinguished scientist. His current research aims to develop "small-and-sweet"
algorithms: (i) small: easy to implement for deployment in practice, and (ii) sweet: having non-trivial theoretical guarantees.