Tutorials & Workshops
For the first time in its history, the 30th annual European Conference on Information Retrieval (ECIR 2008) will include a program of three half-day tutorials and three full-day workshops.
Tutorials Schedule
- Advanced language modeling approaches (Case study: Expert search) (09:00 - 12:30, Sunday, 30th March 2008) -- SAW 404 Level 4 [pdf]
- Search and Discovery in User-Generated Text Content (14:00 - 17:30, Sunday, 30th March 2008) -- SAW 203 Level 2
- Researching and building IR applications using Terrier (14:00 - 17:30, Sunday, 30th March 2008) -- SAW 404 Level 4 [pdf]

Workshops Schedule
- Workshop on novel methodologies for evaluation in information retrieval (09:00 - 17:30, Sunday, 30th March 2008) -- SAW 423 Level 4 [pdf]
- Efficiency Issues in Information Retrieval Workshop (09:00 - 17:30, Sunday, 30th March 2008) -- SAW 303 Level 3 [pdf]
- Exploiting Semantic Annotations in Information Retrieval (09:00 - 17:30, Sunday, 30th March 2008) -- SAW 422 Level 4 [pdf]
Coffee break 10:30 - 11:00
Lunch break 12:30 - 14:00
Coffee break 15:30 - 16:00
Sunday 30th March 2008 from 08:30 - 10:30 in Sir Alwyn Williams (SAW) Building (D16 on campus map).
Tutorials & Workshops will take place at the Department of Computing Science, Sir Alwyn Williams (SAW) Building (D16 on campus map).
Advanced language modeling approaches (Case study: Expert search)
Djoerd Hiemstra, University of Twente, The Netherlands
This tutorial gives a clear and detailed overview of advanced language modeling approaches and tools, including the use of document priors, translation models, relevance models, parsimonious models and expectation maximization training. Expert search will be used as a case study to explain the consequences of modeling assumptions. The slides are available for download as a PDF.
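To make the language modeling setting concrete, here is a minimal, illustrative sketch of query-likelihood retrieval with Jelinek-Mercer smoothing and a document prior -- one of the simpler models the tutorial builds on, not its exact formulation. The documents, query, and prior value below are invented for illustration.

```python
import math
from collections import Counter

def lm_score(query, doc, collection, prior=1.0, lam=0.5):
    """log P(d) + sum over query terms of log(lam*P(t|d) + (1-lam)*P(t|C))."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    doc_len = len(doc)
    coll_len = len(collection)
    score = math.log(prior)  # the document prior enters as an additive log term
    for t in query:
        p_td = doc_tf[t] / doc_len    # term probability in the document
        p_tc = coll_tf[t] / coll_len  # background (collection) probability
        score += math.log(lam * p_td + (1 - lam) * p_tc)
    return score

# Invented toy collection: two short "documents".
docs = {
    "d1": "language models for information retrieval".split(),
    "d2": "expert search in enterprise collections".split(),
}
collection = [t for d in docs.values() for t in d]
query = "language models".split()

ranking = sorted(docs, key=lambda d: lm_score(query, docs[d], collection), reverse=True)
print(ranking)
```

The collection model smooths away zero probabilities for unseen terms, which is why d2 still receives a finite (if lower) score.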
Djoerd Hiemstra is an assistant professor at the University of Twente. He wrote a Ph.D. thesis on language models for information retrieval and has contributed to over 90 research papers in the field of IR. His research interests include formal models of information retrieval, XML retrieval and multimedia retrieval.
Search and Discovery in User-Generated Text Content
Maarten de Rijke, ISLA, University of Amsterdam, The Netherlands
Wouter Weerkamp, ISLA, University of Amsterdam, The Netherlands
We increasingly live our lives online: blogs, forums, commenting tools, and many other sharing sites allow users to make almost any information available online. For the first time in history, we can collect huge amounts of user-generated content (UGC) in the blink of an eye. The rapidly increasing amount of UGC poses challenges to the IR community, but also offers many previously unthinkable possibilities. In this tutorial we discuss different aspects of accessing (i.e., searching, tracking, and analyzing) UGC. Our focus is on textual content, and most of the methods we consider for ranking UGC (by relevance, quality, or opinionatedness) are based on language modeling.
This tutorial is aimed at people working in the area of IR, and at language technologists with an interest in information access. The level will be introductory, so anyone with at least a basic knowledge of IR and/or text mining should be able to benefit from the tutorial.
Maarten de Rijke is professor of information processing and internet at the Intelligent Systems Lab Amsterdam (ISLA) of the University of Amsterdam. His group has been researching search and discovery tools for UGC for a number of years now, with numerous publications and various demonstrators as tangible outcomes. Wouter Weerkamp is a PhD student at ISLA, working on language modeling and intelligent access to UGC.
Researching and building IR applications using Terrier
Craig Macdonald, University of Glasgow, UK
Ben He, University of Glasgow, UK
This tutorial introduces the main design of an IR system, and uses the Terrier platform as an example of how one should be built. We detail the architecture and data structures of Terrier, as well as the weighting models included, and describe, with examples, how Terrier can be used to perform experiments and extended to facilitate new research and applications.
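The core data structure underlying a platform like Terrier is the inverted index: a mapping from terms to posting lists. The sketch below is not Terrier's actual API (Terrier is a Java platform with its own architecture); it is a toy Python illustration of the general design the tutorial covers -- tokenise, build posting lists, then score by term lookup. The scoring function is a deliberately simple stand-in for the weighting models a real platform provides.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_lengths = {}

    def add(self, doc_id, text):
        tokens = text.lower().split()
        self.doc_lengths[doc_id] = len(tokens)
        for t in tokens:
            self.postings[t][doc_id] = self.postings[t].get(doc_id, 0) + 1

    def search(self, query):
        # Score by length-normalised term frequency (a placeholder for
        # real weighting models such as BM25 or DFR).
        scores = defaultdict(float)
        for t in query.lower().split():
            for doc_id, tf in self.postings.get(t, {}).items():
                scores[doc_id] += tf / self.doc_lengths[doc_id]
        return sorted(scores.items(), key=lambda x: x[1], reverse=True)

# Invented example documents.
index = InvertedIndex()
index.add("d1", "terrier is an open source retrieval platform")
index.add("d2", "weighting models rank documents for retrieval")
print(index.search("retrieval platform"))
```

Real platforms add compressed on-disk posting lists, document metadata structures, and pluggable weighting models on top of this basic shape.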
Craig Macdonald is a PhD research student at the University of Glasgow. His research interests include Information Retrieval in Enterprise, Web and Blog settings, and he has over 20 publications with research based on the Terrier platform. He has been a co-ordinator of the Blog track at TREC since 2006, and is a developer of the Terrier platform.
Ben He is a post-doctoral research assistant at the University of Glasgow. His research interests centre on document weighting models, particularly document length normalisation and query expansion. He has been a developer of the Terrier platform since its initial development and has more than 20 publications based on experiments performed with Terrier.
Workshop on novel methodologies for evaluation in information retrieval
Mark Sanderson, University of Sheffield, UK
Martin Braschler, Zurich University of Applied Sciences, Switzerland
Nicola Ferro, University of Padova, Italy
Julio Gonzalo, UNED, Spain
Information retrieval is an empirical science; the field cannot move forward unless there are means of evaluating the innovations devised by researchers. However, the methodologies conceived in the early years of IR and still used in today's evaluation campaigns are starting to show their age, and new research is emerging on how to overcome the twin challenges of scale and diversity.
The methodologies used to build test collections in the modern evaluation campaigns were originally conceived to work with collections of tens of thousands of documents. These methodologies were found to scale well, but potential flaws are starting to emerge as test collections grow beyond tens of millions of documents. Support for continued research in this area is crucial if IR research is to continue to evaluate large-scale search.
With the rise of the large Web search engines, some believed that all search problems could be solved with a single engine retrieving from one vast data store. However, it is increasingly clear that retrieval is evolving not towards a monolithic solution, but towards a wide range of solutions tailored for different classes of information and different groups of users or organizations. Each tailored system requires a different mixture of component technologies combined in distinct ways, and each solution requires evaluation.
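At the heart of the test-collection methodology the workshop revisits is scoring a system's ranked output against relevance judgments. As a minimal illustration, here is one standard measure, average precision, computed over an invented ranked list and judgment set:

```python
def average_precision(ranked_docs, relevant):
    """Mean of precision values at the ranks where relevant docs appear."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Invented ranking and relevance judgments for a single query.
ranking = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2"}
print(average_precision(ranking, relevant))  # relevant docs at ranks 2 and 4
```

The scale and diversity challenges discussed above arise precisely because measures like this presume reasonably complete relevance judgments, which become impractical to gather as collections grow.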
Efficiency Issues in Information Retrieval Workshop
Roi Blanco, Universidade da Coruña, Spain
Fabrizio Silvestri, ISTI CNR, Italy
Today's technological advances allow vast amounts of information to be generated, disseminated, and stored. This exponential growth has made retrieving relevant information both necessary and challenging. The efficiency of IR systems is of utmost importance, because it ensures that systems scale to the vast amounts of information needing retrieval. This is an important topic of research for both academic and corporate environments. Efficiency concerns need to be addressed in a principled way, so that solutions can be adapted to new platforms and environments, such as information retrieval from mobile devices, desktop search, distributed peer-to-peer search, expert search, multimedia retrieval, and so on. Efficiency research over the past years has focused on efficient indexing, storage (compression) and retrieval of data (query processing strategies).
Some of the major goals of the workshop are:
a) to shed light on efficiency-related problems of modern large-scale IR (Web environments, distributed technologies, peer-to-peer architectures) and new IR environments (desktop search, enterprise/expert search, mobile devices, etc.);
b) to foster collaboration between different research groups in order to explore new and disruptive ideas.
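One concrete example of the "storage (compression)" line of work mentioned above is variable-byte encoding of docid gaps in posting lists -- a classic index-compression technique. The sketch below is illustrative only; the docids are invented, and production systems layer many further optimisations on top.

```python
def vbyte_encode(numbers):
    """Encode non-negative integers as variable-length byte sequences.

    Each number is split into 7-bit chunks, low-order first; the high
    bit is set on the final byte of each number as a terminator.
    """
    out = bytearray()
    for n in numbers:
        chunks = []
        while True:
            chunks.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        for i, c in enumerate(chunks):
            out.append(c | (0x80 if i == len(chunks) - 1 else 0))
    return bytes(out)

def vbyte_decode(data):
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:  # terminator byte: number complete
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers

# Posting lists store gaps between sorted docids, not the docids
# themselves, so most encoded values are small and compress well.
docids = [3, 7, 150, 152, 100000]
gaps = [docids[0]] + [b - a for a, b in zip(docids, docids[1:])]
encoded = vbyte_encode(gaps)
assert vbyte_decode(encoded) == gaps
print(len(encoded), "bytes for", len(docids), "docids")
```

Because small gaps dominate in real posting lists, most entries fit in a single byte, trading a little decode-time work for substantially smaller indexes.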
Exploiting Semantic Annotations in Information Retrieval
Omar Alonso, A9.com, USA
Hugo Zaragoza, Yahoo! Research Barcelona, Spain
The goal of this workshop is to create a forum for researchers interested in the use of semantic annotations for information retrieval. By semantic annotations we refer to linguistic annotations (such as named entities, semantic classes, etc.) as well as user annotations such as microformats, RDF, tags, etc. We are interested not in the annotations themselves, but in their application to information retrieval tasks such as ad-hoc retrieval, classification, browsing, text mining, summarization, question answering, etc.
In recent years there has been much discussion about the semantic annotation of documents. There are many forms of annotations and many techniques that identify or extract them. As NLP tagging techniques mature, more and more annotations can be automatically extracted from free text. In particular, techniques have been developed to ground named entities in terms of geo-codes, ISO time codes, Gene Ontology ids, etc. Furthermore, the number of collections which explicitly identify entities is growing fast with Web 2.0 and Semantic Web initiatives.
Despite the growing number and complexity of annotations, and despite the potential impact that these may have on information retrieval tasks, annotations have not yet made a significant impact on Information Retrieval research or applications. Further research is needed before we can unleash the potential of annotations.