|
Introduction
|
ImageCLEFphoto 2007 provides a system-centred evaluation of multilingual visual
information retrieval from a generic photographic collection (i.e. one containing
everyday real-world photographs akin to those that can frequently be found
in private photographic collections as well).
The evaluation scenario is similar to the classic TREC ad-hoc retrieval
task: simulation of the situation in which a system knows the set of documents
to be searched, but cannot anticipate the particular topic that will be
investigated (i.e. topics are not known to the system in advance).
The goal of the simulation is: given an alphanumeric statement (and/or
sample images) describing a user information need, find as many relevant images as
possible from the IAPR TC-12 photographic collection (with the query language
either identical to or different from that used to describe the images).
Any method can be used to retrieve relevant documents and we encourage the use
of both concept-based and content-based retrieval methods. This is an
ImageCLEF task.
|
|
Data Collection
|
The image
collection of the IAPR TC-12 Benchmark consists of 20,000 still natural
images (plus 20,000 corresponding thumbnails) taken at locations around the
world. This assorted cross-section
includes pictures of different sports and actions, photographs of people,
animals, cities, landscapes and many other aspects of contemporary
life.
Each image is associated with an alphanumeric caption stored in a semi-structured
format. These captions include the title of the image, its creation date,
the location at which the photograph was taken, the name of the photographer, a
semantic description of the contents of the image (as determined by the photographer)
and additional notes.
 |
<DOC>
<DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE>
<DESCRIPTION>two lane street with large shops on the right and
smaller shops on the left; people are walking on the sidewalk, some
are crossing the street; cars are parked along the left side of the
street as well; </DESCRIPTION>
<NOTES>The main shopping street in Paraguay; </NOTES>
<LOCATION>Asunción, Paraguay </LOCATION>
<DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
<THUMBNAIL>thumbnails/00/60.jpg </THUMBNAIL>
</DOC>
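A caption record in this layout can be read with a few lines of code. The following is a minimal sketch in Python, assuming that every annotation file follows the tag layout of the example above. The function name parse_caption and the regular-expression approach are illustrative choices and not part of the benchmark distribution.

```python
import re

# Matches <TAG>content</TAG> pairs; DOTALL lets the content span several lines.
FIELD_PATTERN = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)


def parse_caption(record: str) -> dict:
    """Map lower-cased tag names to whitespace-normalised field text."""
    # Drop the outer <DOC> wrapper so that only the inner fields are matched.
    inner = re.sub(r"</?DOC>", "", record)
    fields = {}
    for tag, content in FIELD_PATTERN.findall(inner):
        fields[tag.lower()] = " ".join(content.split())
    return fields


# Shortened copy of the example record (hypothetical in-memory input).
sample = """<DOC>
<DOCNO>annotations/00/60.eng</DOCNO>
<TITLE>Palma </TITLE>
<NOTES>The main shopping street in Paraguay; </NOTES>
<LOCATION>Asunción, Paraguay </LOCATION>
<DATE>March 2002 </DATE>
<IMAGE>images/00/60.jpg </IMAGE>
</DOC>"""

caption = parse_caption(sample)
print(caption["title"], "|", caption["location"], "|", caption["date"])
```

Normalising whitespace at this stage removes the trailing blanks that appear inside most fields of the example, which keeps later indexing code simpler.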
|
The following publication elaborates on the history, design and
implementation of this image collection:
Grubinger, M., Clough, P., Müller, H. and Deselaers, T. (2006),
The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information
Systems, In Proceedings of International Workshop OntoImage2006
Language Resources for Content-Based Image Retrieval, held in
conjunction with LREC'06, pages 13-23, Genoa, Italy, 22 May 2006
(pdf).
Further information about the image collection and links to related publications can be found here.
Based on the feedback from participants of previous evaluation tasks,
the following will be provided for ImageCLEFphoto 2007:
-
Annotation Language: four sets of annotations in (1) English, (2) German,
(3) Spanish and (4) one set whereby the annotation language was randomly selected
for each of the images.
-
Caption Fields: only the fields for the title, location, date and additional
notes are provided.
-
Annotation Completeness: each image contains the same level of annotation
completeness - there are no images without annotations.
|
|
Evaluation Objective
|
Providing only a subset of the annotations creates a new challenge for 2007: the evaluation of multilingual visual information retrieval from a generic collection of lightly annotated photographs. This allows for the investigation of the following research questions:
-
are traditional text retrieval methods still applicable for such short captions?
-
how significant is the choice of the retrieval language?
-
how does retrieval performance compare to retrieval from fully annotated images? (compared to 2006)
-
has retrieval performance improved in comparison with retrieval from lightly annotated images? (compared to 2006)
Since the involvement of visual retrieval techniques becomes more important in this task, we hope to attract more visually oriented approaches (compared to the mainly concept-based retrieval approaches in previous years).
|
|
Query Topics
|
-
Download standard ad-hoc topics for 2007
here.
For this task, we provide a list of topic statements together with sample images
expressing realistic user information needs for visual information retrieval
from the IAPR TC-12 photographic collection. The creation of these topics
has been based on several factors including:
-
the analysis of a log file from online access to the image collection
-
knowledge of the contents of the image collection
-
various types of linguistic and pictorial attributes such as visual vs. semantic, specific vs. general objects or the use of proper names.
-
the estimated difficulty of the topic.
Similar to TREC, we also provide the query topics as structured statements of user
needs which consist of a title (a short sentence or phrase
describing the search request in a few words), and three sample images (which have been removed from the image collection) that are relevant to that search request. An example for English is the following:
<top>
<num> Number: 1 </num>
<title> accommodation with swimming pool </title>
<narr> </narr>
<image> 3793.jpg </image>
<image> 6321.jpg </image>
<image> 6395.jpg </image>
</top>
Note:
-
we will re-use a subset of the topics from 2006 (for comparison with retrieval results from 2006)
-
we only offer languages that were also offered in 2006: English, German, Spanish, Italian, French, Portuguese, Chinese, Japanese, Russian, Polish, Swedish, Finnish, Norwegian, Danish, and Dutch. Should a participant want to investigate any other language that is not mentioned here, please contact the task organisers by 30 April 2007 to arrange for a translation.
-
participants only receive topic titles, but no narrative descriptions, to avoid confusion (the narratives only serve to define unambiguously what constitutes a relevant image).
-
participants will also receive three sample images for each topic. These images have been removed from the collection and do not form a part of the ground-truth.
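The structured topic statements can be loaded in much the same way as any TREC-style topic file. The sketch below assumes that the topic file contains a sequence of blocks shaped like the English example above; the Topic class and the parse_topics function are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Topic:
    number: int
    title: str
    images: list = field(default_factory=list)


def parse_topics(text: str) -> list:
    """Extract every <top>...</top> block from the text of a topic file."""
    topics = []
    for block in re.findall(r"<top>(.*?)</top>", text, re.DOTALL):
        number = int(re.search(r"<num>\s*Number:\s*(\d+)\s*</num>", block).group(1))
        title = re.search(r"<title>(.*?)</title>", block, re.DOTALL).group(1).strip()
        # Each topic carries several <image> elements (three in this task).
        images = [name.strip() for name in re.findall(r"<image>(.*?)</image>", block)]
        topics.append(Topic(number, title, images))
    return topics


example = """<top>
<num> Number: 1 </num>
<title> accommodation with swimming pool </title>
<narr> </narr>
<image> 3793.jpg </image>
<image> 6321.jpg </image>
<image> 6395.jpg </image>
</top>"""

topics = parse_topics(example)
print(topics[0].number, topics[0].title, topics[0].images)
```

The empty narrative element is ignored on purpose, since narratives are not distributed to participants.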
|
|
Retrieval Experiments
|
Experiments are performed as
follows: the participants are given topics, from which they create queries that
are used to perform retrieval on the image collection. This process iterates
(e.g. possibly involving relevance feedback) until they are satisfied with their
runs. Participants might try different methods to increase the number of relevant
images in the top N rank positions (e.g. query expansion), and they can repeat these
different methods for each query (or source) and collection (or target) language.
Participants are free to experiment with whatever methods
they wish for CLIR and image retrieval, e.g. query expansion based on thesaurus
lookup or relevance feedback, indexing and retrieval on only part of the image
caption, different models of retrieval, different translation resources (e.g.
dictionary-based vs. machine translation), and combining text and content-based methods for retrieval. Given the many different possible approaches which could be used to
perform the ad-hoc retrieval, rather than list all of these we will ask participants to
indicate which of the following applies to each of their runs (we
consider these the "main" dimensions which define the query for this ad-hoc
task):
| Dimension | Available Codes |
| Topic language | DA, DE, EN, ES, FI, FR, IT, JA, NL, NO, PL, PT, RU, SV, ZHS, ZHT |
| Annotation language | DE, EN, ES, RND, ALL |
| Query/run type | AUTO, MAN |
| Feedback/expansion | FB, QE, FBQE, NOFB |
| Modality | IMG, TXT, TXTIMG |
Topic language
Used to specify the query (i.e. topic) language used in the run.
The following language codes should be used to indicate the query language:
English (EN), German (DE), French (FR), Portuguese (PT), Spanish (ES), Italian
(IT), Finnish (FI), Japanese (JA), Chinese-simplified (ZHS),
Chinese-traditional (ZHT), Polish (PL), Norwegian (NO), Swedish (SV), Russian
(RU), Danish (DA) and Dutch (NL).
Annotation language
Used to specify the target language (i.e. the annotation set) used for the run:
German (DE), English (EN), Spanish (ES), random (RND).
You can also use all languages in one run (ALL).
Query/run type
We distinguish between manual (MAN) and automatic (AUTO) submissions.
Automatic runs involve no user interaction, whereas manual runs are those
in which a human has been involved in query construction and the iterative
retrieval process, e.g. manual relevance feedback is performed. We encourage
groups who want to investigate manual intervention further to participate in
the interactive evaluation (iCLEF)
task.
Feedback or Query Expansion
Used to specify whether the run involves query expansion (QE) or feedback
(FB) techniques, both of them (FBQE) or none of them (NOFB).
Modality
This describes the use of visual (image) or text features in your
submission. A text-only run will have modality text (TXT); a purely visual
run will have modality image (IMG) and a combined submission (e.g. initial
text search followed by a possibly combined visual search) will have modality
text+image (TXTIMG).
|
|
Submission format and guidelines
|
What to submit
Participants are required
to submit a baseline run against which their other submissions can be compared. There should be one baseline run for each annotation language (please include monolingual runs in your submission: English-English, Spanish-Spanish and German-German), and according to the previous table these would be classified/identified as:
-
EN-EN-AUTO-NOFB-TXT for the English-English monolingual run
-
ES-ES-AUTO-NOFB-TXT for the Spanish-Spanish monolingual run
-
DE-DE-AUTO-NOFB-TXT for the German-German monolingual run
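To keep run identifiers consistent across many submissions, the five dimension codes can be checked and joined programmatically. The following is a minimal sketch: the code sets are copied from the table of available codes, the hyphenated layout is inferred from the three baseline identifiers above, and the function name run_identifier is a hypothetical helper.

```python
# Code sets copied from the table of available codes.
TOPIC_LANGUAGES = {"DA", "DE", "EN", "ES", "FI", "FR", "IT", "JA",
                   "NL", "NO", "PL", "PT", "RU", "SV", "ZHS", "ZHT"}
ANNOTATION_LANGUAGES = {"DE", "EN", "ES", "RND", "ALL"}
RUN_TYPES = {"AUTO", "MAN"}
FEEDBACK = {"FB", "QE", "FBQE", "NOFB"}
MODALITIES = {"IMG", "TXT", "TXTIMG"}


def run_identifier(topic_lang, annotation_lang, run_type, feedback, modality):
    """Validate the five codes and join them, e.g. EN-EN-AUTO-NOFB-TXT."""
    checks = [
        ("topic language", topic_lang, TOPIC_LANGUAGES),
        ("annotation language", annotation_lang, ANNOTATION_LANGUAGES),
        ("query/run type", run_type, RUN_TYPES),
        ("feedback/expansion", feedback, FEEDBACK),
        ("modality", modality, MODALITIES),
    ]
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"unknown {name} code: {value!r}")
    return "-".join(value for _, value, _ in checks)


baseline = run_identifier("EN", "EN", "AUTO", "NOFB", "TXT")
print(baseline)
```

Rejecting unknown codes early avoids submitting runs whose classification cannot be recovered from the identifier.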
It is extremely important
that we can get a description of the techniques used for each submitted run. This
should be as detailed as possible to ease the comparison or classification of
techniques and results.
Submission format
Participants are required to submit ranked lists of (up to) the top 1000 images
ranked in descending order of similarity (i.e. the highest nearer the top of
the list). The format of submissions for this ad-hoc
task can be found here and the filenames
should distinguish different types of submission according to the table above. Participants can submit (via email) as many system runs as they
require.
Please note that there
should be at least 1 document entry in your results for each topic (i.e. if
your system returns no results for a query then insert a dummy entry, e.g.
25 1 16/16019 0 4238 xyzT10af5). The reason for this is to make sure
that all systems are compared with the same number of topics and relevant
documents. Submissions not following the required format will not be evaluated.
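The requirements above (at most 1000 images per topic, descending order of similarity, and at least one entry per topic) can be enforced when the run file is written. The sketch below assumes the six-column layout suggested by the dummy entry (topic number, iteration, image identifier, rank, score, run identifier), which matches the usual trec_eval input; the format description linked above takes precedence if it differs. The function name format_run and the choice of dummy image are hypothetical.

```python
def format_run(results, topic_ids, run_id, max_images=1000, dummy_image="16/16019"):
    """Return the lines of a run file.

    results   -- dict mapping a topic number to a list of (image_id, score) pairs
    topic_ids -- every topic number that must appear in the output
    """
    lines = []
    for topic in topic_ids:
        # Highest similarity first, truncated to the permitted list length.
        ranked = sorted(results.get(topic, []), key=lambda pair: pair[1], reverse=True)
        ranked = ranked[:max_images]
        if not ranked:
            # No results for this topic: insert a single dummy entry.
            ranked = [(dummy_image, 0.0)]
        for rank, (image_id, score) in enumerate(ranked):
            lines.append(f"{topic} 1 {image_id} {rank} {score:.4f} {run_id}")
    return lines


# Hypothetical retrieval output: topic 2 returned nothing.
run_lines = format_run({1: [("00/60", 0.2), ("12/12345", 0.9)]}, [1, 2], "xyzT10af5")
print("\n".join(run_lines))
```

Passing the full list of topic numbers separately from the results is what guarantees that a topic without any retrieved image still appears in the file.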
|
|
Result Generation
|
Relevance Assessments
-
Download relevance assessments (qrels) for ImageCLEFphoto 2007
here.
In the past, relevance assessments have been performed by students and staff at the
University of Sheffield and Victoria University. Submissions are used to create
image pools which are judged for relevance by assessors. The pools are assessed and
completed using interactive search and judge (ISJ) to find further relevant images,
and the end result is a set of relevance assessments called qrels. These are
then used to evaluate system performance and compare submissions.
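The pooling step can be pictured as taking, for each topic, the union of the top-ranked images over all submitted runs. The sketch below illustrates this idea only: the pool depth is not stated here and is therefore a hypothetical parameter, and the function name build_pools is an assumption.

```python
def build_pools(runs, depth):
    """Union of the top `depth` images of every run, per topic.

    runs -- list of dicts, each mapping a topic number to a ranked list of image ids
    """
    pools = {}
    for run in runs:
        for topic, ranking in run.items():
            pools.setdefault(topic, set()).update(ranking[:depth])
    return pools


# Two hypothetical runs covering one topic.
run_a = {1: ["a", "b", "c", "d"]}
run_b = {1: ["c", "e", "a", "f"]}
pools = build_pools([run_a, run_b], depth=2)
print(sorted(pools[1]))
```

Images outside every run's top ranks never reach the assessors through pooling, which is why the pools are then completed with interactive search and judge.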
For more information about this procedure and the qrels sets see
the following publications: "The CLEF 2003
Cross Language Image Retrieval Track" and "The CLEF 2004
Cross Language Image Retrieval Track"
Relevance assessment for the more general topics is based
entirely on the visual content of images (e.g. aircraft on the ground).
However, certain topics also require the use of the caption to make a confident
decision (e.g. "pictures of beaches in northern Peru"). What constitutes a
relevant image is a subjective decision, but typically a relevant image will
have the subject of the topic in the foreground, the image will not be too dark
in contrast, and maybe the caption confirms the judge's decision.
The assessment of images in
ImageCLEFphoto is based on using a ternary classification scheme: (1) relevant,
(2) partially relevant and (3) not relevant. The aim of the ternary scheme is
to help assessors in making their relevance judgements more accurate (e.g. an
image is definitely relevant in some way, but maybe the query object is not
directly in the foreground: it is therefore considered partially relevant).
Various combinations of assessor judgements are used to create the qrels sets
and more information can be found from the links given above.
Performance Measures and Results
-
Download the results for ImageCLEFphoto 2007
here.
The ranked lists (runs) submitted by the participants will be evaluated using
trec_eval. We are planning to use the following performance measures to compare
the retrieval results:
-
Mean Average Precision (MAP) - the leading measure, as in previous evaluations (for comparison)
-
Precision at 20 documents retrieved (P20) - most internet search engines show 20 images on their first page
-
Geometric Mean Average Precision (GMAP) - to prevent easy topics from masking poor performance on hard topics
-
Binary Preference (BPREF) - to verify the completeness of the relevance assessments
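The official figures are produced by trec_eval, but the first three measures are simple enough to illustrate directly. The following sketch uses the standard textbook definitions; the function names are hypothetical, and the small floor applied in the geometric mean (so that a topic with zero average precision does not force the product to zero) is an assumption about how such topics are handled.

```python
import math


def average_precision(ranking, relevant):
    """Average of the precision values at the rank of each relevant image."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, image_id in enumerate(ranking, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / position
    # Relevant images that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def precision_at(ranking, relevant, cutoff=20):
    """Fraction of the first `cutoff` rank positions holding a relevant image."""
    return sum(1 for image_id in ranking[:cutoff] if image_id in relevant) / cutoff


def mean_average_precision(ap_values):
    return sum(ap_values) / len(ap_values)


def geometric_map(ap_values, floor=1e-5):
    """Geometric mean of per-topic average precision, with a small floor."""
    logs = [math.log(max(ap, floor)) for ap in ap_values]
    return math.exp(sum(logs) / len(logs))


# One easy and one hard topic (hypothetical per-topic AP values).
per_topic_ap = [1.0, 0.01]
print(mean_average_precision(per_topic_ap), geometric_map(per_topic_ap))
```

With these two values the arithmetic mean stays above 0.5 while the geometric mean drops to about 0.1, which shows how GMAP exposes the hard topic that MAP hides.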
|
|
Provided Data and Systems
|
Training data
Unfortunately, we do not currently have any training
data available. We suggest you create your own topics and generate relevance
assessments similar to the topics provided.
CBIR Systems
To enable participation in the ad-hoc task by
those without access to their own CBIR system, we suggest using one of the
following systems:
The GIFT/Viper image retrieval system
(please contact
Henning Müller
for more information).
The FIRE image retrieval system
(please contact
Thomas Deselaers
for more information).
|
|
Paper Submission and Format
|
Paper Submission
Participating groups that have submitted at least one run are invited to describe their approaches and corresponding results as well as their experience with ImageCLEFphoto 2007 in the CLEF Working Notes. Working papers have to be emailed to Allan Hanbury (hanbury@prip.tuwien.ac.at) no later than 17 August 2007.
The full Working Notes will be prepared in digital form only and will be posted on the CLEF website and published in the DELOS Digital Library one week before the workshop together with the run statistics. At the workshop we also intend to distribute a printed set of abstracts together with CDs containing the entire Working Notes. In order to make life as easy as possible for participants, we will extract the abstract from the text of the submitted papers. This means that participants should try to make the abstract as complete as possible: it should provide the main details of the retrieval experiments including tasks performed, approach used, resources employed, and results obtained.
Submission Format
The submitted papers should not exceed 10 pages and should follow the guidelines described here.
It is recommended that you use LaTeX to prepare the text (a LaTeX template can be found here). Further, the name of your submitted file should be: [last-name-of-first-author]CLEF2007, e.g. grubingerCLEF2007.pdf.
|
|
Important Dates
|
| 16 April 2007: | Data Release |
| 06 May 2007: | Topic Release |
| 11 June 2007: | Submission of retrieval runs due (EXTENDED) |
| 16 July 2007: | Release of retrieval results |
| 17 August 2007: | Workshop papers due |
| 19-21 September 2007: | CLEF 2007 in Budapest, Hungary |
|
|
Mailing List
|
We have set up a mailing list: imageclef@sheffield.ac.uk for participants.
Please contact Paul Clough to be added to the list.
|
|