TUNA
New: The TUNA Corpus is now publicly available.
TUNA is a research project funded by the UK's Engineering and Physical Sciences Research Council (EPSRC). It involves a collaboration between the Department of Computing Science, University of Aberdeen, the Open University, and the University of Tilburg. The project started in October 2003 and ended in February 2007.
Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence. Difficulties arise, for example, when the Knowledge Base uses 'names' (i.e., database keys) that a hearer/reader does not understand. This can happen, for instance, if the Knowledge Base contains an artificial name like '#Jones083', because 'Jones' alone is not uniquely distinguishing; the same problem arises when the Knowledge Base deals with entities for which no names at all are in common usage (e.g., a specific tree or a chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. In the case of Mr. Jones, for example, the program could give his name and address; in the case of a tree, a longer description may be necessary (e.g., 'the green oak on the corner of ... and ...'). The technical term for this set of problems is Generation of Referring Expressions (GRE). GRE is a key aspect of almost any Natural Language Generation system.
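To make the problem concrete, here is a minimal Python sketch of one classic approach to GRE, the Incremental Algorithm of Dale and Reiter. It builds a conjunction of attribute-value pairs that is true of the target and rules out every other entity in the domain (the 'distractors'). This is not the algorithm developed in the TUNA project; the function name `incremental_algorithm`, the toy domain and the attribute preference order are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each entity maps attribute names to values.
Entity = Dict[str, str]


def incremental_algorithm(
    target: str,
    domain: Dict[str, Entity],
    preferred_attributes: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Return (attribute, value) pairs true of `target` and false of every
    other entity in `domain`, or None if no such description is found."""
    distractors = {name for name in domain if name != target}
    description: List[Tuple[str, str]] = []
    for attribute in preferred_attributes:
        if not distractors:
            break  # the target is already uniquely identified
        value = domain[target].get(attribute)
        if value is None:
            continue  # the target has no value for this attribute
        # Distractors that do not share the target's value are ruled out.
        ruled_out = {d for d in distractors if domain[d].get(attribute) != value}
        if ruled_out:
            description.append((attribute, value))
            distractors -= ruled_out
    return description if not distractors else None


# Hypothetical toy domain: two dogs and a cat.
DOMAIN: Dict[str, Entity] = {
    "d1": {"type": "dog", "colour": "black", "size": "small"},
    "d2": {"type": "dog", "colour": "white", "size": "small"},
    "c1": {"type": "cat", "colour": "black", "size": "large"},
}

if __name__ == "__main__":
    # Describes d1 as 'the black dog'.
    print(incremental_algorithm("d1", DOMAIN, ["type", "colour", "size"]))
    # prints [('type', 'dog'), ('colour', 'black')]
```

Because the procedure keeps every property that rules out at least one distractor and never removes one later, it can produce descriptions that are logically overspecific.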
Existing GRE algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (e.g., 'the black dog', 'the book on the table'). Our research aims to design and implement a new algorithm for the generation of referring expressions that produces appropriate descriptions in a far greater variety of situations than any of its predecessors. The algorithm will be more complete than its predecessors because it will be able to construct a greater variety of descriptions (involving negations, disjunctions, relations, vagueness, etc.). The descriptions generated should also be more appropriate (i.e., more natural in the eyes of a human hearer/reader), because the algorithm will be based on empirical studies involving corpora and controlled experiments. Among other things, these empirical studies will address the question of under what circumstances descriptions should be logically under- or overspecific; they will also allow us to prune the search space (i.e., the space of all possible descriptions), which would otherwise threaten to make the problem intractable. The project combines (psycho)linguistic, computational and logical challenges and should be of interest to people whose intellectual home is in any of these areas.
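The limits of purely conjunctive, positive descriptions can be illustrated with another minimal Python sketch. If every property of the target is also true of some other entity, no conjunction of the target's properties can single it out, but a negated property ('the dog that is not black') can. The sketch searches exhaustively, smallest descriptions first. It handles negation only (not disjunction, relations or vagueness), it is not the project's algorithm, and the names `shortest_description`, `holds` and the toy domain are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# A literal is (property, polarity): ("black", True) means 'black',
# ("black", False) means 'not black'.
Literal = Tuple[str, bool]


def holds(literal: Literal, properties: FrozenSet[str]) -> bool:
    """True if the literal is satisfied by an entity with these properties."""
    prop, positive = literal
    return (prop in properties) == positive


def shortest_description(
    target: str,
    domain: Dict[str, FrozenSet[str]],
    allow_negation: bool,
    max_size: int = 3,
) -> Optional[List[Literal]]:
    """Smallest conjunction of literals true of `target` only, or None."""
    target_props = domain[target]
    all_props = sorted(set().union(*domain.values()))
    # Candidate literals are all true of the target by construction.
    literals: List[Literal] = [(p, True) for p in all_props if p in target_props]
    if allow_negation:
        literals += [(p, False) for p in all_props if p not in target_props]
    distractors = [e for e in domain if e != target]
    for size in range(1, max_size + 1):
        for combo in combinations(literals, size):
            # Every distractor must falsify at least one literal.
            if all(not all(holds(lit, domain[d]) for lit in combo) for d in distractors):
                return list(combo)
    return None


# Hypothetical toy domain: d1's only property (dog) is shared by d2.
DOMAIN: Dict[str, FrozenSet[str]] = {
    "d1": frozenset({"dog"}),
    "d2": frozenset({"dog", "black"}),
    "d3": frozenset({"cat"}),
}

if __name__ == "__main__":
    print(shortest_description("d1", DOMAIN, allow_negation=False))  # prints None
    print(shortest_description("d1", DOMAIN, allow_negation=True))
    # prints [('dog', True), ('black', False)]
```

The number of candidate combinations grows combinatorially with the number of properties, which shows why pruning the search space matters once richer descriptions are allowed.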
Contact: Kees van Deemter
Address: Computing Science Department, Meston Building, King's College, University of Aberdeen
