Feature-based Visualization and Analysis of Natural Language Documents (VisADoc)


Prof. Dr. Daniel Keim, Konstanz
Universität Konstanz
Fachbereich Informatik und Informationswissenschaft
Arbeitsgruppe Datenbanken, Datenanalyse und Visualisierung

Prof. Dr. Iryna Gurevych, Darmstadt
TU Darmstadt - FB 20
Hochschulstraße 10




The amount of digital text data, e.g. text created by Web users, has grown rapidly in recent years, leading to heavy information overload. Feature-based visualization and analysis of natural language documents is therefore urgently needed to address this issue. Search engines help users find relevant documents, but they do not provide advanced tools for analyzing and understanding the dimensions of text that are relevant to the users’ needs. The major challenge is the gap between automatically computable text features and these needs, which has to be bridged to facilitate the user’s interaction with documents, e.g. understanding why two documents are similar, how the documents within an automatically computed cluster are related, or determining the relevant aspects of text quality and age suitability. The proposed research aims to develop new visual analytics techniques for closing this gap. We propose to analyze text according to different aspects determined through automatically computed features and an interactive, visually supported feature engineering approach that allows exploration and evaluation of user-defined text properties in large document collections. These features can then be used for advanced text analysis, resulting in improved effectiveness and higher accuracy. To this end, we investigate novel textual features for modelling content-related text properties. A tight integration of automatic text analysis with multidimensional text and feature visualization is crucial to the proposed interactive process. The research is embedded in an end-to-end framework that supports defining text measures according to users’ interests. We evaluate the proposed approach to visual text analysis on the tasks of text quality assessment and age suitability assessment.