KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS
BOLIN DING*, YINTAO YU*, BO ZHAO*, CINDY XIDE LIN*, JIAWEI HAN*, AND CHENGXIANG ZHAI*
Abstract. We study the problem of keyword search in a data cube with text-rich dimension(s)
(a so-called text cube). The text cube is built on a multidimensional text database, where each row
is associated with some text data (e.g., a document) and other structural dimensions (attributes).
A cell in the text cube aggregates a set of documents with matching attribute values in a subset
of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword
query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of
cell documents w.r.t. the given query) in the text cube.
We define a keyword-based query language and apply an IR-style relevance model for scoring and
ranking cell documents in the text cube. We propose two efficient approaches to find the top-k
answers. The proposed approaches support a general class of IR-style relevance scoring formulas
that satisfy certain basic and common properties. One approach spends more time on pre-processing
and less time answering online queries; the other is more efficient in pre-processing but
takes more time to answer online queries. Experimental studies on the ASRS dataset are conducted
to verify the efficiency and effectiveness of the proposed approaches.
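To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not either of the two approaches proposed in the paper. It enumerates every cell of a tiny text cube, concatenates the documents in each cell into a cell document, and returns the k highest-scoring cells for a keyword query. The rows, the dimension names (`phase`, `weather`), the function names, and the length-normalized term-frequency score are all assumptions made for illustration; the abstract does not specify a particular relevance formula.

```python
from collections import Counter
from itertools import combinations
import heapq

# Hypothetical multidimensional text database: (attribute values, document text).
ROWS = [
    ({"phase": "landing", "weather": "rain"}, "runway wet brake failure"),
    ({"phase": "landing", "weather": "clear"}, "brake warning on runway"),
    ({"phase": "takeoff", "weather": "rain"}, "engine vibration during climb"),
    ({"phase": "takeoff", "weather": "clear"}, "bird strike engine damage"),
]


def enumerate_cells(rows):
    """Map each cell to the tokens of its cell document.

    A cell is a tuple of (dimension, value) pairs over a subset of dimensions;
    the empty tuple is the apex cell that aggregates every document.
    """
    cells = {}
    for attrs, text in rows:
        dims = sorted(attrs)
        tokens = text.split()
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                cell = tuple((d, attrs[d]) for d in subset)
                cells.setdefault(cell, []).extend(tokens)
    return cells


def score(tokens, query):
    """Assumed stand-in relevance score: query term frequency / document length."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] for term in query) / len(tokens)


def top_k_cells(rows, query, k):
    """Score every cell exhaustively and keep the k best (score, cell) pairs."""
    cells = enumerate_cells(rows)
    scored = [(score(tokens, query), cell) for cell, tokens in cells.items()]
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for s, cell in top_k_cells(ROWS, ["brake"], 3):
        print(f"{s:.3f}  {dict(cell)}")
```

Because the number of cells grows exponentially with the number of dimensions, exhaustive scoring of this kind only works for toy inputs, which is the cost that pre-processing and online pruning strategies aim to avoid.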
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| accrualPeriodicity | irregular |
| bureauCode | ["026:00"] |
| contactPoint | {"fn": "Elizabeth Foughty", "@type": "vcard:Contact", "hasEmail": "mailto:elizabeth.a.foughty@nasa.gov"} |
| description | KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS BOLIN DING*, YINTAO YU*, BO ZHAO*, CINDY XIDE LIN*, JIAWEI HAN*, AND CHENGXIANG ZHAI* Abstract. We study the problem of keyword search in a data cube with text-rich dimension(s) (so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (e.g., a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. A cell document is the concatenation of all documents in a cell. Given a keyword query, our goal is to find the top-k most relevant cells (ranked according to the relevance scores of cell documents w.r.t. the given query) in the text cube. We define a keyword-based query language and apply IR-style relevance model for scoring and ranking cell documents in the text cube. We propose two efficient approaches to find the top-k answers. The proposed approaches support a general class of IR-style relevance scoring formulas that satisfy certain basic and common properties. One of them uses more time for pre-processing and less time for answering online queries; and the other one is more efficient in pre-processing and consumes more time for online queries. Experimental studies on the ASRS dataset are conducted to verify the efficiency and effectiveness of the proposed approaches. |
| distribution | [{"@type": "dcat:Distribution", "title": "Paper 12 .pdf", "format": "PDF", "mediaType": "application/pdf", "description": "KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS", "downloadURL": "https://c3.nasa.gov/dashlink/static/media/publication/Paper_12_.pdf"}] |
| identifier | DASHLINK_234 |
| issued | 2010-10-13 |
| keyword | ["ames", "dashlink", "nasa"] |
| landingPage | https://c3.nasa.gov/dashlink/resources/234/ |
| modified | 2025-04-01 |
| programCode | ["026:029"] |
| publisher | {"name": "Dashlink", "@type": "org:Organization"} |
| title | KEYWORD SEARCH IN TEXT CUBE: FINDING TOP-K RELEVANT CELLS |