Efficient Keyword-Based Search for Top-K Cells in Text Cube
Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring and ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.
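To make the notion of a cell and a top-k query concrete, the following is a minimal brute-force sketch, not any of the four approaches from the paper. It assumes a toy two-dimensional database (`ROWS`, `DIMS`), a simple term-frequency document score, a cell score defined as the average score of the documents in the cell, and a `min_support` threshold on cell size. All of these names and scoring choices are illustrative assumptions; the paper's actual relevance model may differ.

```python
from collections import defaultdict
from itertools import combinations
import heapq

# Hypothetical toy database: (structural attributes, document text) per row.
DIMS = ("model", "phase")
ROWS = [
    ({"model": "A320", "phase": "landing"}, "engine vibration reported during landing"),
    ({"model": "A320", "phase": "takeoff"}, "engine fire warning on takeoff"),
    ({"model": "B737", "phase": "landing"}, "landing gear warning light"),
    ({"model": "B737", "phase": "takeoff"}, "normal takeoff no issues"),
]


def cells_of(attrs):
    """Yield every cell that contains a row.

    A cell fixes a subset of dimensions to the row's values and
    leaves the remaining dimensions as the wildcard "*".
    """
    for r in range(len(DIMS) + 1):
        for fixed in combinations(DIMS, r):
            yield tuple(attrs[d] if d in fixed else "*" for d in DIMS)


def doc_score(text, keywords):
    """Assumed relevance: keyword occurrences divided by document length."""
    terms = text.lower().split()
    return sum(terms.count(k) for k in keywords) / len(terms)


def top_k_cells(rows, keywords, k, min_support=1):
    """Score every non-empty cell in one pass over the rows, then keep the k best.

    Assumed cell score: mean document score over the documents in the cell.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for attrs, text in rows:
        score = doc_score(text, keywords)
        for cell in cells_of(attrs):
            totals[cell] += score
            counts[cell] += 1
    scored = [
        (totals[c] / counts[c], c) for c in totals if counts[c] >= min_support
    ]
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for score, cell in top_k_cells(ROWS, ["engine"], k=3, min_support=2):
        print(cell, round(score, 3))
```

Because every row belongs to 2^d cells for d dimensions, this exhaustive enumeration grows quickly with dimensionality, which is the cost that motivates approaches exploring only part of the cube.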
Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.
Complete Metadata
| @type | dcat:Dataset |
|---|---|
| accessLevel | public |
| accrualPeriodicity | irregular |
| bureauCode | ["026:00"] |
| contactPoint | {"fn": "Ashok Srivastava", "@type": "vcard:Contact", "hasEmail": "mailto:ashok.n.srivastava@gmail.com"} |
| description | Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table or joins of multiple tables containing a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring and ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches. Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011. |
| distribution | [{"@type": "dcat:Distribution", "title": "tkde11topcells.pdf", "format": "PDF", "mediaType": "application/pdf", "description": "tkde11topcells.pdf", "downloadURL": "https://c3.nasa.gov/dashlink/static/media/publication/tkde11topcells.pdf"}] |
| identifier | DASHLINK_515 |
| issued | 2012-01-27 |
| keyword | ["ames", "dashlink", "nasa"] |
| landingPage | https://c3.nasa.gov/dashlink/resources/515/ |
| modified | 2025-03-31 |
| programCode | ["026:029"] |
| publisher | {"name": "Dashlink", "@type": "org:Organization"} |
| title | Efficient Keyword-Based Search for Top-K Cells in Text Cube |