Latent Entity Space: A Novel Retrieval Approach for Entity-Bearing Queries by Xitong Liu and Hui Fang, in Information Retrieval, 2015. [PDF]

The dataset is packed in a single tarball. It includes four collections:

  • 94: 94 queries with FACC entity annotation from the TREC 2009 to 2012 Web Tracks.
  • eqfe: 191 queries with FACC entity annotation from Entity Query Feature Expansion.
  • 2013: 50 queries with manual entity annotation from TREC 2013 Web Track.
  • 2014: 50 queries with manual entity annotation from TREC 2014 Web Track.

Each collection includes queries, qrels and retrieval run files reported in the paper:

├── eval
│   ├── gdeval.pl
│   └── qrels
├── query
│   └── qent.map
└── ret
    ├── les-col
    └── les-fb

Query

qent.map contains the entity annotations for each query:

201 : /m/0gmg36g : Raspberry Pi
202 : /m/01c16w : USS Carl Vinson (CVN-70)
203 : /m/0h51_mt : Les Miserables
204 : /m/037hz : Golf
205 : /m/02pqcrl : Charity
...

Each record is divided by : into three columns:

  • query ID
  • Freebase MID
  • Surface text in the query

Note:

  • One query may have multiple entity mappings.
  • The original queries can be obtained from TREC.
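
The three-column layout above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset: the function name parse_qent_map is hypothetical, and it assumes the separator is a colon surrounded by single spaces, as in the sample records. Results are grouped by query ID because one query may have multiple entity mappings.

```python
from collections import defaultdict


def parse_qent_map(lines):
    """Group (Freebase MID, surface text) pairs by query ID.

    Hypothetical helper; assumes fields are separated by " : ".
    """
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue  # skip blank lines and elisions
        # Split on the first two separators only, so a surface text
        # that itself contains " : " stays in one piece.
        qid, mid, surface = [part.strip() for part in line.split(" : ", 2)]
        mapping[qid].append((mid, surface))
    return dict(mapping)


sample = [
    "201 : /m/0gmg36g : Raspberry Pi",
    "202 : /m/01c16w : USS Carl Vinson (CVN-70)",
]
entities = parse_qent_map(sample)
print(entities["201"])
```

For a real collection, the same function could be called on an open file handle for query/qent.map.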

Run files

les-col is the run file from the Latent Entity Space model with entity profiles estimated from the document collection, and les-fb is the run file with entity profiles estimated from Freebase.

201 Q0 clueweb12-0900tw-63-01604 1 996.300000 les_intep
201 Q0 clueweb12-1700tw-60-14505 2 993.300000 les_intep
201 Q0 clueweb12-1600tw-25-03206 3 992.900000 les_intep
...

Each record is divided by space into six columns:

  • query ID
  • Q0, reserved
  • document ID
  • document rank
  • document score
  • run ID

Note:

  • Only the top 20 results are returned for each query, since the evaluation metrics are computed at rank 20.
  • The retrieval score is for ranking only and does not bear any semantic meaning.
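
As an illustration of the run file layout, the sketch below parses one record into named fields. The names RunRecord and parse_run_line are hypothetical, and the code assumes every record has exactly six whitespace-separated fields, as in the sample lines.

```python
from typing import NamedTuple


class RunRecord(NamedTuple):
    """One line of a run file (hypothetical container)."""
    query_id: str
    doc_id: str
    rank: int
    score: float
    run_id: str


def parse_run_line(line):
    """Parse a six-column run record; the reserved Q0 field is dropped."""
    query_id, _reserved, doc_id, rank, score, run_id = line.split()
    return RunRecord(query_id, doc_id, int(rank), float(score), run_id)


sample = "201 Q0 clueweb12-0900tw-63-01604 1 996.300000 les_intep"
record = parse_run_line(sample)
print(record.doc_id, record.rank, record.score)
```

Since the score carries no meaning beyond ordering, downstream code would normally rely on the rank field rather than compare scores across queries.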

QRELS (Ground Truth)

qrels is the relevance judgment (qrels) file taken from the official TREC data release. gdeval.pl is the evaluation script, which reports nDCG@20 and ERR@20.

201 0 clueweb12-0000tw-05-12114 1
201 0 clueweb12-0000wb-30-01951 0
201 0 clueweb12-0000wb-60-01497 1
...
202 0 clueweb12-0001wb-27-33452 0
202 0 clueweb12-0002wb-14-02885 0
...

Each record is divided by space into four columns:

  • query ID
  • 0, reserved
  • document ID
  • flag for document relevance
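
The qrels layout can be loaded into a nested dictionary keyed by query ID and document ID. This is a sketch under assumptions: parse_qrels is a hypothetical name, fields are assumed to be whitespace-separated, and the relevance value is parsed as an integer without interpreting it further.

```python
def parse_qrels(lines):
    """Return {query_id: {doc_id: relevance}} from qrels records.

    Hypothetical helper; lines without exactly four fields
    (blank lines, "..." elisions) are skipped.
    """
    judgments = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue
        query_id, _reserved, doc_id, relevance = fields
        judgments.setdefault(query_id, {})[doc_id] = int(relevance)
    return judgments


sample = [
    "201 0 clueweb12-0000tw-05-12114 1",
    "201 0 clueweb12-0000wb-30-01951 0",
    "...",
    "202 0 clueweb12-0001wb-27-33452 0",
]
qrels = parse_qrels(sample)
print(qrels["201"]["clueweb12-0000tw-05-12114"])
```

A lookup such as qrels.get(query_id, {}).get(doc_id) returns None for unjudged documents, which keeps them distinguishable from documents judged with a value of 0.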

Citation

For citation, please use the following BibTeX entry:

@article{Liu:2015,
  title = "{Latent Entity Space: A Novel Retrieval Approach for Entity-Bearing Queries}",
  author = {Liu, Xitong and Fang, Hui},
  journal = {Information Retrieval},
  volume={18},
  number={6},
  pages={473--503},
  year = {2015},
}

Download

The dataset can be downloaded as a single tarball.