TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Introduction

  • What is the name of the TABi paper?

    TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

  • What is the main contribution of the TABi paper?

    TL;DR: State-of-the-art retrievers for open-domain natural language processing (NLP) tasks can exhibit popularity biases and fail to retrieve rare entities. We introduce a method to improve retrieval of rare entities by incorporating knowledge graph types through contrastive learning.

  • A major long-standing challenge for entity retrieval is retrieving the long tail
    • rare entities are unlikely to be retrieved
    • still a problem with modern contrastive learning approaches
  • Example of the long tail in information retrieval: disambiguating the name "Julius Caesar"
    • beyond the most popular entity, there is a long tail of other Julius Caesars

  • Long tail entity retrieval can be improved by using entity type information from a knowledge graph

  • Popularity bias in contrastive methods: the query embedding can end up closest to the embedding of the most popular entity for the mention in the query, regardless of the context

Method

  • TABi (Type-Aware Bi-encoders) adds knowledge graph types as textual input to the entity encoder

  • TABi's contrastive loss pulls queries of the same entity type together
    • this is done through a type-aware loss term

  • In TABi's type-aware loss term, positive pairs are examples in the same batch whose entities share a type

  • TABi adds special tokens around entity mention boundaries
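
The three Method points above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The marker strings `[M_s]` and `[M_e]`, the `[SEP]` separator, the function names, and the temperature value are all assumptions made for illustration. The loss is a generic supervised-contrastive-style term in which same-type examples in a batch act as positives, which may differ in detail from the paper's formulation.

```python
import math

# Hypothetical marker strings: the notes only say that special tokens
# surround the mention, not what they look like.
MENTION_START, MENTION_END = "[M_s]", "[M_e]"
TYPE_SEP = "[SEP]"  # assumed separator between title, types, and description


def build_entity_text(title, types, description):
    """Entity encoder input: title, knowledge graph types, and description as text."""
    return f" {TYPE_SEP} ".join([title, ", ".join(types), description])


def mark_mention(context, start, end):
    """Wrap the character span context[start:end] in mention boundary markers."""
    mention = context[start:end]
    return f"{context[:start]}{MENTION_START} {mention} {MENTION_END}{context[end:]}"


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def type_aware_loss(query_embs, type_ids, temperature=0.1):
    """Contrastive term where batch items with the same type id are positives."""
    z = [normalize(q) for q in query_embs]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and type_ids[j] == type_ids[i]]
        if not positives:
            continue  # an anchor with no same-type partner contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all other items in the batch.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    print(build_entity_text("Julius Caesar", ["human", "politician"],
                            "Roman general and statesman"))
    print(mark_mention("Who killed Julius Caesar?", 11, 24))
    embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Embeddings that already cluster by type should give the lower loss.
    print(type_aware_loss(embs, [0, 0, 1, 1]) < type_aware_loss(embs, [0, 1, 0, 1]))
```

In this sketch the loss is small when same-type queries already sit close together and large when they are far apart, which is the behaviour the type-aware term is meant to encourage.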

Results

  • TABi results in better entity clustering

  • TABi: strong results on the AmbER and KILT benchmarks

  • TABi robustness: it can still perform well when a large percentage of the types are incorrect

  • New directions in type-aware contrastive learning:

    • inject data from other structured sources besides knowledge graphs
    • incorporate expert knowledge to determine types

Reference

@misc{https://doi.org/10.48550/arxiv.2204.08173,
  doi = {10.48550/ARXIV.2204.08173},
  url = {https://arxiv.org/abs/2204.08173},
  author = {Leszczynski, Megan and Fu, Daniel Y. and Chen, Mayee F. and Ré, Christopher},
  keywords = {Computation and Language (cs.CL), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
