PolyDPR


Introduction

  • What is the name of the Poly-DPR paper?

    Improving Biomedical Information Retrieval with Neural Retrievers

  • What are the main contributions of the Poly-DPR paper?

    • Poly-DPR: each context is represented by k vectors
      • retrieval still uses maximum inner product search (MIPS)
    • TempQG: a template-based question generation method
      • can generate a large number of in-domain questions
    • two pretraining tasks: ETM and RSM
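
To make the "k vectors, still MIPS" point concrete, here is a minimal sketch in plain Python. It assumes every context has already been encoded into k vectors; the names `build_flat_index` and `search`, the toy vectors, and the brute-force scan are all illustrative assumptions, not the paper's implementation. The idea shown is that the k vectors of every document go into one flat index, so an ordinary inner product search runs unchanged and hits are grouped back by document.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_flat_index(doc_vectors: Dict[str, List[Vector]]) -> List[Tuple[str, Vector]]:
    """Flatten each document's k vectors into rows of (doc_id, vector)."""
    return [(doc_id, vec) for doc_id, vecs in doc_vectors.items() for vec in vecs]


def search(query: Vector, index: List[Tuple[str, Vector]], top_n: int) -> List[Tuple[str, float]]:
    """Brute-force MIPS over all rows, keeping the best row per document."""
    best: Dict[str, float] = {}
    for doc_id, vec in index:
        score = dot(query, vec)
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Toy corpus: two documents, k = 2 vectors each (hypothetical values).
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],
    "doc_b": [[0.5, 0.5], [0.2, 0.1]],
}
index = build_flat_index(docs)
results = search([0.0, 2.0], index, top_n=2)
print(results)
```

In practice the brute-force loop would be replaced by an approximate nearest neighbour library; the grouping step that maps rows back to documents stays the same.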

Method

  • Poly-DPR uses ColBERT-style MaxSim scoring
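
One hedged way to write this scoring rule down, assuming a single question vector $q$ and $k$ context vectors $c_1, \dots, c_k$ (these symbols are introduced here for illustration): the relevance score is the largest inner product between the question and any of the context vectors.

```latex
\operatorname{score}(q, c) = \max_{1 \le i \le k} \langle q, c_i \rangle
```

ColBERT's MaxSim additionally sums this maximum over the query token vectors; with one question vector that sum has a single term, which gives the expression above.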

  • TempQG uses a seq2seq model that takes a passage and a template as input

  • TempQG templates are generated by masking low-frequency tagged biological entities
    • DPR over the text input is also used for selection
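
The template step can be sketched as follows. This is a minimal illustration under stated assumptions: questions arrive already tokenized with an entity tag per token, entity frequencies are known, and the threshold `max_count`, the `[MASK]` string, and the `[SEP]` input format in `build_model_input` are all hypothetical choices rather than details from the paper.

```python
from collections import Counter
from typing import List, Tuple

# Each token is (text, is_tagged_entity); the tagging is assumed to be done upstream.
Token = Tuple[str, bool]


def make_template(question: List[Token], entity_counts: Counter,
                  max_count: int, mask: str = "[MASK]") -> str:
    """Replace tagged entities seen at most max_count times with a mask."""
    out = []
    for text, is_entity in question:
        if is_entity and entity_counts[text.lower()] <= max_count:
            out.append(mask)
        else:
            out.append(text)
    return " ".join(out)


def build_model_input(passage: str, template: str) -> str:
    """Join passage and template into one seq2seq input (format is an assumption)."""
    return f"{passage} [SEP] {template}"


# Hypothetical entity frequencies and one tagged question.
entity_counts = Counter({"brca1": 1, "cancer": 50})
question = [("What", False), ("is", False), ("the", False), ("role", False),
            ("of", False), ("BRCA1", True), ("in", False), ("cancer", True),
            ("?", False)]

template = make_template(question, entity_counts, max_count=5)
model_input = build_model_input("BRCA1 mutations raise cancer risk.", template)
print(template)
print(model_input)
```

Only the rare entity is masked; the frequent one stays in the template, so the generator can fill the slot with entities from a new passage.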

  • Poly-DPR pretraining task: Extended Title Mapping (ETM) retrieves an abstract given its title concatenated with the abstract's top TF-IDF words
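
A minimal sketch of how such a query could be built, using only the standard library. The whitespace tokenizer, the particular TF-IDF variant (term frequency times log inverse document frequency), the value of `m`, and the toy abstracts are all assumptions made for illustration; the paper's exact weighting may differ.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def top_tfidf_words(doc_index: int, corpus: List[str], m: int) -> List[str]:
    """Return the m highest-scoring TF-IDF words of corpus[doc_index]."""
    docs = [tokenize(d) for d in corpus]
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    target = docs[doc_index]
    term_freq = Counter(target)
    scores = {
        word: (count / len(target)) * math.log(n_docs / doc_freq[word])
        for word, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:m]


def extended_title(title: str, doc_index: int, abstracts: List[str], m: int = 2) -> str:
    """Concatenate the title with the abstract's top-m TF-IDF words."""
    return title + " " + " ".join(top_tfidf_words(doc_index, abstracts, m))


# Hypothetical abstracts; the first one is the retrieval target.
abstracts = [
    "insulin regulates glucose uptake in muscle cells",
    "glucose levels rise in diabetic patients",
    "muscle cells respond to exercise",
]
query = extended_title("Glucose transport in muscle", 0, abstracts, m=2)
print(query)
```

The resulting string is the query, and the abstract it was built from is the positive passage for pretraining.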

Results

  • Poly-DPR results: best performance with the most granular sub-document representations
    • there is an obvious tradeoff with time

Conclusions

  • Good experimental evidence for using sub-document representations rather than encoding the entire document
  • Worth comparing against the T5 query generation used in GPL

Reference

@article{DBLP:journals/corr/abs-2201-07745,
  author    = {Man Luo and
               Arindam Mitra and
               Tejas Gokhale and
               Chitta Baral},
  title     = {Improving Biomedical Information Retrieval with Neural Retrievers},
  journal   = {CoRR},
  volume    = {abs/2201.07745},
  year      = {2022},
  url       = {https://arxiv.org/abs/2201.07745},
  eprinttype = {arXiv},
  eprint    = {2201.07745},
  timestamp = {Fri, 21 Jan 2022 13:57:15 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2201-07745.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
