Education

  • Ph.D. in Computational Linguistics   2017 -- 2023

    Georgetown University

  • M.S. in Computational Linguistics   2017 -- 2020

    Georgetown University

  • Ph.D. student in Linguistics   2016 -- 2017

    SUNY - Stony Brook University

  • M.A. in Linguistics   2015 -- 2016

    Leiden University

  • B.A. in Applied Mathematics, French & Linguistics   2011 -- 2015

    University of California - Berkeley

Interests

  • Corpus Linguistics
  • Computational Linguistics
  • Natural Language Processing
  • Discourse Theories
  • Syntax-Semantics Representations
  • Entities & Coreference
  • Chinese
  • English
  • German

Publications

GENTLE: A genre-diverse multilayer challenge set for English NLP and linguistic evaluation. In Proceedings of the 17th Linguistic Annotation Workshop (LAW 2023) at ACL 2023, Toronto, Canada, 2023.

PDF Data

GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022), pages 382–391, Online only, November 2022. Association for Computational Linguistics.

PDF Slides

Chinese Discourse Annotation Reference Manual. Research Report, Georgetown University, October 2022.

PDF

DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection. In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), 51–62, online, 2021.

PDF

Overview of AMALGUM – Large Silver Quality Annotations across English Genres. In Proceedings of the Society for Computation in Linguistics (SCiL) 2021, 434–437, online, 2021.

PDF

PASTRIE: A Corpus of Prepositions Annotated with Supersense Tags in Reddit International English. In Proceedings of the 14th Linguistic Annotation Workshop (LAW 2020) at COLING 2020, online, 2020.

PDF

Tencent submission for WMT20 Quality Estimation Shared Task. In Proceedings of the Fifth Conference on Machine Translation (WMT 2020) at EMNLP 2020, online, 2020.

PDF Slides

AMALGUM – A Free, Balanced, Multilayer English Web Corpus. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), 5267–5275, Marseille, France, 2020.

PDF Code+Data

A Corpus of Adpositional Supersenses for Mandarin Chinese. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), 5986–5994, Marseille, France, 2020.

PDF Data

Modeling Long-Range Context for Concurrent Dialogue Acts Recognition. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM 2019), 2277–2280, Beijing, China, 2019.

PDF Poster Slides Code

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection. In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019) at NAACL-HLT 2019, 133–143, Minneapolis, MN, 2019.

PDF Poster Code

Adpositional Supersenses for Mandarin Chinese. In Proceedings of the Society for Computation in Linguistics (SCiL 2019), vol. 2, 334–337, New York, NY, 2019.

Preprint Poster

All roads lead to UD: Converting Stanford and Penn parses to English Universal Dependencies with multilayer annotations. In Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions at COLING 2018, 167–177, Santa Fe, NM, 2018.

PDF Poster

Other presentations

Validating and Merging a Growing Multilayer Corpus: the Case of GUM. Abstract presented at the 14th American Association for Corpus Linguistics (AACL 2018) Conference, Atlanta, GA, 2018.

Slides

Teaching

LMU Munich

  • Basismodul Computerlinguistik
    Co-instructor, Winter 2023
  • Linguistic Annotation Frameworks
    Instructor, Summer 2023

Georgetown University

  • LING-367: Computational Corpus Linguistics
    Teaching Assistant, Fall 2021
  • LING-001: Introduction to Language
    Teaching Assistant, Fall 2020
  • LING-462/COSC-482: Statistical Machine Translation
    Teaching Assistant, Spring 2020
  • LING-469: Analyzing Language Data with R
    Teaching Assistant, Spring 2020
  • LING-469: Analyzing Language Data with R
    Teaching Assistant, Spring 2019
  • LING-362: Intro: Natural Language Processing
    Teaching Assistant, Fall 2018

SUNY - Stony Brook University

  • LIN-230: Languages of the World
    Teaching Assistant, Spring 2017
  • LIN-200: Language in the U.S.
    Teaching Assistant, Fall 2016

Reviewing

  • The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
  • Widening Natural Language Processing (WiNLP 2022)
  • Jan 2022 ACL Rolling Review
  • Transactions of the Association for Computational Linguistics
  • Widening Natural Language Processing (WiNLP 2021)
  • The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
  • The ACL-IJCNLP 2021 Student Research Workshop (ACL-IJCNLP SRW 2021)
  • The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
  • The NAACL Student Research Workshop (SRW) 2021
  • 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
  • The 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021)
  • Widening Natural Language Processing (WiNLP 2020)
  • The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
  • Widening Natural Language Processing (WiNLP 2019)

Personal