publications | YANG JANET LIU

2025

preprint

References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation

Silvia Casola^*, Yang Janet Liu^*, Siyao Peng^*, Oliver Kraus, Albert Gatt, and 1 more author

2025

(*equal contribution)

Human language production exhibits remarkable richness and variation, reflecting diverse communication styles and intents. However, this variation is often overlooked in summarization evaluation. While having multiple reference summaries is known to improve correlation with human judgments, the impact of the reference set on reference-based metrics has not been systematically investigated. This work examines the sensitivity of widely used reference-based metrics in relation to the choice of reference sets, analyzing three diverse multi-reference summarization datasets: SummEval, GUMSum, and DUC2004. We demonstrate that many popular metrics exhibit significant instability. This instability is particularly concerning for n-gram-based metrics like ROUGE, where model rankings vary depending on the reference sets, undermining the reliability of model comparisons. We also collect human judgments on LLM outputs for genre-diverse data and examine their correlation with metrics to supplement existing findings beyond newswire summaries, finding weak-to-no correlation. Taken together, we recommend incorporating reference set variation into summarization evaluation to enhance consistency alongside correlation with human judgments, especially when evaluating LLMs.

preprint

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Beiduo Chen, Yang Janet Liu, Anna Korhonen, and Barbara Plank

2025

The recent rise of reasoning-tuned Large Language Models (LLMs), which generate chains of thought (CoTs) before giving the final answer, has attracted significant attention and offers new opportunities for gaining insights into human label variation (HLV), which refers to plausible differences in how multiple annotators label the same data instance. Prior work has shown that LLM-generated explanations can help align model predictions with human label distributions, but it typically adopts a reverse paradigm: producing explanations based on given answers. In contrast, CoTs provide a forward reasoning path that may implicitly embed rationales for each answer option before the answers are generated. We thus propose a novel LLM-based pipeline enriched with linguistically grounded discourse segmenters to extract supporting and opposing statements for each answer option from CoTs with improved accuracy. We also propose a rank-based HLV evaluation framework that prioritizes the ranking of answers over the exact scores on which direct comparisons of label distributions rely. Our method outperforms a direct generation method as well as baselines on three datasets, and shows better alignment of ranking methods with humans, highlighting the effectiveness of our approach.

ACL

Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set

Florian Eichin^*, Yang Janet Liu^*, Barbara Plank, and Michael A. Hedderich

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025

(*equal contribution)

Discourse understanding is essential for many NLP tasks, yet most existing work remains constrained by framework-dependent discourse representations. This work investigates whether large language models (LLMs) capture discourse knowledge that generalizes across languages and frameworks. We address this question along two dimensions: (1) developing a unified discourse relation label set to facilitate cross-lingual and cross-framework discourse analysis, and (2) probing LLMs to assess whether they encode generalizable discourse abstractions. Using multilingual discourse relation classification as a testbed, we examine a comprehensive set of 23 LLMs of varying sizes and multilingual capabilities. Our results show that LLMs, especially those with multilingual training corpora, can generalize discourse information across languages and frameworks. Further layer-wise analyses reveal that language generalization at the discourse level is most salient in the intermediate layers. Lastly, our error analysis provides an account of challenging relation classes.

ACL

Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges

Bolei Ma^*, Yuting Li^*, Wei Zhou^*, Ziwei Gong^*, Yang Janet Liu, and 5 more authors

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Jul 2025

(*equal contribution)

Understanding pragmatics—the use of language in context—is crucial for developing NLP systems capable of interpreting nuanced language use. Despite recent advances in language technologies, including large language models, evaluating their ability to handle pragmatic phenomena such as implicatures and references remains challenging. To advance pragmatic abilities in models, it is essential to understand current evaluation trends and identify existing limitations. In this survey, we provide a comprehensive review of resources designed for evaluating pragmatic capabilities in NLP, categorizing datasets by the pragmatic phenomena they address. We analyze task designs, data collection methods, evaluation approaches, and their relevance to real-world applications. By examining these resources in the context of modern language models, we highlight emerging trends, challenges, and gaps in existing benchmarks. Our survey aims to clarify the landscape of pragmatic evaluation and guide the development of more comprehensive and targeted benchmarks, ultimately contributing to more nuanced and context-aware NLP models.

2024

EMNLP

GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Yang Janet Liu^*, Tatsuya Aoyama^*, Wesley Scivetti^*, Yilun Zhu^*, Shabnam Behzad, and 4 more authors

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

(*equal contribution)

Work on shallow discourse parsing in English has focused on the Wall Street Journal corpus, the only large-scale dataset for the language in the PDTB framework. However, the data is not openly available, is restricted to the news domain, and is by now 35 years old. In this paper, we present and evaluate a new open-access, multi-genre benchmark for PDTB-style shallow discourse parsing, based on the existing UD English GUM corpus, for which discourse relation annotations in other frameworks already exist. In a series of experiments on cross-domain relation classification, we show that while our dataset is compatible with PDTB, substantial out-of-domain degradation is observed, which can be alleviated by joint training on both datasets.

CL Journal

eRST: A Signaled Graph Theory of Discourse Relations and Organization

Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, and 1 more author

Computational Linguistics, Sep 2024

In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, non-projective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses. We survey shortcomings of RST and other existing frameworks, such as Segmented Discourse Representation Theory (SDRT), the Penn Discourse Treebank (PDTB) and Discourse Dependencies, and address these using constructs in the proposed theory. We provide annotation, search and visualization tools for data, and present and evaluate a freely available corpus of English annotated according to our framework, encompassing 12 spoken and written genres with over 200K tokens. Finally, we discuss automatic parsing, evaluation metrics and applications for data in our framework.

LREC-COLING

DISRPT: A Multilingual, Multi-domain, Cross-framework Benchmark for Discourse Processing

Chloé Braud, Amir Zeldes, Laura Rivière, Yang Janet Liu, Philippe Muller, and 2 more authors

In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024

This paper presents DISRPT, a multilingual, multi-domain, and cross-framework benchmark dataset for discourse processing, covering the tasks of discourse unit segmentation, connective identification, and relation classification. DISRPT includes 13 languages, with data from 24 corpora covering about 4 million tokens and around 250,000 discourse relation instances from 4 discourse frameworks: RST, SDRT, PDTB, and Discourse Dependencies. We present an overview of the data, its development across three NLP shared tasks on discourse processing carried out in the past five years, and the latest modifications and added extensions. We also carry out an evaluation of state-of-the-art multilingual systems trained on the data for each task, showing plateau performance on segmentation, but important room for improvement for connective identification and relation classification. The DISRPT benchmark employs a unified format that we make available on GitHub and HuggingFace in order to encourage future work on discourse processing across languages, domains, and frameworks.

2023

SIGDIAL

What’s Hard in RST Parsing? Predictive Models for Error Analysis

Yang Janet Liu, Tatsuya Aoyama, and Amir Zeldes

In Proceedings of the 24th Meeting of the Special Interest Group on Discourse and Dialogue, Sep 2023

Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulties in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test sets with explicit correct and distracting discourse markers associated with gold standard RST relations. Our results show that as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.

INTERSPEECH

Lightweight and Efficient Spoken Language Identification of Long-form Audio

Winstead Zhu^*, Md Iftekhar Tanveer^*, Yang Janet Liu^*, Seye Ojumu, and Rosie Jones

In Proc. INTERSPEECH 2023, Sep 2023

(*equal contribution)

State-of-the-art Spoken Language Identification (SLI) systems usually focus on short audio clips, so their performance degrades drastically when applied to long-form audio such as podcasts. Long-form audio poses particular challenges to existing SLI approaches due to its long duration and diverse content, which frequently involves multiple speakers as well as various languages, topics, and speech styles. In this paper, we propose the first system to tackle SLI for long-form audio using podcast data, by training a lightweight, multi-class feedforward neural classifier with speaker embeddings as input. We demonstrate that our approach can perform inference on long audio input efficiently; furthermore, our system can handle long audio files with multiple speakers and can be further extended to utterance-level inference and code-switching detection, which is currently not covered by any existing SLI system.

Findings

GUMSum: Multi-Genre Data and Evaluation for English Abstractive Summarization

Yang Janet Liu and Amir Zeldes

In Findings of the Association for Computational Linguistics: ACL 2023, Jul 2023

Automatic summarization with pre-trained language models has led to impressively fluent results, but is prone to ‘hallucinations’, low performance on non-news genres, and outputs which are not exactly summaries. Targeting ACL 2023’s ‘Reality Check’ theme, we present GUMSum, a small but carefully crafted dataset of English summaries in 12 written and spoken genres for evaluation of abstractive summarization. Summaries are highly constrained, focusing on substitutive potential, factuality, and faithfulness. We present guidelines and evaluate human agreement as well as subjective judgments on recent system outputs, comparing general-domain untuned approaches, a fine-tuned one, and a prompt-based approach to human performance. Results show that while GPT-3 achieves impressive scores, it still underperforms humans, with varying quality across genres. Human judgments reveal different types of errors in supervised, prompted, and human-generated summaries, shedding light on the challenges of producing a good summary.

CODI-DISRPT

The DISRPT 2023 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification

Chloé Braud, Yang Janet Liu, Eleni Metheniti, Philippe Muller, Laura Rivière, and 2 more authors

In Proceedings of the 3rd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2023), Jul 2023

In 2023, the third iteration of the DISRPT Shared Task (Discourse Relation Parsing and Treebanking) was held, dedicated to the underlying units used in discourse parsing across formalisms. Following the success of the 2019 and 2021 tasks on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification, this iteration has added 10 new corpora, including 2 new languages (Thai and Italian) and 3 discourse treebanks annotated in the discourse dependency representation in addition to the previously included frameworks: RST, SDRT, and PDTB. In this paper, we review the data included in the Shared Task, which covers 26 datasets across 13 languages, survey and compare submitted systems, and report on system performance on each task for both annotated and plain-tokenized versions of the data.

LAW

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, and 4 more authors

In Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII), Jul 2023

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE’s utility as an evaluation dataset for NLP systems.

EACL

Why Can’t Discourse Parsing Generalize? A Thorough Investigation of the Impact of Data Diversity

Yang Janet Liu and Amir Zeldes

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023

Recent advances in discourse parsing performance create the impression that, as in other NLP tasks, performance for high-resource languages such as English is finally becoming reliable. In this paper we demonstrate that this is not the case, and thoroughly investigate the impact of data diversity on RST parsing stability. We show that state-of-the-art architectures trained on the standard English newswire benchmark do not generalize well, even within the news domain. Using the two largest RST corpora of English with text from multiple genres, we quantify the impact of genre diversity in training data for achieving generalization to text types unseen during training. Our results show that a heterogeneous training regime is critical for stable and generalizable models, across parser architectures. We also provide error analyses of model outputs and out-of-domain performance. To our knowledge, this study is the first to fully evaluate cross-corpus RST parsing generalizability on complete trees, examine between-genre degradation within an RST corpus, and investigate the impact of genre diversity in training data composition.

2022

AACL

GCDT: A Chinese RST Treebank for Multigenre and Multilingual Discourse Parsing

Siyao Peng, Yang Janet Liu, and Amir Zeldes

In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Nov 2022

A lack of large-scale human-annotated data has hampered the hierarchical discourse parsing of Chinese. In this paper, we present GCDT, the largest hierarchical discourse treebank for Mandarin Chinese in the framework of Rhetorical Structure Theory (RST). GCDT covers over 60K tokens across five genres of freely available text, using the same relation inventory as contemporary RST treebanks for English. We also report on this dataset’s parsing experiments, including state-of-the-art (SOTA) scores for Chinese RST parsing and RST parsing on the English GUM dataset, using cross-lingual training in Chinese and English with multilingual embeddings.

LAW

Putting Context in SNACS: A 5-Way Classification of Adpositional Pragmatic Markers

Yang Janet Liu, Jena D. Hwang, Nathan Schneider, and Vivek Srikumar

In Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022, Jun 2022

The SNACS framework provides a network of semantic labels called supersenses for annotating adpositional semantics in corpora. In this work, we consider English prepositions (and prepositional phrases) that are chiefly pragmatic, contributing extra-propositional contextual information such as speaker attitudes and discourse structure. We introduce a preliminary taxonomy of pragmatic meanings to supplement the semantic SNACS supersenses, with guidelines for the annotation of coherence connectives, commentary markers, and topic and focus markers. We also examine annotation disagreements, delve into the trickiest boundary cases, and offer a discussion of future improvements.

2021

CODI-DISRPT

The DISRPT 2021 Shared Task on Elementary Discourse Unit Segmentation, Connective Detection, and Relation Classification

Amir Zeldes, Yang Janet Liu, Mikel Iruskieta, Philippe Muller, Chloé Braud, and 1 more author

In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), Nov 2021

In 2021, we organized the second iteration of a shared task dedicated to the underlying units used in discourse parsing across formalisms: the DISRPT Shared Task (Discourse Relation Parsing and Treebanking). Adding to the 2019 tasks on Elementary Discourse Unit Segmentation and Connective Detection, this iteration of the Shared Task included for the first time a track on discourse relation classification across three formalisms: RST, SDRT, and PDTB. In this paper we review the data included in the Shared Task, which covers nearly 3 million manually annotated tokens from 16 datasets in 11 languages, survey and compare submitted systems and report on system performance on each task for both annotated and plain-tokenized versions of the data.

CODI-DISRPT

DisCoDisCo at the DISRPT2021 Shared Task: A System for Discourse Segmentation, Classification, and Connective Detection

Luke Gessler, Shabnam Behzad, Yang Janet Liu, Siyao Peng, Yilun Zhu, and 1 more author

In Proceedings of the 2nd Shared Task on Discourse Relation Parsing and Treebanking (DISRPT 2021), Nov 2021

This paper describes our submission to the DISRPT2021 Shared Task on Discourse Unit Segmentation, Connective Detection, and Relation Classification. Our system, called DisCoDisCo, is a Transformer-based neural classifier which enhances contextualized word embeddings (CWEs) with hand-crafted features, relying on tokenwise sequence tagging for discourse segmentation and connective detection, and a feature-rich, encoder-less sentence pair classifier for relation classification. Our results for the first two tasks outperform SOTA scores from the previous 2019 shared task, and results on relation classification suggest strong performance on the new 2021 benchmark. Ablation tests show that including features beyond CWEs is helpful for both tasks, and a partial evaluation of multiple pretrained Transformer-based language models indicates that models pre-trained on the Next Sentence Prediction (NSP) task are optimal for relation classification.

2020

LREC

AMALGUM – A Free, Balanced, Multilayer English Web Corpus

Luke Gessler, Siyao Peng, Yang Janet Liu, Yilun Zhu, Shabnam Behzad, and 1 more author

In Proceedings of the Twelfth Language Resources and Evaluation Conference, May 2020

We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory. By tapping open online data sources the corpus is meant to offer a more sizable alternative to smaller manually created annotated data sets, while avoiding pitfalls such as imbalanced or unknown composition, licensing problems, and low-quality natural language processing. We harness knowledge from multiple annotation layers in order to achieve a “better than NLP” benchmark and evaluate the accuracy of the resulting resource.

D&D

A Neural Approach to Discourse Relation Signal Detection

Amir Zeldes and Yang Janet Liu

Dialogue and Discourse, May 2020

Previous data-driven work investigating the types and distributions of discourse relation signals, including discourse markers such as ‘however’ or phrases such as ‘as a result’, has focused on the relative frequencies of signal words within and outside text from each discourse relation. Such approaches do not allow us to quantify the signaling strength of individual instances of a signal on a scale (e.g. more or less discourse-relevant instances of ‘and’), to assess the distribution of ambiguity for signals, or to identify words that hinder discourse relation identification in context (‘anti-signals’ or ‘distractors’). In this paper we present a data-driven approach to signal detection using a distantly supervised neural network and develop a metric, Δs (or ‘delta-softmax’), to quantify signaling strength. Ranging between -1 and 1 and relying on recent advances in contextualized word embeddings, the metric represents each word’s positive or negative contribution to the identifiability of a relation in specific instances in context. Based on an English corpus annotated for discourse relations using Rhetorical Structure Theory and signal type annotations anchored to specific tokens, our analysis examines the reliability of the metric, the places where it overlaps with and differs from human judgments, and the implications for identifying features that neural models may need in order to perform better on automatic discourse relation classification.
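
To make the Δs idea concrete, here is a minimal Python sketch, assuming Δs for a word is the gold relation's softmax probability on the full input minus its probability when that word is masked (a difference of two probabilities, hence the -1 to 1 range). The `toy_classifier`, its keyword rules, the relation labels, and the `<mask>` token are hypothetical stand-ins for illustration only; they are not the distantly supervised network or the data used in the paper.

```python
import math
from typing import Callable, Dict, List

# A classifier maps a token list to a probability for each relation label.
Classifier = Callable[[List[str]], Dict[str, float]]


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw relation scores into a probability distribution."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {rel: math.exp(score - top) for rel, score in logits.items()}
    total = sum(exps.values())
    return {rel: value / total for rel, value in exps.items()}


def toy_classifier(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical relation classifier keyed on two connectives."""
    logits = {"contrast": 0.0, "cause": 0.0, "elaboration": 0.0}
    if "however" in tokens:
        logits["contrast"] += 2.0
    if "because" in tokens:
        logits["cause"] += 2.0
    return softmax(logits)


def delta_softmax(classify: Classifier, tokens: List[str], gold: str,
                  mask: str = "<mask>") -> List[float]:
    """Per-token signal strength: P(gold | full) - P(gold | token masked).

    Positive values mark signals, negative values mark distractors
    (anti-signals); each value lies in [-1, 1].
    """
    p_full = classify(tokens)[gold]
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(p_full - classify(masked)[gold])
    return scores


if __name__ == "__main__":
    sentence = ["the", "plan", "failed", ";", "however", ",", "we", "kept", "going"]
    for token, score in zip(sentence, delta_softmax(toy_classifier, sentence, "contrast")):
        print(f"{token:>8} {score:+.3f}")
```

On the toy sentence, only ‘however’ receives a positive score (about +0.454); every other token scores 0 because masking it leaves the toy prediction unchanged. Appending ‘because’ to the same sentence would give that word a negative score, the sketch's analogue of a distractor.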

LREC

A Corpus of Adpositional Supersenses for Mandarin Chinese

Siyao Peng, Yang Janet Liu, Yilun Zhu, Austin Blodgett, Yushi Zhao, and 1 more author

In Proceedings of the 12th Language Resources and Evaluation Conference, May 2020

Adpositions are frequent markers of semantic relations, but they are highly ambiguous and vary significantly from language to language. Moreover, there is a dearth of annotated corpora for investigating the cross-linguistic variation of adposition semantics, or for building multilingual disambiguation systems. This paper presents a corpus in which all adpositions have been semantically annotated in Mandarin Chinese; to the best of our knowledge, this is the first Chinese corpus to be broadly annotated with adposition semantics. Our approach adapts a framework that defined a general set of supersenses according to ostensibly language-independent semantic criteria, though its development focused primarily on English prepositions (Schneider et al., 2018). We find that the supersense categories are well-suited to Chinese adpositions despite syntactic differences from English. On a Mandarin translation of The Little Prince, we achieve high inter-annotator agreement and analyze semantic correspondences of adposition tokens in bitext.

2019

DISRPT

Beyond the Wall Street Journal: Anchoring and Comparing Discourse Signals across Genres

Yang Janet Liu

In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Jun 2019

Recent research on discourse relations has found that they are cued not only by discourse markers (DMs) but also by other textual signals and that signaling information is indicative of genres. While several corpora exist with discourse relation signaling information such as the Penn Discourse Treebank (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018), they both annotate the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), which is limited to the news domain. Thus, this paper adapts the signal identification and anchoring scheme (Liu and Zeldes, 2019) to three more genres, examines the distribution of signaling devices across relations and genres, and provides a taxonomy of indicative signals found in this dataset.

DISRPT

A Discourse Signal Annotation System for RST Trees

Luke Gessler, Yang Janet Liu, and Amir Zeldes

In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Jun 2019

This paper presents a new system for open-ended discourse relation signal annotation in the framework of Rhetorical Structure Theory (RST), implemented on top of an online tool for RST annotation. We discuss existing projects annotating textual signals of discourse relations, which have so far not allowed simultaneously structuring and annotating words signaling hierarchical discourse trees, and demonstrate the design and applications of our interface by extending existing RST annotations in the freely available GUM corpus.

DISRPT

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Yue Yu, Yilun Zhu, Yang Janet Liu, Yan Liu, Siyao Peng, and 2 more authors

In Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019, Jun 2019

In this paper we present GumDrop, Georgetown University’s entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.

MS THESIS

Signaling of Discourse Relations: Anchoring Discourse Signals across Genres

Yang Janet Liu

Georgetown University, Jun 2019

Discourse relations, also known as coherence or rhetorical relations, characterize the semantic or pragmatic relationships between clauses or sentences in discourse. Such relations are established in order to facilitate effective communication. In addition to the inventory of relations, previous research has also investigated how discourse relations are established or signaled. Discourse markers (DMs) are considered to be the most typical signals in discourse; however, focusing merely on DMs is inadequate, as they can only account for a small number of relations in discourse. Thus, researchers have been exploring textual signals beyond DMs in resources such as the Penn Discourse Treebank 2.0 (PDTB, Prasad et al. 2008) and the Rhetorical Structure Theory Signalling Corpus (RST-SC, Das and Taboada 2018). Despite their different theoretical groundings and approaches to relation signaling, both corpora annotated the Wall Street Journal (WSJ) section of the Penn Treebank (PTB, Marcus et al. 1993), i.e. news articles. Nevertheless, previous work has suggested that signaling information is indicative of genres (e.g. Taboada and Lavid 2003; Zeldes 2018). Therefore, this project aims to anchor signaling devices on a more diverse corpus to demonstrate the inadequacy of signaling by DMs only, the abundance of open-class signals, and more importantly, the distribution of signaling devices across genres.

SCiL

Discourse Relations and Signaling Information: Anchoring Discourse Signals in RST-DT

Yang Janet Liu and Amir Zeldes

In Proceedings of the Society for Computation in Linguistics, Jun 2019

Research on discourse relations between clauses, such as cause or contrast, has studied how relations are signaled in discourse. Several corpora include discourse relation annotations: the Penn Discourse Treebank (Prasad et al., 2008) annotates a subset of relations marked by explicit connectives (e.g. ‘however’) or understood implicit ones, while the RST-Signalling Corpus (Taboada & Das 2013) annotates the presence of signals exhaustively, but provides no information about the location of signaling devices. We present an annotation effort to anchor discourse signals at all levels, bridging the gap between these two frameworks, and support feature engineering for automatic discourse parsing.

SCiL

Adpositional Supersenses for Mandarin Chinese

Yilun Zhu, Yang Janet Liu, Siyao Peng, Austin Blodgett, Yushi Zhao, and 1 more author

In Proceedings of the Society for Computation in Linguistics, Jun 2019

This study adapts Semantic Network of Adposition and Case Supersenses (SNACS) annotation to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics. We annotated 20 chapters of The Little Prince, with high interannotator agreement. The parallel corpus substantiates the applicability of construal analysis in Chinese and gives insight into the differences in construals between adpositions in two languages. The corpus can further support automatic disambiguation of adpositions in Chinese, and the common inventory of supersenses between the two languages can potentially serve cross-linguistic tasks such as machine translation.

2017

LSA

Scalar Implicature in Chitonga-Speaking Children

Jodi Reich, Kelly Nedwick, Teodora Niculae-Caxi, Yang Janet Liu, and Elena L Grigorenko

In Proceedings of the Linguistic Society of America, Jun 2017

Research on the acquisition of scalar implicature (SI) has provided evidence that young children interpret SI differently from adults. However, results have varied, and there is now mounting evidence that around six years of age, children are able to derive the pragmatic inferences associated with SI (Foppolo, Guasti, and Chierchia, 2012). Variability in results across studies could be due to factors such as data collection methods and language-specific differences. In order to add to the growing body of literature in a meaningful way, this research investigated the interpretation of sentences that include SI by Chitonga-speaking children (7-15 years old) in rural Southern Province, Zambia, who were notably beyond the key age of six. The results of this study provide valuable insight into the interpretation of SI in a Bantu language and suggest that the acquisition of pragmatic felicity with words on a scale follows the order of acquisition identified in previous research, but may emerge at a later age in this linguistic context.