Elsevier

Brain and Language

Volume 223, December 2021, 105029

Neural responses in novice learners’ perceptual learning and generalization of lexical tones: The effect of training variability

https://doi.org/10.1016/j.bandl.2021.105029

Highlights

  • Training variability modulated learners’ neural responses to non-native tones.

  • The high-variability group had a more negative mismatch negativity after training.

  • The group effect was found for a tone pair which was acoustically salient.

  • The training groups showed a decreased late negativity after training.

Abstract

The acoustics of lexical tones are highly variable across talkers, and successful learning requires second-language (L2) learners to accommodate talker-specific tonal variations flexibly. This study investigated how tone training with high vs. low talker-variability modulated novice learners’ neural responses to non-native tones. A passive oddball paradigm tested Mandarin-speaking participants’ neural responses to Cantonese low–high and low–mid tonal contrasts in the pretest and posttest. Participants were trained using a tone identification task with feedback, either with high or low talker-variability. The mismatch negativity (MMN) results showed no group difference in the pretest, whereas in the posttest the high-variability group demonstrated greater neural sensitivity to the low–high tonal contrast produced by a novel talker and a trained talker. The finding provides (tentative) novel evidence that training variability may benefit perceptual learning of the relatively easy tone pair and facilitate the formation of talker-independent representations of non-native tones by novice learners.

Introduction

The speech signal contains great variability, which poses considerable difficulty for second language (L2) learners’ perceptual learning of speech sound categories. For instance, multiple sources of variation have been noted in the acoustic signal of lexical tones, including talker, gender and tonal context (e.g., Zhang & Chen, 2016). Such variations require L2 learners to extract abstract representations from tonal exemplars across different talkers to accommodate talker-specific tonal variations (Zhang, Peng, Li, Minett, & Wang, 2018), in order to distinguish different tone categories successfully. Recent training studies have debated the role of training exposure to talker variability (i.e., training variability) in perceptual learning of speech sounds at a behavioral level (Fuhrmeister and Myers, 2020, Perrachione et al., 2011). An open question is whether, and if so how, training variability influences perceptual learning of non-native tones and generalization to new tokens produced by novel talkers (i.e., talker generalization) at a neural level. In this paper, we investigate the effect of training variability on Mandarin-speaking participants’ neural responses, as indexed by the mismatch negativity (MMN) and late discriminative negativity (LDN), in their perceptual learning of Cantonese level tones produced by trained talkers and a novel talker.

The variability of training materials is assumed to benefit learners and has been tested in perceptual learning of segments (Bradlow et al., 1997, Bradlow et al., 1999, Lively et al., 1994) and prosodic categories such as lexical tones (Wayland and Guion, 2004, Wiener et al., 2020). However, the findings are mixed in terms of its beneficial impact. Studies have found the beneficial effects of exposing learners to variability, for example, of the /ɹ/-/l/ contrast, during training (Bradlow et al., 1997, Lively et al., 1994). Training variability is often introduced by presenting training tokens of /ɹ/ and /l/ produced by multiple talkers or occurring in different phonological contexts (e.g., vowels following /ɹ/ or /l/) (Bradlow et al., 1997, Bradlow et al., 1999). In the case of perceptual learning of lexical tones, Wang and her colleagues showed that perceptual training using Mandarin tone stimuli produced by multiple talkers facilitated English-speaking participants’ perception of Mandarin tones produced by the trained talkers, generalization to tones produced by novel talkers, as well as long-term retention of non-native tones (e.g., six months after training) (Wang, Spence, Jongman, & Sereno, 1999). The finding suggests that training variability might have facilitated learners’ focus on perceptual cues of lexical tones (i.e., pitch height: higher or lower tones; pitch contour: level, falling, or rising tones), which are generalizable across talkers and facilitated tone learning over a long time period. Given the high variability of lexical tones across (and within) talkers (Peng, 2006), the tone variability induced by different talkers (i.e., talker variability) during training seems to be critical for learners’ abstraction of tone representations.

On the other hand, many tone training studies used tone stimuli produced by multiple talkers but challenged whether the positive effect of high-variability training was universal relative to low-variability training (Dong et al., 2019, Sadakata and McQueen, 2014). First, the (beneficial) effect of training variability is influenced by how variability is implemented during training (Perrachione et al., 2011). Perrachione et al. (2011) trained English-speaking participants to use Mandarin tones in identifying pseudowords and manipulated the degree of trial-to-trial variability in different types of high-variability training. The results showed that talker-blocked training (in which stimuli produced by a single talker were presented in one block and stimuli from multiple talkers were introduced across blocks) elicited a faster learning rate as well as a better learning outcome than the talker-mixed training (in which stimuli produced by multiple talkers were presented within a block).
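The contrast between talker-blocked and talker-mixed presentation can be illustrated with a minimal sketch. It is not the procedure of Perrachione et al. (2011) or of the present study; the function names, talker labels, and token labels are hypothetical and serve only to show how the two schedules differ in trial-to-trial talker variability.

```python
import random

def build_schedule(tokens_by_talker, mode, seed=0):
    """Return a list of (talker, token) trials.

    mode="blocked": all trials from one talker are presented together,
                    so the talker changes only between blocks.
    mode="mixed":   trials from all talkers are shuffled together,
                    so the talker can change from trial to trial.
    """
    rng = random.Random(seed)
    if mode == "blocked":
        schedule = []
        for talker, tokens in tokens_by_talker.items():
            block = [(talker, token) for token in tokens]
            rng.shuffle(block)  # randomize token order within a block
            schedule.extend(block)
        return schedule
    if mode == "mixed":
        schedule = [(talker, token)
                    for talker, tokens in tokens_by_talker.items()
                    for token in tokens]
        rng.shuffle(schedule)  # randomize across talkers and tokens
        return schedule
    raise ValueError("mode must be 'blocked' or 'mixed'")

def talker_switches(schedule):
    """Count how often the talker changes between consecutive trials."""
    return sum(1 for prev, cur in zip(schedule, schedule[1:])
               if prev[0] != cur[0])

# Hypothetical stimulus set: four talkers, three tone tokens each.
stimuli = {
    "talker_1": ["tone_a", "tone_b", "tone_c"],
    "talker_2": ["tone_a", "tone_b", "tone_c"],
    "talker_3": ["tone_a", "tone_b", "tone_c"],
    "talker_4": ["tone_a", "tone_b", "tone_c"],
}
blocked = build_schedule(stimuli, "blocked")
mixed = build_schedule(stimuli, "mixed")
```

In the blocked schedule the number of talker switches equals the number of talkers minus one, whereas the mixed schedule typically contains many more switches for the same set of trials. Both schedules expose the learner to the same talkers and tokens; only the ordering differs.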

Another factor that might account for the mixed findings in the literature is the target learners who were trained or tested (C. B. Chang & Bowles, 2015). Many of the earlier studies that found the beneficial effect of training variability had tested learners who had prior experience with the L2 and might be more capable of dealing with training variability of L2 sounds (Bradlow et al., 1997, Lively et al., 1994, Wang et al., 1999). However, studies that trained novice learners (i.e., naïve listeners) on non-native tones often did not reveal the beneficial effect of training variability (Dong et al., 2019, Perrachione et al., 2011). A different but related issue in the current tone learning studies is that most of the aforementioned studies investigated perceptual learning of Mandarin tones with pitch contour contrasts by novice learners who speak non-tonal languages (e.g., English and Dutch) without prior exposure to lexical tones (Dong et al., 2019, Perrachione et al., 2011, Sadakata and McQueen, 2014). Different from contour tones in Mandarin, Cantonese has multiple level tones, including T1 (high-level tone), T3 (mid-level tone) and T6 (low-level tone), which are primarily distinguished by fine-grained pitch height differences. Level tones with less dynamic contour changes are more susceptible to the influence of talker variability than contour tones (Peng, 2006), and thus constitute an important investigation case to further understand the effect of training variability on level-tone learning through talker generalization.

The third factor which may account for the mixed findings of high-variability training is the time (of day) when participants are trained (Fenn et al., 2013, Fuhrmeister and Myers, 2020). While most previous training studies did not control the time at which the participants were trained, recent studies suggested that evening training facilitated better retention of newly-learned sound contrasts by promoting generalization across talkers than morning training did (Earle and Myers, 2015, Xie et al., 2018). For instance, Earle and Myers (2015) showed that while English listeners trained in the evening improved significantly in identifying the novel Hindi sound stimuli produced by an untrained talker (but not those produced by a trained talker), those trained in the morning did not show such a pattern. In the case of tone learning, Qin and Zhang (2019) found that Mandarin listeners trained in the evening showed a trend toward improvement in identifying the level tones produced by both the trained and untrained talkers. Again, those trained in the morning did not show such a pattern. In short, previous studies have found that high-variability speech training, when conducted in the evening, has the potential to benefit perceptual learning of lexical tones through talker generalization. We therefore examine perceptual learning of (Cantonese) level tones produced by trained and untrained talkers.

While training variability has been tested in many behavioral studies on lexical tones, it remains unclear how training variability will affect learners’ neural responses to non-native tonal contrasts after training. It is important to note that participants need to discriminate or identify tone stimuli consciously in behavioral studies, so their behavioral responses may be affected by attention, working memory and other factors. In contrast, event-related potentials (ERPs) are well suited to studying the pre-attentive (or unconscious) processing of lexical tones, because the auditory stimuli can be presented to participants without their focal attention. Testing pre-attentive processing of lexical tones using ERPs is more informative than only recording behavioral responses (e.g., discrimination accuracy) because perceptual changes may only occur at the unconscious level, as in a training study (Lu, Wayland, & Kaan, 2015). The MMN, a frontal negative ERP component occurring about 100–300 ms after stimulus onset, has been used as a tool to assess the pre-attentive ability to distinguish lexical tones by native and non-native listeners (see Näätänen, 2001 for an overview). The MMN is elicited by infrequent stimuli that deviate from frequently presented (standard) stimuli in pitch or other phonetic cues (e.g., duration, voice onset time), and a larger MMN amplitude and/or an earlier MMN peak indicate a greater sensitivity to these cues (Näätänen, 2001, Tuninetti et al., 2017). The changes of MMN amplitude and/or peak latency can be observed even before changes in behavioral discrimination performance (Tremblay, Kraus, & McGee, 1998). Thus, the pre-attentive response, MMN, provides a sensitive tool to test the neural mechanisms underlying native and non-native tone discrimination (Chandrasekaran et al., 2007a, Kaan et al., 2007).
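As a rough illustration of how an MMN is commonly quantified, the sketch below subtracts the averaged response to standard stimuli from the averaged response to deviant stimuli and summarizes the resulting difference wave within a time window. This is a generic approach and not necessarily the analysis pipeline of the present study; the function names and the voltage values are invented for illustration, and the 100 to 300 ms window simply follows the approximate range mentioned above.

```python
def difference_wave(deviant_erp, standard_erp):
    """Deviant-minus-standard difference wave, sample by sample."""
    return [d - s for d, s in zip(deviant_erp, standard_erp)]

def mean_amplitude(wave, times_ms, start_ms, end_ms):
    """Mean amplitude of the wave within [start_ms, end_ms]."""
    values = [v for v, t in zip(wave, times_ms) if start_ms <= t <= end_ms]
    return sum(values) / len(values)

def peak_latency(wave, times_ms, start_ms, end_ms):
    """Latency (ms) of the most negative point within [start_ms, end_ms]."""
    candidates = [(v, t) for v, t in zip(wave, times_ms)
                  if start_ms <= t <= end_ms]
    return min(candidates)[1]

# Toy averaged ERPs (microvolts) sampled every 50 ms; values are invented.
times_ms = list(range(0, 650, 50))  # 0, 50, ..., 600
standard = [0.0] * len(times_ms)
deviant = [0.0, 0.0, -1.0, -2.0, -3.0, -2.0, -1.0,
           0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

mmn_wave = difference_wave(deviant, standard)
mmn_amplitude = mean_amplitude(mmn_wave, times_ms, 100, 300)
mmn_latency = peak_latency(mmn_wave, times_ms, 100, 300)
```

Under this convention, a more negative mean amplitude or an earlier peak latency would be read as greater pre-attentive sensitivity to the contrast between the deviant and the standard.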

A few studies have employed the MMN to examine the processing of Mandarin tones by native Mandarin-learning children (Lee et al., 2012) and adult learners who speak non-tonal languages, for example, English (Liu et al., 2018, Yu et al., 2019). However, fewer studies have used the MMN to investigate the effect of laboratory training on neural responses to non-native tones by adult learners who speak tonal languages. Kaan and her colleagues used a passive oddball paradigm to investigate the effects of L1 backgrounds (i.e., Mandarin versus English) and perceptual identification training (i.e., before and after training) on the pre-attentive processing of Thai tones as indexed by the MMN (Kaan et al., 2007, Kaan et al., 2008). The ERP results showed that the Mandarin and English-speaking participants achieved different training outcomes, which was attributed to the effect of L1. After training, the English listeners showed an increased MMN (150–300 ms). Interestingly, the MMN increase was not observed after training for the Mandarin listeners, who only showed a decreased late negativity (500–700 ms). The group difference was further modulated by tonal contrasts, in that no group difference was found with respect to the Thai low-mid tonal contrast (i.e., a tone pair that was perceptually trained vs. a high-low tone pair that was not), which was attributed to a large MMN amplitude in both groups before training. To our knowledge, Lu et al. (2015) is the only ERP study that used the MMN (and late negativity in a time window of 500–800 ms) to examine the effect of different training methods (i.e., perception-only training versus perception-plus-production training) on English listeners’ pre-attentive processing of non-native (Thai) tones.
The behavioral results showed that English-speaking participants in both training groups were able to generalize from the trained stimuli to the untrained stimuli in novel phonetic contexts (i.e., syllables) regarding their discrimination performance. However, the MMN results did not yield a difference after training, suggesting a similar effect of the perception-only and the perception-plus-production training on the pre-attentive processing of non-native tones.

In addition to the MMN, the late negativity (mentioned above in Kaan and her colleagues’ research), likely to be the late discriminative negativity (LDN), is a negative wave which could follow an MMN and often occurs around 500 ms after the onset of auditory stimuli (Cheour, Korpilahti, Martynova, & Lang, 2001). Although the cognitive function of the late negativity remains debated (e.g., whether the MMN and late negativity have the same underlying mechanism), the late negativity was often reported to reflect additional processing of the stimuli, for instance, when the salient features of the stimuli are hard to detect (Bishop, Hardiman, & Barry, 2011) or when the stimuli are newly encountered (Zachau et al., 2005). In the studies on lexical tones, the late negativity has been suggested to be associated with the transfer of the newly-encountered tone regularity into long-term memory, that is, a higher level of tone abstraction (Cheour et al., 2001). It was also suggested to reflect the reorientation of attention after involuntary attention to deviant tone stimuli (Lu et al., 2015). Importantly, the late negativity was reported to become smaller in amplitude after training in several tone training studies, potentially suggesting an effect of perceptual learning on more efficient neural transfer or attentional reorientation to lexical tone changes (Kaan et al., 2007, Kaan et al., 2008). Since the decreased negativity was associated with improved discrimination, we followed the tentative interpretation in (Chen, Peter, Wijnen, Schnack, & Burnham, 2018) that the late negativity is a discriminative neural response, that is, the LDN. 
While previous ERP research has suggested an effect of L1 on the MMN and LDN as well as the efficacy of perception-only training on the processing of both trained and untrained tone stimuli, little ERP research (to our knowledge) to date has investigated the effect of training variability on the neural processing of non-native tones, especially in novice learners with tonal L1 backgrounds. An MMN study, which is sensitive to unconscious changes after training, may be well suited for informing the debate on the effect of training variability and deepening our understanding of the changes of neural sensitivity to non-native tones.

While many studies have employed the MMN (and LDN) to test neural processing of contour tones by novice learners with non-tonal L1 backgrounds (Chandrasekaran et al., 2007b), it is less clear how training variability modulates the neural responses to level tones by novice learners with tonal L1 backgrounds, and their neural responses to tones produced by trained and novel talkers (i.e., talker generalization) after training. Therefore, the present study investigated the effect of training variability on neural responses by focusing on Mandarin-speaking novice learners’ perceptual learning of Cantonese level-level tonal contrasts. On the one hand, with a tonal L1 background, Mandarin speakers are familiar with the use of pitch patterns and their variability in the lexical domain, meaning that they may have some competence in handling training variability (Wayland and Guion, 2004, Zhang et al., 2018). On the other hand, Mandarin speakers rely more on pitch contour cues than pitch height cues in pitch perception as a result of the influence of their contour tone system (Gandour, 1983). The tone system places a demand on Mandarin speakers to learn to differentiate fine-grained variations in a less-familiar dimension (i.e., pitch height) and generalize the learned contrasts to new talkers (Qin & Jongman, 2016). These aspects of Mandarin-speaking novice learners make them a valuable case in studying the effect of training variability on tone learning.

The aim of the present study is to examine whether, and if so how, training variability influences Mandarin-speaking novice learners’ neural processing of Cantonese level tones by testing pre-attentive (MMN) and late, potentially attentive neural responses (LDN). Level-tone stimuli produced by trained and novel (untrained) talkers are used to assess talker generalization. If there is a beneficial effect of high-variability training on talker generalization in Mandarin-speaking novice learners, we expect learners receiving high-variability training to show a more pronounced MMN (e.g., a larger amplitude) and/or a more decreased LDN (e.g., a smaller amplitude) than learners receiving low-variability training for tone stimuli, especially for stimuli produced by the untrained talker.

Another aim of the current study is to investigate which tone pairs (i.e., tonal contrasts) are more likely to yield the effect of training variability. Some tone pairs are more easily confused than others because they differ in acoustic salience. For instance, Chandrasekaran et al. (2007b) tested the effect of L1 on the pre-attentive processing of different Mandarin tone pairs depending on the acoustic salience, that is, an easy tone pair (T1, a high level tone vs. T2, a high rising tone) with larger acoustic differences and a difficult tone pair (T2, a high rising tone vs. T3, a falling-rising tone) with smaller acoustic differences. The results showed that the Mandarin listeners demonstrated a larger MMN amplitude than the English listeners to the easy tone pair which was acoustically salient (Chandrasekaran et al., 2007a). Since training may also modulate the neural processing of tone pairs differently depending on their acoustic salience (Kaan et al., 2007, Wang et al., 1999), two level-tone pairs with large acoustic differences (i.e., an easy tone pair) and small acoustic differences (i.e., a difficult tone pair) are included in the current study. Given the reported perceptual difficulty by non-native Mandarin listeners in differentiating Cantonese level tones (Qin & Jongman, 2016), the effect of training variability on the MMN is predicted to differ between the two tone pairs, with an effect more likely to be found for the easy tone pair, which is acoustically salient. In a nutshell, we predict an interaction of training groups with other factors (e.g., tonal contrasts) on the MMN, instead of a simple effect of training groups, given the mixed findings in the literature (Chandrasekaran et al., 2007b, Kaan et al., 2007). A decreased LDN, which might also interact with tonal contrasts, is predicted after training based on the previous findings (Kaan et al., 2007, Kaan et al., 2008).

To test these predictions, we adopted a pretest-training-posttest design to compare the neural responses in novice learners with tonal L1 backgrounds. Two training groups of Mandarin-speaking participants received Cantonese-tone identification training in either a high-variability or a low-variability condition. The participants were all trained in the evening using a talker-blocked fashion to achieve the optimal learning outcome. Stimuli of easy and difficult tone pairs, produced by trained and novel talkers, were used to assess the effect of training variability on early (pre-attentive, MMN) and late (possibly attentive, LDN) neural responses.

Participants

Forty Mandarin-speaking participants were recruited for the experiment in Hong Kong. All were native Mandarin speakers and novice learners with minimal exposure to Cantonese (length of residence in Hong Kong shorter than thirteen months; no classroom learning of Cantonese). All the participants identified Beijing Mandarin (i.e., Putonghua) as their L1, alone or together with another Mandarin Chinese variety. None of them knew any Southern Chinese dialect/language (e.g.,

Results

Two sets of analyses conducted on the tone discrimination performance of the behavioral pretest and the tone identification performance of the training, respectively, confirmed that the two groups did not differ in their ability to discriminate tones before training and that they both improved during training. These analyses are reported in the Supplementary Material.

Discussion

The present study investigated whether, and if so how, training variability influences Mandarin-speaking novice learners’ neural processing of Cantonese level tones. Two tone pairs, T61 and T63, were included to further investigate which tone pair would be more likely to yield the effect of training variability. The Mandarin-speaking participants were trained using either a high-variability or a low-variability training method. Their MMN and LDN were tested in a passive oddball paradigm for

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by a Language Learning Early Career Research Grant and Start-up Fund at the Division of Humanities, the Hong Kong University of Science and Technology awarded to Zhen Qin. The authors would like to thank Weijie Tan for her help in the early stages of the project.

References (56)

  • S. Lu et al.

    Effects of production training and perception training on lexical tone perception - A behavioral and ERP study

    Brain Research

    (2015)
  • A. Tuninetti et al.

    When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech

    Brain and Language

    (2017)
  • D.V.M. Bishop et al.

    Is auditory discrimination mature by middle childhood? A study using time-frequency analysis of mismatch responses from 7 years to adulthood

    Developmental Science

    (2011)
  • Boersma, P., & Weenink, D. (2018). Praat: doing phonetics by computer [Computer program]. Version 6.0.43. Retrieved 8...
  • A.R. Bradlow et al.

    Training Japanese listeners to identify English /r/ and /l/: Long-term retention of learning in perception and production

    Perception and Psychophysics

    (1999)
  • A.R. Bradlow et al.

    Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production

    The Journal of the Acoustical Society of America

    (1997)
  • M. Brysbaert

    Power considerations in bilingualism research: Time to step up our game

    Bilingualism

    (2020)
  • K.S. Button et al.

    Power failure: Why small sample size undermines the reliability of neuroscience

    Nature Reviews Neuroscience

    (2013)
  • B. Chandrasekaran et al.

    Experience-dependent neural plasticity is sensitive to shape of pitch contours

    NeuroReport

    (2007)
  • C.B. Chang et al.

    Context effects on second-language learning of tonal contrasts

    The Journal of the Acoustical Society of America

    (2015)
  • Y.-H. Chang et al.

    Effects of linguistic experience on the perception of high-variability non-native tones

    The Journal of the Acoustical Society of America

    (2017)
  • Y.R. Chao

    A grammar of spoken Chinese = Zhongguo hua de wen fa

    Zhongguo hua de wen fa

    (1968)
  • A. Chen et al.

    Cross-domain correlation in pitch perception, the influence of native language

    Language, Cognition and Neuroscience

    (2016)
  • M. Cheour et al.

    Mismatch negativity and late discriminative negativity in investigating speech perception and learning in children and infants

    Audiology and Neuro-Otology

    (2001)
  • A. Cui et al.

    The effects of musicality and language background on cue integration in pitch perception

    The Journal of the Acoustical Society of America

    (2019)
  • H. Dong et al.

    The effects of high versus low talker variability and individual aptitude on phonetic training of Mandarin lexical tones

    PeerJ

    (2019)
  • F.S. Earle et al.

    Overnight consolidation promotes generalization across talkers in the identification of nonnative speech sounds

    The Journal of the Acoustical Society of America

    (2015)
  • P. Fuhrmeister et al.

    Desirable and undesirable difficulties: Influences of variability, training schedule, and aptitude on nonnative phonetic learning

    Attention, Perception, and Psychophysics

    (2020)