Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color
Publication: Contribution to book/anthology/report › Article in proceedings › Research › Peer-reviewed
Standard
Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. / Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Hollenstein, Nora.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Dublin: Association for Computational Linguistics, 2022. pp. 6819–6836.
RIS
TY  - GEN
T1  - Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color
AU  - Boldsen, Sidsel
AU  - Aguirrezabal Zabaleta, Manex
AU  - Hollenstein, Nora
PY  - 2022
Y1  - 2022
N2  - Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.
AB  - Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.
U2  - 10.18653/v1/2022.acl-long.470
DO  - 10.18653/v1/2022.acl-long.470
M3  - Article in proceedings
VL  - 1
SP  - 6819
EP  - 6836
BT  - Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
PB  - Association for Computational Linguistics
CY  - Dublin
ER  - 
ID: 306304258
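The abstract describes a representational similarity analysis (RSA) that correlates textual character embeddings with shape, sound, and color embeddings. The sketch below shows the general technique in pure Python. It builds a representational dissimilarity matrix (RDM) for each embedding space over the same set of characters and then correlates the two RDMs. Using cosine distance for the RDMs and Spearman rank correlation between them is an assumption. These are common RSA choices, but the paper's exact settings may differ. The function names (`condensed_rdm`, `average_ranks`, `rsa`) are hypothetical, and the toy vectors are invented for illustration. They are not data from the paper.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def condensed_rdm(space, chars):
    """Upper triangle of the RDM: one distance per unordered character pair,
    always enumerated in the same pair order for a given `chars` list."""
    return [cosine_distance(space[a], space[b]) for a, b in combinations(chars, 2)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rsa(space_a, space_b):
    """Correlate the RDMs of two embedding spaces over their shared characters."""
    chars = sorted(set(space_a) & set(space_b))
    return spearman(condensed_rdm(space_a, chars), condensed_rdm(space_b, chars))


# Hypothetical toy vectors (NOT data from the paper).
textual = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.1, 0.8, 0.3],
    "p": [0.2, 0.7, 0.4],
    "m": [0.1, 0.5, 0.9],
}
sound = {
    "a": [1.0, 0.0, 0.1],
    "b": [0.0, 1.0, 0.2],
    "p": [0.1, 0.9, 0.3],
    "m": [0.0, 0.6, 1.0],
}

print(f"RSA(textual, sound) = {rsa(textual, sound):.3f}")  # prints RSA(textual, sound) = 0.943
```

Comparing RDMs lets the two spaces have different dimensionalities, because only within-space pairwise distances are compared. A high positive value indicates that character pairs which are close in one space also tend to be close in the other. In practice, a library routine such as a tested Spearman implementation would replace the hand-written ranking code.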