Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color

Research output: Chapter in Book/Report/Conference proceeding; Article in proceedings; Research; peer-review

Standard

Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. / Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Hollenstein, Nora.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Dublin: Association for Computational Linguistics, 2022. p. 6819–6836.

Harvard

Boldsen, S, Aguirrezabal Zabaleta, M & Hollenstein, N 2022, Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. vol. 1, Association for Computational Linguistics, Dublin, pp. 6819–6836. https://doi.org/10.18653/v1/2022.acl-long.470

APA

Boldsen, S., Aguirrezabal Zabaleta, M., & Hollenstein, N. (2022). Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 6819–6836). Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.acl-long.470

Vancouver

Boldsen S, Aguirrezabal Zabaleta M, Hollenstein N. Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Dublin: Association for Computational Linguistics. 2022. p. 6819–6836. https://doi.org/10.18653/v1/2022.acl-long.470

Author

Boldsen, Sidsel; Aguirrezabal Zabaleta, Manex; Hollenstein, Nora. / Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Dublin: Association for Computational Linguistics, 2022. pp. 6819–6836.

Bibtex

@inproceedings{204a1cdba329442990b706f49c3d5ea2,
title = "Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color",
abstract = "Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.",
author = "Sidsel Boldsen and {Aguirrezabal Zabaleta}, Manex and Nora Hollenstein",
year = "2022",
doi = "10.18653/v1/2022.acl-long.470",
language = "English",
volume = "1",
pages = "6819--6836",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
publisher = "Association for Computational Linguistics",

}

RIS

TY - GEN

T1 - Interpreting Character Embeddings With Perceptual Representations: The Case of Shape, Sound, and Color

AU - Boldsen, Sidsel

AU - Aguirrezabal Zabaleta, Manex

AU - Hollenstein, Nora

PY - 2022

Y1 - 2022

N2 - Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.

AB - Character-level information is included in many NLP models, but evaluating the information encoded in character representations is an open issue. We leverage perceptual representations in the form of shape, sound, and color embeddings and perform a representational similarity analysis to evaluate their correlation with textual representations in five languages. This cross-lingual analysis shows that textual character representations correlate strongly with sound representations for languages using an alphabetic script, while shape correlates with featural scripts. We further develop a set of probing classifiers to intrinsically evaluate what phonological information is encoded in character embeddings. Our results suggest that information on features such as voicing is embedded in both LSTM and transformer-based representations.

U2 - 10.18653/v1/2022.acl-long.470

DO - 10.18653/v1/2022.acl-long.470

M3 - Article in proceedings

VL - 1

SP - 6819

EP - 6836

BT - Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics

PB - Association for Computational Linguistics

CY - Dublin

ER -
