Simon Šuster
I'm a Natural Language Processing (NLP) Scientist at
Textgain, currently based in Slovenia. In my previous position, I worked as a research fellow for the
ARC Training Centre in Cognitive Computing for Medical Technologies at the University of Melbourne, together with
Karin Verspoor and
Tim Baldwin.
Prior to that, I was a postdoc at the
Computational Linguistics & Psycholinguistics Research Center (CLiPS) of the University of Antwerp, headed by
Walter Daelemans. I completed my PhD at the University of Groningen, advised by
Gertjan van Noord and
Ivan Titov. In the distant past, I was a master's student in the
LCT program and obtained my university degree in
translation studies in Slovenia.
My professional interests include real-world NLP applications in specialised domains, such as medicine, and in multilingual contexts. I'm also particularly drawn to exploring the intersection of NLP/AI with the humanities and ethics. I co-advised (to completion) two MSc students (Siyang Wang and Fan Ye, both in medical NLP at the University of Melbourne) and two PhD students (University of Antwerp):
Madhumita Sushil on interpretability and document representation learning, and
Pieter Fivez on lexical normalization and modeling of lexical variability. I'm currently supervising a student at the University of Ljubljana on detecting misleading reporting in scientific literature using LLMs.
Publications
- 2024
- Zero- and Few-Shot Prompting of Generative Large Language Models Provides Weak Assessment of Risk of Bias in Clinical Trials.
Simon Šuster, Timothy Baldwin, Karin Verspoor. Research Synthesis Methods, 2024.
- 2023
- Uncertainty Estimation for Debiased Models: Does Fairness Hurt Reliability?
Gleb Kuzmin, Artem Vazhentsev, Artem Shelmanov, Xudong Han, Simon Šuster, Maxim Panov, Alexander Panchenko, Timothy Baldwin. IJCNLP-AACL, 2023.
- Promoting Fairness in Classification of Quality of Medical Evidence
Simon Šuster, Timothy Baldwin, Karin Verspoor. BioNLP, ACL, 2023.
- Analysis of predictive performance and reliability of classifiers for quality assessment of medical evidence revealed important variation by medical area
Simon Šuster, Timothy Baldwin, Karin Verspoor. Journal of Clinical Epidemiology, 2023.
- Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study
Simon Šuster, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez Iraola, Yulia Otmakhova, Karin Verspoor. Journal of Medical Internet Research, 25: e35568, 2023.
- 2022
- Predicting Publication of Clinical Trials Using Structured and Unstructured Data: Model Development and Validation Study
Siyang Wang*, Simon Šuster*, Timothy Baldwin and Karin Verspoor (*equal contribution). Journal of Medical Internet Research, 24 (12): e38859, 2022.
- 2021
- Mapping probability word problems to executable representations.
Simon Šuster, Pieter Fivez, Pietro Totis, Angelika Kimmig, Jesse Davis, Luc de Raedt and Walter Daelemans. EMNLP (long paper), 2021.
- Impact of detecting clinical trial elements in exploration of COVID-19 literature
Simon Šuster, Karin Verspoor, Timothy Baldwin, Jey Han Lau, Antonio Jimeno Yepes, David Martinez, Yulia Otmakhova. HealthNLP Workshop, 2021.
- Are we there yet? Exploring clinical domain knowledge of BERT models
Madhumita Sushil, Simon Šuster and Walter Daelemans. BioNLP, NAACL, 2021.
- Contextual explanation rules for neural clinical classifiers
Madhumita Sushil, Simon Šuster and Walter Daelemans. BioNLP, NAACL, 2021.
- Scalable Few-Shot Learning of Robust Biomedical Name Representations
Pieter Fivez, Simon Šuster and Walter Daelemans. BioNLP, NAACL, 2021.
- Integrating Higher-Level Semantics into Robust Biomedical Name Representations
Pieter Fivez, Simon Šuster and Walter Daelemans. Workshop on Health Text Mining and Information Analysis (LOUHI), EACL, 2021.
- Conceptual Grounding Constraints for Truly Robust Biomedical Name Representations
Pieter Fivez, Simon Šuster and Walter Daelemans. EACL, 2021.
- Brief description of COVID-SEE: The Scientific Evidence Explorer for COVID-19 Related Research
Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez. ECIR, 2021.
- 2020
- Improved Topic Representations of Medical Documents to Assist COVID-19 Literature Exploration
Yulia Otmakhova, Karin Verspoor, Timothy Baldwin, and Simon Šuster. NLP COVID-19 Workshop at EMNLP, 2020.
- COVID-SEE: Scientific Evidence Explorer for COVID-19 Related Research
Karin Verspoor, Simon Šuster, Yulia Otmakhova, Shevon Mendis, Zenan Zhai, Biaoyan Fang, Jey Han Lau, Timothy Baldwin, Antonio Jimeno Yepes, David Martinez. arXiv preprint arXiv:2008.07880, 2020.
- Distilling neural networks into skipgram-level decision lists
Madhumita Sushil, Simon Šuster and Walter Daelemans. arXiv preprint arXiv:2005.07111, 2020.
- 2019
- Why can't memory networks read effectively? [code]
Simon Šuster, Madhumita Sushil and Walter Daelemans. arXiv preprint arXiv:1910.07350, 2019.
- Unsupervised Concept Extraction from Clinical Text through Semantic Composition. [code]
Stéphan Tulkens, Simon Šuster and Walter Daelemans. Journal of Biomedical Informatics, 2019.
- 2018
- Revisiting neural relation classification in clinical notes with external information. [bibtex · poster · code]
Simon Šuster, Madhumita Sushil and Walter Daelemans. Workshop on Health Text Mining and Information Analysis (LOUHI), EMNLP, 2018.
- Rule induction for global explanation of trained models. [bibtex · poster · code]
Madhumita Sushil, Simon Šuster and Walter Daelemans. Analyzing and interpreting neural networks for NLP (BlackBoxNLP), workshop at EMNLP, 2018.
- Patient representation learning and interpretable evaluation using clinical notes. [bibtex]
Madhumita Sushil, Simon Šuster, Kim Luyckx and Walter Daelemans. Journal of Biomedical Informatics, 2018.
- CliCR: A Dataset of Clinical Case Reports for Machine Reading Comprehension. [bibtex · code]
Simon Šuster and Walter Daelemans. NAACL (long paper), 2018.
- 2017
- Unsupervised patient representations from clinical notes with interpretable classification decisions. [bibtex · poster]
Madhumita Sushil, Simon Šuster, Kim Luyckx and Walter Daelemans. NIPS Machine Learning for Health Workshop, 2017.
- Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings. [bibtex · code]
Pieter Fivez, Simon Šuster and Walter Daelemans. CLIN Journal, 2017.
- Unsupervised Context-Sensitive Spelling Correction of Clinical Free-Text with Word and Character N-Gram Embedding. [bibtex · code]
Pieter Fivez, Simon Šuster and Walter Daelemans. BioNLP, 2017.
- A Short Review of Ethical Challenges in Clinical Natural Language Processing. [poster · bibtex]
Simon Šuster, Stéphan Tulkens and Walter Daelemans. First Workshop on Ethics in NLP, EACL, 2017.
- 2016
- Using Distributed Representations to Disambiguate Biomedical and Clinical Concepts. [bibtex · code]
Stéphan Tulkens, Simon Šuster and Walter Daelemans. BioNLP, 2016.
- Empirical studies on word representations. [bibtex]
Simon Šuster. PhD thesis, 2016.
- Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders. [bibtex · slides · code · video]
Simon Šuster, Ivan Titov and Gertjan van Noord. NAACL (long paper), 2016.
- 2015
- Word Representations, Tree Models and Syntactic Functions. [bibtex · code]
Simon Šuster, Gertjan van Noord and Ivan Titov. arXiv preprint arXiv:1508.07709, 2015.
- GLAD: Groningen Lightweight Authorship Detection. [bibtex · code]
Manuela Hürlimann, Benno Weck, Esther van den Berg, Simon Šuster and Malvina Nissim. Uncovering Plagiarism, Authorship and Social Software Misuse, CLEF, Author Identification challenge, 2015.
- An investigation into language complexity of World-of-Warcraft game-external texts. [bibtex]
Simon Šuster. arXiv preprint arXiv:1502.02655, 2015.
- 2014
- From neighborhood to parenthood: the advantages of dependency representation over bigrams in Brown clustering. [bibtex · slides · code · data]
Simon Šuster and Gertjan van Noord. COLING, 2014.
- 2013
- Semantic Mapping for Lexical Sparseness Reduction in Parsing. [bibtex]
Simon Šuster and Gertjan van Noord. ESSLLI Extrinsic Parse Improvement Workshop, 2013.
- <2013
- Resolving PP-attachment ambiguity in French with distributional methods. [bibtex]
Simon Šuster. Master's thesis, 2012.
- →Publications in Slovene
Talks
- 2024
- Automating risk-of-bias assessment with generative AI. Long oral talk, Global Evidence Summit, Prague, 2024.
- 2023
- Masks do not protect against viruses. Or do they? Automated assessment of the quality of evidence in medicine. [abstract · video] Invited talk, JOTA seminar, Faculty of Computer and Information Science, University of Ljubljana, 2023.
- Automated quality assessment of medical evidence to support systematic reviewing. NLP reading group talk, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, 2023.
- Using Machine Learning and Natural Language Processing to Structure Medical Evidence and Grade its Quality. Mobilizing Computable Biomedical Knowledge (MCBK) meeting, 2023.
- 2022
- Automated quality assessment of medical evidence to support systematic reviewing. Invited talk, CLiPS Colloquium, University of Antwerp, 2022.
- When to trust a classifier for quality assessment of medical evidence? International Collaboration for the Automation of Systematic Reviews (ICASR), Cologne, Germany, 2022.
- Automated quality assessment of medical evidence to support systematic reviewing. Invited talk, UMass BioNLP, 2022.
- Robustness and practical applicability of EvidenceGRADEr. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2022.
- 2021
- Automated quality assessment of medical evidence. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2021.
- Automated quality assessment of medical evidence. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, long talk, 2021.
- Automated quality assessment of medical evidence. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2021.
- 2020
- Stream 4: Synthesising, appraising and exploring medical evidence. ITTC Healthcare Symposium, 2020.
- Creating a dataset for automated quality assessment of medical evidence (part 2). Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2020.
- I want to know what attention is (I want you to show me). ARC Training Centre in Cognitive Computing for Medical Technologies, machine learning reading group, 2020.
- Creating a dataset for automated quality assessment of medical evidence (part 1). Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2020.
- COVID-SEE: Scientific Evidence Explorer for COVID-19. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, long talk, 2020.
- Real-time clinical decision support. Seminar of ARC Training Centre in Cognitive Computing for Medical Technologies, short talk, 2020.
- 2019
- Machine reading for medicine. Closing workshop of the Accumulate project, KULeuven, 2019.
- Why can’t memory networks read effectively? CLiPS, University of Antwerp, 2019.
- Sequence models for probability word problems. CLiPS, University of Antwerp, 2019.
- Fill the gap: Machine reading comprehension for medicine. Invited talk at NLP Meetup Belgium, May 2019.
- Machine reading comprehension and word problem solving (with external knowledge). Annual meeting of CLiPS, 2019.
- Memories are made of this: A primer on memory networks for QA. CLiPS, University of Antwerp, 2019.
- Revisiting neural relation classification in clinical notes with external information. CLIN, 2019.
- 2018
- Technology developed at CLiPS. Accumulate industrial meeting, September 2018.
- Spelling correction with word and character n-gram embeddings. Invited talk at Vectors and Linguistics: a workshop on word embeddings, University of Leiden, March 2018.
- Technology developed at CLiPS. Accumulate industrial meeting, March 2018.
- 2017
- Clinical Machine Comprehension with Case Reports. ATILA, 2017.
- Clinical Machine Comprehension Using Case Reports. (poster) American Medical Informatics Association (AMIA) Annual Symposium, 2017.
- What is attention in NNs? (with two examples) A 20-min tutorial, CLiPS, University of Antwerp, 2017.
- Representation learning for words. Guest lecture in the master's course Current trends in AI, Free University of Brussels, 2017.
- Clinical Case Reports Dataset for Machine Reading. (poster) 27th meeting of Computational Linguistics in the Netherlands (CLIN), 2017.
- 2016
- The challenges in concept detection for clinical texts. Accumulate industrial meeting, 2016.
- Towards clinical language understanding. ATILA, 2016.
- Clinical language processing: the first steps. Annual meeting of CLiPS, 2016.
- 2015
- Inducing multi-sense word representations multilingually. ATILA and CLIN, 2015.
- Who's the bad guy? OlympIKade: Informatiekunde Matchingsdag, 2015.
- Presentation of E. Bender's (2011) On Achieving and Evaluating Language-Independence in NLP. RUG Computational Linguistics reading group, 2015.
- Overview of Learning From Data’s Final Project: Author Verification. With Malvina Nissim. RUG Computational Linguistics reading group, 2015.
- Tree models, syntactic functions and word representations. The 25th meeting of Computational Linguistics in the Netherlands (CLIN), 2015.
- 2014
- From perceptrons to word embeddings (a high-level introduction). RUG Computational Linguistics reading group, 2014.
- Extending Hidden Markov (tree) models for word representations. (poster) 23rd annual Belgian-Dutch Conference on Machine Learning (BENELEARN), 2014. [abstract]
- How to write a master's thesis: a computational linguist's view. RUG Research Master's in Linguistics meeting, April 2014, March 2015.
- Dependency-tuned word clusters for Dutch. The 24th meeting of Computational Linguistics in the Netherlands (CLIN), 2014.
- 2013
- Reading group presentation on Reddy et al. 2011 paper on Dynamic and Static Prototypes For Semantic Composition. RUG Computational Linguistics reading group, 2013.
- Semantic Mapping for Lexical Sparseness Reduction in Parsing. ESSLLI Extrinsic Parse Improvement Workshop, 2013.
- The Brown et al. 1992 Clustering. RUG Computational Linguistics reading group, 2013.
- Semantic Mapping for Lexical Sparseness Reduction in Parsing. New Frontiers in Parsing and Generation Workshop, 2013.
- Lexical Association Analysis For Semantic-Class Feature Enhancement In Parsing. (poster) 23rd meeting of Computational Linguistics in the Netherlands (CLIN), 2013.
- <2013
- Resolving PP-attachment ambiguity by distributional semantic modeling in the context of parsing of French. RUG Computational Linguistics reading group, 2012.
- The SSJ corpus in the context of Slovene reference corpora. With Olga Yeroshina Pobirk. 7th International Conference Practical Applications in Language and Computers, 2009.
- →Talks in Slovene
Teaching
- 2017/2018, 2018/2019, 2019/2020
- Computational Linguistics, UA_2010FLWTAA. With prof. dr. Walter Daelemans
- POS-tagging & Minimum Edit Distance
- Syntactic Analysis & Parsing
- Semantic Role Labeling & Frame Semantics
- 2014/2015
- Learning from data. With dr. Malvina Nissim
- final project: Authorship verification
- topics from 2013/2014
- 2013/2014
- Learning from data. With prof. dr. Gertjan van Noord
- Introduction to Weka, The Perceptron
- K-means
- Brown Clustering
- Linear Regression
- final project: Movie revenue prediction from reviews
- <2013/2014
- Corpustaalkunde (Corpus linguistics), 2013. Assisting dr. Gosse Bouma
- Corpustaalkunde (Corpus linguistics), 2011. Assisting dr. Gosse Bouma