Masaryk University student Filip Jozefov has achieved an outstanding success. His master’s thesis, prepared under the supervision of Aleš Křenek from CERIT-SC in collaboration with the Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences (IOCB Prague), placed in the TOP 10 in the Best Master’s Thesis category of this year’s prestigious Werner von Siemens Award. Facing strong nationwide competition, his work stood out both for its exceptional quality and for the significant innovative contribution of his research.
Artificial Intelligence Reveals Unknown Molecules
The thesis, titled Predicting molecular structures from multi-stage MSn fragmentation trees using graph neural networks and DreaMS foundation model, focuses on processing mass spectrometry data, primarily for applications in metabolomics. It addresses one of the biggest challenges in contemporary analytical chemistry and biology: most of the small molecules present in living organisms (the so-called “dark metabolome”) still cannot be identified, even though they are critical for understanding how biological systems function.
The primary technology for large-scale identification of metabolite structures is tandem mass spectrometry (MS/MS). However, the resulting spectra are often ambiguous and incomplete, which makes them hard to interpret. In his work, Filip Jozefov introduces a breakthrough approach: whereas current machine learning models typically rely on a single stage of fragmentation (MS2), he incorporates multi-stage data (MSn). In this approach, molecules are fragmented in successive rounds, with fragments broken down further at each stage, revealing much deeper structural layers. The trade-off is that such data are considerably more complex to process.
Up to Tenfold Improvement in Accuracy Thanks to Advanced Models
To analyze these complex data, Filip developed the very first neural network models trained on MSn spectra. In his work, he combined graph neural networks with DreaMS, a foundation model that produces embeddings of mass spectra. The models were tested on two key tasks: retrieving the correct molecular structure from a set of candidates, and generating structures de novo.
Key contributions of the thesis:
- Significant increase in accuracy: Incorporating multi-stage MSn fragmentation improved structure search accuracy up to tenfold compared to using MS2 alone.
- Richer data representation: Deeper MSn levels produce far more informative spectral representations.
- Open science and research infrastructure: As part of the work, Filip created MassSpecGymMSn, the first open benchmark for MSn-based molecular annotation. It contains 16,476 fragmentation trees (up to the MS5 level) together with preprocessing tools, and this dataset is publicly available in a repository to support further research in the field.
The results of this work have enormous application potential across many fields, from drug discovery and personalized medicine to more accurate disease diagnostics, agriculture, and environmental protection. One of the greatest challenges addressed by this research is de novo molecular structure generation. In practice, this means that artificial intelligence can propose molecular structures directly from measured spectral data, including molecules that are not present in any existing database. In ambition and complexity, this task is comparable to protein structure prediction, the field revolutionized by the AlphaFold model. If these algorithms continue to improve, they could fundamentally transform how humanity discovers and maps new chemical structures.
About the Werner von Siemens Award 2025
The Werner von Siemens Award is presented by Siemens to students and young scientists in the technical and natural sciences. In terms of scope and history, it ranks among the most significant independent initiatives of its kind in the Czech Republic. In this 28th edition, 56 independent experts evaluated a total of 662 submissions, nearly one-fifth (19%) of which came from the field of chemistry. Filip’s placement in the TOP 10 is therefore a remarkable success. Congratulations!
