Professor Julien Wist

Deputy Director


Computers will reveal the ‘next periodic table’ – share data with machines first!

A liquid sample can be characterized by NMR within minutes, but it may take hours of interpretation to elucidate a structure or annotate its components, i.e., to convert data into knowledge. This interpretation is mainly carried out by trained spectroscopists and published in scientific journals. However, only a fraction of the data can be published in this way, so the raw (original) data is often lost and the conclusions are buried in PDF files without a proper format, making them very difficult to reuse. While humans can, and prefer to, learn from this summarized information, machine learning and artificial intelligence require access to the original data. In tandem with Luc Patiny from the Ecole Polytechnique Fédérale de Lausanne (Switzerland), we showcased in 2008 a database for spectroscopic data, https://www.mylims.org (now offline), that allowed users to request services, predict, simulate and view NMR spectra, and manipulate them online, so that data would never leave the lab before being stored in a database. Over the years, features were added to promote its use, enabling the creation of interactive teaching material for online courses. The rationale was that students could learn with the same tools that they would later use in the lab. Although valid, this argument was far too simplistic, and integrating everything into a single platform proved too difficult to maintain. The Java applet technology used at the time became unsafe and was deprecated, so we ported the code to JavaScript. This is a rather unusual choice for building scientific tools; it was dictated by our belief that browsers would become the fastest graphical interfaces, given the effort invested in them by many large companies. Moreover, inspired by software engineering practices, the next iteration was built on three pillars that work independently: storage, processing engine and visualization.
A working version is available at https://www.c6h6.org, and all the necessary code is shared on https://www.github.com/cheminfo under the open-source MIT license. The first-mentioned database has been up and running at EPFL, at Universidad del Valle and in several industries (thanks to Luc) for 12 years. The smallest one, in Colombia, has more than 30,000 entries and 100 active users.

L. Patiny, M. Zasso, D. Kostro, A. Bernal, A.M. Castillo, A. Bolaños, M.A. Asencio, N. Pellet, M. Todd, N. Schloerer, S. Kuhn, E. Holmes, S. Javor, J. Wist, The C6H6 NMR repository: an integral solution to control the flow of your data from the magnet to the public, Magn. Reson. Chem. (2017). https://doi.org/10.1002/mrc.4669.

Automatic assignment of NMR spectra. Working on an automatic assignment pipeline showed us the limitations imposed by the lack of access to large collections of curated data. It is, in my opinion, a pertinent example of how access to high-quality data can improve the development of future algorithms, which in turn can improve the quality of the same data by simultaneously improving the assignment of spectra and the prediction of NMR parameters. Using Bayesian statistics, we were able to automatically assign NMR signals without relying on the prediction of NMR parameters, using only the symmetry properties of spectra and molecules. This breakthrough allowed us to construct, train and test a self-learning algorithm that can be used, for instance, to assist the curation of NMR repositories. This is the most recent in a series of tools built to assist the NMR community that are visited by several thousand users each month. Assignment and structure elucidation are an inverse problem; the forward problem is spectra simulation. We established an efficient approach to spectra simulation that is fast and accurate enough for most applications in a web browser, or for algorithms that need to synthesize a large number of spectra. Finally, we contributed to a methodology for quickly comparing spectra without the need for peak-picking (which may fail in automatic mode). It was shown to be accurate enough to rank predictors, i.e., to identify the algorithms that produce the spectra most similar to an experimental one. These contributions lay the foundations for artificial intelligence in structure elucidation.
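The idea of ranking predictors by comparing whole spectra, without peak-picking the experimental one, can be illustrated with a minimal Python sketch. It is not the tree-similarity method of the reference below, nor the spin-system simulation algorithm: the "simulation" is a simple sum of Lorentzian lines at predicted chemical shifts, and the score is a plain cosine similarity, both swapped in for brevity. All function names, predictor names and numbers are hypothetical.

```python
import math


def lorentzian_spectrum(peaks, axis, width=0.02):
    """Render (shift_ppm, intensity) pairs as a sum of Lorentzian lines.

    `width` is the full width at half maximum in ppm (an assumed value).
    """
    half = width / 2.0
    return [
        sum(inten * half ** 2 / ((x - shift) ** 2 + half ** 2)
            for shift, inten in peaks)
        for x in axis
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors (0.0 if one is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_predictors(experimental, predictions, axis, width=0.02):
    """Sort predictors by similarity of their rendered spectrum to the
    experimental intensities; the experimental data is never peak-picked."""
    scores = {
        name: cosine_similarity(
            experimental, lorentzian_spectrum(peaks, axis, width))
        for name, peaks in predictions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Chemical shift axis from 0 to 10 ppm in 0.005 ppm steps.
axis = [i * 0.005 for i in range(2001)]

# Stand-in for a measured spectrum: in practice this would be the raw
# intensity vector read from the instrument, not a synthetic one.
experimental = lorentzian_spectrum(
    [(1.20, 3.0), (3.60, 2.0), (7.25, 5.0)], axis)

# Hypothetical outputs of two chemical shift predictors.
predictions = {
    "predictor_a": [(1.21, 3.0), (3.61, 2.0), (7.26, 5.0)],
    "predictor_b": [(1.50, 3.0), (3.90, 2.0), (7.60, 5.0)],
}

ranking = rank_predictors(experimental, predictions, axis)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A point-by-point score such as this one is very sensitive to small shift errors, which is one reason to prefer a more tolerant similarity measure in practice.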

A.M. Castillo, L. Patiny, J. Wist, Fast and accurate algorithm for the simulation of NMR spectra of large spin systems, J. Magn. Reson. 209 (2011) 123–130.

A.M. Castillo, A. Bernal, L. Patiny, J. Wist, A new method for the comparison of 1H NMR predictors based on tree-similarity of spectra, J. Cheminform. 6 (2014) 9.

A.M. Castillo, A. Bernal, R. Dieden, L. Patiny, J. Wist, “Ask Ernö”: a self-learning tool for assignment and prediction of nuclear magnetic resonance spectra, J. Cheminform. 8 (2016) 26.

A picture is worth a thousand words. Although the core of our research aims at teaching computers how to perform our job, completely automated analytical pipelines are difficult to achieve and often result in so-called ‘black boxes’, while data analysis starts (or should start) and ends with data visualization. Indeed, visual inspection of the data and results is key to ensuring the quality and integrity of the analysis. Therefore, we have been working on improving the visualization of chemical data by building and integrating JavaScript libraries that allow us to turn web pages into powerful interactive visualization tools. We believe that web pages are the best way to publish and share information, since a single URL (Uniform Resource Locator) can provide the data, the code for its analysis and the layout for its visualization. Web pages can be readily integrated with processing or statistical packages such as R or Python. Here, as an example, is a view (a web page) that allows users to quickly explore metabonomics data, relate scores to the original data and retrieve reference spectra for comparison. All the information is computed using R and made available to the browser using the hastaLaVista R package and the visualizer JavaScript platform. It is worth noting that although we promote the use of web pages as interactive graphical interfaces, the processing occurs locally, without any data leaving the user’s computer.

In another example, we were able to quickly build visualization tools that support the COMPASS methodology for semi-automatic feature extraction in population studies.

Overview of the COMPASS analysis

J. Wist, HastaLaVista, a web-based user interface for NMR-based untargeted metabolic profiling analysis in biomedical sciences: towards a new publication standard, J. Cheminform. 11 (2019) 75.

J. Wist, Complex Mixtures by NMR and Complex NMR for Mixtures: experimental and publication challenges, Magn. Reson. Chem. (2016). https://doi.org/10.1002/mrc.4533.

R.L. Loo, Q. Chan, H. Antti, J.V. Li, H. Ashrafian, P. Elliott, J. Stamler, J.K. Nicholson, E. Holmes, J. Wist, Strategy for improved characterisation of human metabolic phenotypes using a COmbined Multiblock Principal components Analysis with Statistical Spectroscopy (COMPASS), Bioinformatics. (2020). https://doi.org/10.1093/bioinformatics/btaa649.

Mutual-diffusion driven NMR spectroscopy. In a different direction, we have shown that it is possible to create a concentration gradient inside an NMR tube in a controlled and reproducible manner. This makes NMR the most comprehensive method for studying molecular diffusion phenomena. NMR has been extensively used to determine self-diffusion coefficients in multicomponent systems. In contrast, very little data is available for mutual diffusion, with the most recent progress reported using Raman spectroscopy. Thanks to their high resolution, Raman and now NMR spectroscopy enable measurements of truly multicomponent systems outside equilibrium. This is the first reported measurement of concentration gradient dynamics by NMR, and it opens up opportunities in an area where the experimental data needed to validate theoretical models is lacking. It also represents an interesting avenue for in-situ separation of mixtures.
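For readers less familiar with the distinction, the two quantities can be written in their standard textbook form for one dimension along the tube axis z. This is only a sketch under simplifying assumptions (a binary mixture and a single composition-dependent coefficient D(c)); the symbols are chosen here for illustration and are not necessarily those used in the references below. A truly multicomponent system requires a matrix of diffusion coefficients.

```latex
% Self-diffusion, measured at equilibrium: mean-squared displacement along z
\langle z^2 \rangle = 2\, D_{\mathrm{self}}\, t

% Mutual diffusion, outside equilibrium: Fick's second law for the
% concentration profile c(z, t) with a composition-dependent coefficient
\frac{\partial c}{\partial t}
  = \frac{\partial}{\partial z}\left( D(c)\, \frac{\partial c}{\partial z} \right)
```

Spatially resolved spectra give c(z, t) for each species, which is the quantity the second equation describes.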

Spatial mapping of a biphasic TEA/D2O system with four solutes at 274 K using the Double Pulse Field Gradient Selective Echo sequence, acquired after 16 h. Blue and yellow lines represent the aqueous and organic phases. The upper left shows the evolution of the acetonitrile signal for the z=0 (interface) slice at different times.

C.F. Pantoja, Y.M. Muñoz-Muñoz, L. Guastar, J. Vrabec, J. Wist, Composition Dependent Transport Diffusion in Non-Ideal Mixtures from Spatially Resolved Nuclear Magnetic Resonance Spectroscopy, Phys. Chem. Chem. Phys. (2018). https://doi.org/10.1039/C8CP05539D.

C.F. Pantoja, J.A. Bolaños, A. Bernal, J. Wist, Mutual Diffusion Driven NMR: a new approach for the analysis of mixtures by spatially resolved NMR spectroscopy, Magn. Reson. Chem. (2017). https://doi.org/10.1002/mrc.4561.

Further information about my research: ResearchGate, ORCID

After a chemistry degree at Université de Lausanne (Switzerland), which involved acquiring international experience at Ulm Universität (Germany) and a year-long internship at Universidad del Valle (Colombia), I obtained my PhD as an NMR spectroscopist at Ecole Polytechnique Fédérale de Lausanne (Switzerland). In 2006, I started a research group at Universidad Nacional de Colombia in Bogotá, and in 2010 I obtained a professorship at Universidad del Valle (Colombia), where I have served as Director of International Relations (2015), Director of the Technology Transfer Office (2016-2020) and Coordinator of the Institutional Laboratory Program (2016-2020). During this period, I led the creation of a tandem group for plant metabolomics between Universidad del Valle and the Max Planck Society to strengthen high-profile international relations, coordinated the creation of an Institutional Laboratory System and Central Direction, and managed the funding and creation of the first shared core facility at the University, Bioanalitics. I have coordinated and participated in several local and international research projects, and my group has trained 11 undergraduate and 10 postgraduate students. Currently, I am joining efforts to consolidate the cheminformatics and computational capacity of the Australian National Phenome Center (ANPC) and the Center for Computational Systems Medicine (CCSM), and to strengthen the relationship between Universidad del Valle and Murdoch University.

The LAMPS network.

In-house internationalization as an opportunity for our students. Because our Colombian students lack the resources to gain international experience abroad, I dedicated efforts to bringing high-profile international speakers to Colombia and consolidating the NMR community there. Starting in 2008, I have chaired 6 international NMR schools in Colombia. In 2012, in collaboration with Elaine Holmes, we fostered metabolic profiling events in Latin America, in Lima (Peru), Rosario (Argentina) and Rio de Janeiro (Brazil), and the creation of the Latin American Metabolic Profiling Society (LAMPS). In addition, I coordinated for 6 years a joint effort between Universidad del Valle (Colombia) and EPFL (Switzerland) and hosted in our lab 32 inbound students of 6 nationalities. Bringing 42 top researchers and international students to Latin America and Colombia played an important role in the training of our students and helped establish a network of laboratories to promote our projects in Latin America. In 2015, our research group was involved in the organization of the Solar Decathlon Latin America and the Caribbean by designing, building and coordinating the real-time monitoring and evaluation of comfort parameters in the 17 participating houses. This event brought over 400 international fellows and students to our campus, united by the common goal of using our knowledge and ideas for the benefit of the environment. This reflects a genuine commitment to building durable international networks and to consolidating institutions in Latin America, beyond the short-term agenda of research.

The DARMN research group and international interns (6 nationalities)
