Q-DB Machine Learning

The Quantemol team has developed a machine learning regression model to rapidly estimate reaction rate coefficients for heavy particle collisions. The model was trained on approximately 10,000 instances of kinetic data obtained from four popular plasma process databases: QDB [1], NFRI [2], KIDA [3], and UDfA [4]. The model features were engineered from commonly available data that describe individual chemical species involved in the reactions, such as molar masses, charges, enthalpies of formation, dipole moments, polarizabilities, and elemental composition data.

The final prediction algorithm is a voting regressor that combines distinct, optimised machine learning regressors, including a support vector regressor [5], a random forest regressor [6], a gradient-boosted trees regressor [6, 7], and a k-nearest neighbours regressor [8].
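As a rough illustration of how such an ensemble can be assembled, the sketch below builds a voting regressor from the four regressor types named above. It assumes scikit-learn, which the text does not name as the library actually used; the synthetic features, the placeholder target, and all hyperparameters are made up for the example and are not the optimised settings of the QDB model.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic placeholder data: 200 "reactions" with 6 numeric features each.
# In practice the target would be something like log10 of the rate coefficient.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Distance- and kernel-based regressors are sensitive to feature scale,
# so they are wrapped in a pipeline with a scaler; the tree ensembles are not.
model = VotingRegressor(
    estimators=[
        ("svr", make_pipeline(StandardScaler(), SVR())),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gbt", GradientBoostingRegressor(random_state=0)),
        ("knn", make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))),
    ]
)

# Fit on the first 150 samples and predict the remaining 50.
model.fit(X[:150], y[:150])
predictions = model.predict(X[150:])
```

With no weights given, the voting regressor returns the plain average of the four individual predictions for each sample.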

The prediction algorithm was evaluated on a dataset containing over 1000 test reactions. For more than 87% of these reactions, the predicted rate coefficient values had an error of less than one order of magnitude. Please note, however, that the prediction error for certain reactions was significantly higher.
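One way to compute such a figure is sketched below. It assumes that "an error of less than one order of magnitude" means |log10(k_predicted / k_reference)| < 1; the text does not spell out the exact definition, and the function name and the sample values are purely illustrative.

```python
import math


def fraction_within_one_order(predicted, reference):
    """Return the fraction of predictions within a factor of 10 of the reference.

    Both arguments are sequences of positive rate coefficients in the same units.
    """
    hits = sum(
        1
        for k_pred, k_ref in zip(predicted, reference)
        if abs(math.log10(k_pred / k_ref)) < 1.0
    )
    return hits / len(reference)


# Made-up values for illustration only (not data from the evaluation).
predicted = [1e-10, 3e-11, 5e-9, 2e-12]
reference = [2e-10, 1e-11, 1e-11, 1e-12]

# The third prediction is off by a factor of 500, so 3 of 4 are within range.
score = fraction_within_one_order(predicted, reference)
```

Working on the logarithm of the ratio treats over- and under-prediction symmetrically, which suits quantities that span many orders of magnitude.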

We invite all users to benefit from this model and populate their chemistry sets with missing data for heavy particle reactions.

However, it is important to keep in mind that while machine learning models can offer fast predictions, they may not be as accurate as more complex calculations like quantum chemistry methods. Therefore, the results obtained with machine learning models should be treated with a certain degree of scepticism.

Please use this reference to cite the data obtained with the model: Lemishko K M, Hanicinec M, Mohr S, Dzarasova A, Tennyson J. 2023. Machine learning-based approach for fast kinetic data estimation. [Unpublished].

If you have any questions about how to use the model, or suggestions on how to improve it, please email us at qdbsupport@quantemol.com.

 

References:

    1. Tennyson J, Mohr S, Hanicinec M, Brown D, Dzarasova A, Alves L L, Bartschat K, Bogaerts A, Booth J P, Braams B J, Bruggemann P J, Engelmann S U, Gans T, Goeckner M J, Hamaguchi S, Hamilton K R, Hassall G, Hill C, Hassouni K, Krishnakumar E, Kushner M J, Laricchiuta A, Mason N J, Pandey S, Petrovic Z L, Pu Y K, Rachimova T, Ranjan A, Rauf S, Schulze J, Yoon J S, Veer K and Zatsarinny O. 2022. Plasma Sources Sci. Technol. 31 095020. The 2021 release of the Quantemol database (QDB) of plasma chemistries and reactions.
    2. Park J H, Choi H, Chang W S, Chung S Y, Kwon D C, Song M Y and Yoon J S. 2020. Applied Science and Convergence Technology. A new version of the plasma database for plasma physics in the data center for plasma properties.
    3. Wakelam V, Herbst E, Loison J C, Smith I W M, Chandrasekaran V, Pavone B, Adams N G, Bacchus-Montabonel M C, Bergeat A, Béroff K, Bierbaum V M, Chabot M, Dalgarno A, van Dishoeck E F, Faure A, Geppert W D, Gerlich D, Galli D, Hébrard E, Hersant F, Hickson K M, Honvault P, Klippenstein S J, Le Picard S, Nyman G, Pernot P, Schlemmer S, Selsis F, Sims I R, Talbi D, Tennyson J, Troe J, Wester R and Wiesenfeld L. 2012. The Astrophysical Journal Supplement Series. A kinetic database for Astrochemistry (KIDA).
    4. McElroy D, Walsh C, Markwick A J, Cordiner M A, Smith K and Millar T J. 2013. Astronomy & Astrophysics. The UMIST database for astrochemistry 2012.
    5. Boser B E, Guyon I M and Vapnik V N. 1992. Proceedings of the fifth annual workshop on Computational learning theory – COLT ’92 (Pittsburgh, Pennsylvania, United States: ACM Press) pp 144–152. A training algorithm for optimal margin classifiers.
    6. Breiman L (ed). 1998. Classification and regression trees 1st ed (Boca Raton, Fla.: Chapman & Hall/CRC).
    7. Breiman L. 1997. Arcing the edge.
    8. Biau G and Devroye L. 2015. Springer Series in the Data Sciences. Springer New York. Lectures on the Nearest Neighbor Method.

     

    Please enter the chemical reaction that you wish to investigate. Please note:

        • Element symbols are case-sensitive, e.g. Ar, O, CH4, HCl
        • Ions are specified by + or -, optionally followed by the charge number, e.g. H+, Cl-, Ar+2
        • Species are separated by ‘ + ’, and reactants and products are separated by ‘ -> ’. Please mind the spaces between the species and these symbols.
        • Rate coefficient prediction only works for reactions with two reactants and two products, e.g. ‘Ar+ + H2O -> Ar + H2O+’, or for reactions with two reactants and three products, e.g. ‘O3 + HO2 -> O2 + O2 + OH’
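For readers preparing reaction strings in bulk, the input rules above can be checked with a small script before submission. The sketch below is an illustration only, not the parser used by the QDB site: the function names and the regular expression are hypothetical, and the pattern checks only the shape of a species name (capitalised element symbols, optional counts, optional charge suffix), not whether the elements exist.

```python
import re

# Hypothetical pattern: one or more element symbols with optional counts,
# followed by an optional charge suffix such as "+", "-" or "+2".
SPECIES_RE = re.compile(r"^((?:[A-Z][a-z]?\d*)+)([+-]\d*)?$")


def parse_species(token):
    """Split a species string such as 'Ar+2' into (formula, charge)."""
    match = SPECIES_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognised species: {token!r}")
    formula, suffix = match.groups()
    charge = 0
    if suffix:
        sign = 1 if suffix[0] == "+" else -1
        charge = sign * int(suffix[1:] or 1)  # a bare sign means charge 1
    return formula, charge


def parse_reaction(text):
    """Parse 'A + B -> C + D' into lists of (formula, charge) tuples."""
    sides = text.split(" -> ")
    if len(sides) != 2:
        raise ValueError("expected exactly one ' -> ' separator")
    reactants = [parse_species(t) for t in sides[0].split(" + ")]
    products = [parse_species(t) for t in sides[1].split(" + ")]
    if len(reactants) != 2 or len(products) not in (2, 3):
        raise ValueError("expected two reactants and two or three products")
    return reactants, products


reactants, products = parse_reaction("Ar+ + H2O -> Ar + H2O+")
```

Splitting on the separators with their surrounding spaces is what lets a charge sign such as the one in ‘Ar+’ be told apart from the ‘ + ’ between species, which is why the spaces in the input matter.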