2014
Gütlein, Martin; Karwath, Andreas; Kramer, Stefan
CheS-Mapper 2.0 for visual validation of (Q)SAR models Journal Article
In: J. Cheminformatics, vol. 6, no. 1, pp. 41, 2014.
@article{gutlein2014,
title = {CheS-Mapper 2.0 for visual validation of (Q)SAR models},
author = {Martin Gütlein and Andreas Karwath and Stefan Kramer},
url = {http://dx.doi.org/10.1186/s13321-014-0041-7},
doi = {10.1186/s13321-014-0041-7},
year = {2014},
date = {2014-09-23},
journal = {J. Cheminformatics},
volume = {6},
number = {1},
pages = {41},
abstract = {Background
Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not support the user in better understanding the properties of the model or the underlying data. Even though a number of visualization tools for analyzing (Q)SAR information in small molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking.
Results
We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints.
Conclusions
Visual validation with CheS-Mapper enables analyzing (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
Graphical abstract
Comparing actual and predicted activity values with CheS-Mapper.},
keywords = {cheminformatics, data mining, graph mining, validation, visualization},
pubstate = {published},
tppubtype = {article}
}
Seeland, Madeleine; Karwath, Andreas; Kramer, Stefan
Structural clustering of millions of molecular graphs Conference
Symposium on Applied Computing, SAC 2014, ACM, New York, NY, USA, 2014.
@conference{seeland2014a,
title = {Structural clustering of millions of molecular graphs},
author = {Madeleine Seeland and Andreas Karwath and Stefan Kramer},
url = {http://doi.acm.org/10.1145/2554850.2555063},
doi = {10.1145/2554850.2555063},
year = {2014},
date = {2014-03-24},
urldate = {2014-03-24},
booktitle = {Symposium on Applied Computing, SAC 2014},
pages = {121--128},
publisher = {ACM},
address = {New York, NY, USA},
organization = {ACM},
abstract = {Statistical machine learning algorithms building on patterns found by pattern mining algorithms have to cope with large solution sets and thus the high dimensionality of the feature space. Vice versa, pattern mining algorithms are frequently applied to irrelevant instances, thus causing noise in the output. Solution sets of pattern mining algorithms also typically grow with increasing input datasets. The paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and their relevance according to their coefficients. It uses the support vectors along with their coefficients as input to pattern mining algorithms able to handle weighted instances. Our experiments in the domain of graph mining and molecular graphs show that the resulting models are not significantly less accurate than models trained on the full datasets, yet require only a fraction of the time using much smaller sets of patterns.},
keywords = {cluster explanation, clustering, graph mining},
pubstate = {published},
tppubtype = {conference}
}
2006
Karwath, Andreas; De Raedt, Luc
SMIREP: Predicting Chemical Activity from SMILES Journal Article
In: Journal of Chemical Information and Modeling, vol. 46, no. 6, pp. 2432-2444, 2006.
@article{karwath06c,
title = {SMIREP: Predicting Chemical Activity from SMILES},
author = {Andreas Karwath and De Raedt, Luc},
url = {http://pubs.acs.org/doi/abs/10.1021/ci060159g},
doi = {10.1021/ci060159g},
year = {2006},
date = {2006-10-12},
journal = {Journal of Chemical Information and Modeling},
volume = {46},
number = {6},
pages = {2432--2444},
abstract = {Most approaches to structure-activity relationship (SAR) prediction proceed in two steps. In the first step, a typically large set of fingerprints, or fragments of interest, is constructed (either by hand or by some recent data mining techniques). In the second step, machine learning techniques are applied to obtain a predictive model. The result is often a model that is highly accurate but hard to interpret. In this paper, we demonstrate the capabilities of a novel SAR algorithm, SMIREP, which tightly integrates the fragment and model generation steps and which yields simple models in the form of a small set of IF-THEN rules. These rules contain SMILES fragments, which are easy for the computational chemist to understand. SMIREP combines ideas from the well-known IREP rule learner with a novel fragmentation algorithm for SMILES strings. SMIREP has been evaluated on three problems: the prediction of binding activities for the estrogen receptor (Environmental Protection Agency's (EPA's) Distributed Structure-Searchable Toxicity (DSSTox) National Center for Toxicological Research estrogen receptor (NCTRER) Database), the prediction of mutagenicity using the carcinogenic potency database (CPDB), and the prediction of biodegradability on a subset of the Environmental Fate Database (EFDB). In these applications, SMIREP has the advantage of producing easily interpretable rules while having predictive accuracies that are comparable to those of alternative state-of-the-art techniques.},
keywords = {cheminformatics, graph mining, machine learning, QSAR, relational learning, scientific knowledge},
pubstate = {published},
tppubtype = {article}
}
2004
Bringmann, Björn; Karwath, Andreas
Frequent SMILES Miscellaneous
Lernen, Wissensentdeckung und Adaptivität, Workshop GI Fachgruppe Maschinelles Lernen, part of LWA, 2004, (Berlin, Germany).
@misc{wshp-fgml-BringmannK04,
title = {Frequent SMILES},
author = {Björn Bringmann and Andreas Karwath},
year = {2004},
date = {2004-10-01},
abstract = {Predictive graph mining approaches in chemical databases are extremely popular and effective. Most of these approaches first extract frequent sub-graphs and then use them as features to build predictive models. In the work presented here, the approach taken is similar. However, instead of frequent sub-graphs, frequent trees based on SMILES strings are derived. For this, the SMILES strings of chemical compounds are decomposed into fragment trees, which in turn are mined for interesting sub-trees. These tree-based patterns are then used as features by a classifier to build predictive models. The approach is experimentally evaluated on a real-world chemical data set.},
howpublished = {Lernen, Wissensentdeckung und Adaptivität, Workshop GI Fachgruppe Maschinelles Lernen, part of LWA},
note = {Berlin, Germany},
keywords = {cheminformatics, graph mining, machine learning},
pubstate = {published},
tppubtype = {misc}
}
Karwath, Andreas; De Raedt, Luc
Predictive Graph Mining Conference
The International Workshop on Mining Graphs, Trees and Sequences, MGTS 2004, 2004, (workshop).
@conference{karwath04b,
title = {Predictive Graph Mining},
author = {Andreas Karwath and De Raedt, Luc},
year = {2004},
date = {2004-09-01},
booktitle = {The International Workshop on Mining Graphs, Trees and Sequences, MGTS 2004},
pages = {25--36},
note = {workshop},
keywords = {cheminformatics, graph mining, machine learning, QSAR},
pubstate = {published},
tppubtype = {conference}
}
Karwath, Andreas; De Raedt, Luc
Predictive Graph Mining Conference
The 7th International Conference of Discovery Science, DS 2004, vol. 3245, Lecture Notes in Artificial Intelligence, Springer-Verlag, Berlin Heidelberg, Germany, 2004, ISBN: 978-3-540-23357-2.
@conference{karwath04a,
title = {Predictive Graph Mining},
author = {Andreas Karwath and De Raedt, Luc},
url = {http://link.springer.com/chapter/10.1007%2F978-3-540-30214-8_1},
doi = {10.1007/978-3-540-30214-8_1},
isbn = {978-3-540-23357-2},
year = {2004},
date = {2004-01-01},
booktitle = {The 7th International Conference of Discovery Science, DS 2004},
volume = {3245},
pages = {1--15},
publisher = {Springer Verlag},
address = {Berlin Heidelberg, Germany},
organization = {Springer-Verlag Berlin Heidelberg},
series = {Lecture Notes in Artificial Intelligence},
abstract = {Graph mining approaches are extremely popular and effective in molecular databases. The vast majority of these approaches first derive interesting, i.e., frequent, patterns and then use these as features to build predictive models. Rather than building these models in an indirect, two-step way, the SMIREP system introduced in this paper derives predictive rule models from molecular data directly. SMIREP combines the SMILES and SMARTS representation languages that are popular in computational chemistry with the IREP rule-learning algorithm by Fürnkranz. Even though SMIREP is focused on SMILES, its principles are also applicable to graph mining problems in other domains. SMIREP is experimentally evaluated on two benchmark databases.},
keywords = {cheminformatics, graph mining, machine learning, QSAR},
pubstate = {published},
tppubtype = {conference}
}