| # | Title | Citations | Year |
|---|---|---|---|
| 1 | PySpark and RDKit: moving towards big data in cheminformatics | 139 | 2019 |
| 2 | How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing | 127 | 2023 |
| 3 | Machine learning in continuous casting of steel: A state-of-the-art survey | 126 | 2022 |
| 4 | External and intrinsic plagiarism detection using vector space models | 118 | 2009 |
| 5 | Understanding the true effects of the COVID-19 lockdown on air pollution by means of machine learning | 116 | 2021 |
| 6 | Why do users tag? Detecting users' motivation for tagging in social tagging systems | 96 | 2010 |
| 7 | Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability? | 90 | 2021 |
| 8 | Of categorizers and describers: An evaluation of quantitative measures for tagging motivation | 88 | 2010 |
| 9 | Recent advances of differential privacy in centralized deep learning: A systematic survey | 84 | 2025 |
| 10 | Understanding why users tag: A survey of tagging motivation literature and results from an empirical study | 71 | 2012 |
| 11 | Authorship identification of documents with high content similarity | 63 | 2018 |
| 12 | External and intrinsic plagiarism detection using a cross-lingual retrieval and segmentation system | 62 | 2010 |
| 13 | Evaluation of folksonomy induction algorithms | 55 | 2012 |
| 14 | Mesh-free surrogate models for structural mechanic FEM simulation: A comparative study of approaches | 55 | 2021 |
| 15 | Aspects of broad folksonomies | 54 | 2007 |
| 16 | Deep learning – a first meta-survey of selected reviews across scientific disciplines, their commonalities, challenges and research impact | 53 | 2021 |
| 17 | Establishing and evaluating trustworthy AI: overview and research challenges | 52 | 2024 |
| 18 | Big data as a promoter of Industry 4.0: Lessons of the semiconductor industry | 49 | 2017 |
| 19 | Constructing robust health indicators from complex engineered systems via anticausal learning | 49 | 2022 |
| 20 | Assessing trustworthy AI: Technical and legal perspectives of fairness in AI | 48 | 2024 |
| 21 | TeamBeam – meta-data extraction from scientific literature | 48 | 2012 |
| 22 | Unsupervised document structure analysis of digital scientific articles | 48 | 2014 |
| 23 | Formula RL: Deep reinforcement learning for autonomous racing using telemetry data | 47 | 2021 |
| 24 | A Literature Survey of Early Time Series Classification and Deep Learning | 46 | 2016 |
| 25 | Daphne: An open and extensible system infrastructure for integrated data analysis pipelines | 45 | 2022 |
| 26 | Saga++: A scalable framework for optimizing data cleaning pipelines for machine learning applications | 45 | 2026 |
| 27 | Should we embed in chemistry? A comparison of unsupervised transfer learning with PCA, UMAP, and VAE on molecular fingerprints | 44 | 2021 |
| 28 | Polarity classification for target phrases in tweets: a Word2Vec approach | 41 | 2016 |
| 29 | Theory-inspired machine learning – towards a synergy between knowledge and data | 41 | 2022 |
| 30 | Ubiquitous access to digital cultural heritage | 40 | 2017 |
| 31 | Analysis of structural relationships for hierarchical cluster labeling | 38 | 2010 |
| 32 | Predicting treatment outcomes using explainable machine learning in children with asthma | 38 | 2021 |
| 33 | Recommending tags for pictures based on text, visual content and user context | 38 | 2008 |
| 34 | A comparison of two unsupervised table recognition methods from digital scientific articles | 37 | 2014 |
| 35 | An unsupervised machine learning approach to body text and table of contents extraction from digital scientific articles | 35 | 2013 |
| 36 | Efficient linear text segmentation based on information retrieval techniques | 35 | 2009 |
| 37 | Information extraction from German radiological reports for general clinical text and language understanding | 35 | 2023 |
| 38 | A historical perspective of biomedical explainable AI research | 34 | 2023 |
| 39 | Detection of Abusive Speech for Mixed Sociolects of Russian and Ukrainian Languages | 33 | 2018 |
| 40 | Reconsidering read and spontaneous speech: Causal perspectives on the generation of training data for automatic speech recognition | 33 | 2023 |
| 41 | QZTool – automatically generated origin-destination matrices from cell phone trajectories | 32 | 2016 |
| 42 | Adversarial inter-group link injection degrades the fairness of graph neural networks | 31 | 2022 |
| 43 | Feature extraction from analog wafermaps: A comparison of classical image processing and a deep generative model | 30 | 2019 |
| 44 | Using factual density to measure informativeness of web documents | 30 | 2013 |
| 45 | Deriving public transportation timetables with large-scale cell phone data | 28 | 2015 |
| 46 | Identifying referenced text in scientific publications by summarisation and classification techniques | 26 | 2016 |
| 47 | Map-matching cell phone trajectories of low spatial and temporal accuracy | 26 | 2015 |
| 48 | Structack: Structure-based adversarial attacks on graph neural networks | 25 | 2021 |
| 49 | Gaussian process surrogates for modeling uncertainties in a use case of forging superalloys | 24 | 2022 |
| 50 | SAZED: parameter-free domain-agnostic season length estimation in time series data | 24 | 2019 |
| 51 | A comparison of layout based bibliographic metadata extraction techniques | 23 | 2012 |
| 52 | Improving the consistency of the failure mode effect analysis (FMEA) documents in semiconductor manufacturing | 23 | 2022 |
| 53 | Enhancing OCR in historical documents with complex layouts through machine learning | 22 | 2025 |
| 54 | Knowledge discovery using the KnowMiner framework | 22 | 2009 |
| 55 | Extraction of references using layout and formatting information from scientific articles | 21 | 2013 |
| 56 | Predictive capability of QSAR models based on the CompTox zebrafish embryo assays: An imbalanced classification problem | 21 | 2021 |
| 57 | Towards a More Fine Grained Analysis of Scientific Authorship: Predicting the Number of Authors Using Stylometric Features | 21 | 2016 |
| 58 | Vote/veto meta-classifier for authorship identification – Notebook for PAN at CLEF 2011 | 20 | 2011 |
| 59 | Machine learning techniques for automatically extracting contextual information from scientific publications | 19 | 2015 |
| 60 | Crowdsourcing fact extraction from scientific literature | 18 | 2013 |
| 61 | Ensemble machine learning, deep learning, and time series forecasting: improving prediction accuracy for hourly concentrations of ambient air pollutants | 18 | 2024 |
| 62 | Improving FMEA comprehensibility via common-sense knowledge graph completion techniques | 18 | 2023 |
| 63 | Towards Authorship Attribution for Bibliometrics using Stylometric Features | 18 | 2015 |
| 64 | Body mass index, body image dissatisfaction, and eating disorder symptoms in female aquatic sports: Comparison between artistic swimmers and female water polo players | 17 | 2020 |
| 65 | Extending folksonomies for image tagging | 17 | 2008 |
| 66 | Improving OCR quality in 19th century historical documents using a combined machine learning based approach | 17 | 2024 |
| 67 | Know-Center at SemEval-2019 Task 5: Multilingual hate speech detection on Twitter using CNNs | 17 | 2019 |
| 68 | Exploiting propositions for opinion mining | 16 | 2016 |
| 69 | Self- and cross-excitation in Stack Exchange question & answer communities | 16 | 2019 |
| 70 | A causality-inspired approach for anomaly detection in a water treatment testbed | 15 | 2022 |
| 71 | A comparison of supervised approaches for process pattern recognition in analog semiconductor wafer test data | 15 | 2018 |
| 72 | Astro- and geoinformatics – visually guided classification of time series data | 15 | 2020 |
| 73 | Distributed Web 2.0 crawling for ontology evolution | 15 | 2007 |
| 74 | Lessons learned from the 1st Ariel Machine Learning Challenge: Correcting transiting exoplanet light curves for stellar spots | 15 | 2023 |
| 75 | Parasitic resistance as a predictor of faulty anodes in electro galvanizing: a comparison of machine learning, physical and hybrid models | 15 | 2020 |
| 76 | Effective use of BERT in graph embeddings for sparse knowledge graph completion | 14 | 2022 |
| 77 | GerIE – An Open Information Extraction System for the German Language | 14 | 2018 |
| 78 | KCDC: Word sense induction by using grammatical dependencies and sentence phrase structure | 14 | 2010 |
| 79 | Treatment outcome clustering patterns correspond to discrete asthma phenotypes in children | 14 | 2021 |
| 80 | Reconstructing the logical structure of a scientific publication using machine learning | 13 | 2016 |
| 81 | Unleashing semantics of research data | 13 | 2012 |
| 82 | Chatbots assisting German business management applications | 12 | 2019 |
| 83 | A generative semi-supervised classifier for datasets with unknown classes | 11 | 2020 |
| 84 | Is enterprise search useful at all? Lessons learned from studying user behavior | 11 | 2014 |
| 85 | Cluster purging: Efficient outlier detection based on rate-distortion theory | 10 | 2021 |
| 86 | Detecting outliers in non-IID data: A systematic literature review | 10 | 2023 |
| 87 | Driver's dashboard – using social media data as additional information for motorway operators | 10 | 2018 |
| 88 | Exploration of transfer learning techniques for the prediction of PM | 10 | 2025 |
| 89 | Extending Scientific Literature Search by Including the Author's Writing Style | 10 | 2017 |
| 90 | Activity archetypes in question-and-answer (Q&A) websites – a study of 50 Stack Exchange instances | 9 | 2019 |
| 91 | An embedding approach for microblog polarity classification | 9 | 2017 |
| 92 | citation needed: Filling in Wikipedia's Citation Shaped Holes | 9 | 2014 |
| 93 | Large language models for fault detection in buildings' HVAC systems | 9 | 2024 |
| 94 | Long short-term memory networks for enhancing real-time flood forecasts: a case study for an underperforming hydrologic model | 9 | 2025 |
| 95 | Recommending scientific literature: Comparing use-cases and algorithms | 9 | 2014 |
| 96 | Text representation for efficient document annotation | 9 | 2013 |
| 97 | Vote/Veto Classification, Ensemble Clustering and Sequence Classification for Author Identification | 9 | 2012 |
| 98 | AI-Based Knowledge Management System for Risk Assessment and Root Cause Analysis in Semiconductor Industry | 8 | 2022 |
| 99 | Ein Industrie 4.0-Use Case in der Motorenproduktion [An Industry 4.0 use case in engine production] | 8 | 2018 |
| 100 | Ensemble methods | 8 | 2016 |
| 101 | Large language models for electronic health record de-identification in English and German | 8 | 2025 |
| 102 | The interplay between communities and homophily in semi-supervised classification using graph neural networks | 8 | 2021 |
| 103 | Using the open Meta Kaggle dataset to evaluate tripartite recommendations in data markets | 8 | 2019 |
| 104 | Vote/veto classification, ensemble clustering and sequence classification for author identification – Notebook of PAN at CLEF 2012 | 8 | 2012 |
| 105 | Exploring the capabilities of GPT4-Vision as OCR engine | 7 | 2024 |
| 106 | Exploring the influence of tagging motivation on tagging behavior | 7 | 2010 |
| 107 | Graz University of Technology at CL-SciSumm 2017: Query Generation Strategies | 7 | 2017 |
| 108 | Interpretability of causal discovery in tracking deterioration in a highly dynamic process | 7 | 2024 |
| 109 | KnowMiner: Ein service orientiertes Knowledge Discovery Framework [KnowMiner: a service-oriented knowledge discovery framework] | 7 | 2006 |
| 110 | ManEx: The visual analysis of measurements for the assessment of errors in electrical engines | 7 | 2022 |
| 111 | Privacy in open search: A review of challenges and solutions | 7 | 2021 |
| 112 | Solving multi-objective inverse problems of chained manufacturing processes | 7 | 2023 |
| 113 | Stylometric watermarks for large language models | 7 | 2024 |
| 114 | Understanding wafer patterns in semiconductor production with variational auto-encoders | 7 | 2018 |
| 115 | A health factor for process patterns enhancing semiconductor manufacturing by pattern recognition in analog wafermaps | 6 | 2019 |
| 116 | An Information Retrieval Based Approach for Multilingual Ontology Matching | 6 | 2016 |
| 117 | A study of scientific writing: Comparing theoretical guidelines with practical implementation | 6 | 2014 |
| 118 | Enhanced Active Learning of Convolutional Neural Networks: A Case Study for Defect Classification in the Semiconductor Industry | 6 | 2020 |
| 119 | Know-Center at PAN 2015 author identification | 6 | 2015 |
| 120 | Markov random fields for pattern extraction in analog wafer test data | 6 | 2017 |
| 121 | Model selection strategies for author disambiguation | 6 | 2011 |
| 122 | Source selection of long tail sources for federated search in an uncooperative setting | 6 | 2018 |
| 123 | Using ontologies for software documentation | 6 | 2009 |
| 124 | Addressing hallucination in causal Q&A: The efficacy of fine-tuning over prompting in LLMs | 5 | 2025 |
| 125 | Crosslanguage retrieval based on Wikipedia statistics | 5 | 2008 |
| 126 | Effects of class imbalance countermeasures on interpretability | 5 | 2024 |
| 127 | Evaluation of pseudo relevance feedback techniques for cross vertical aggregated search | 5 | 2015 |
| 128 | German encyclopedia alignment based on information retrieval techniques | 5 | 2010 |
| 129 | Grammar Checker Features for Author Identification and Author Profiling | 5 | 2013 |
| 130 | Impact of training instance selection on domain-specific entity extraction using BERT | 5 | 2022 |
| 131 | KnCe2013-CORE: Semantic Text Similarity by use of Knowledge Bases | 5 | 2013 |
| 132 | Opinion mining with a clause-based approach | 5 | 2017 |
| 133 | PyChemFlow: an automated pre-processing pipeline in Python for reproducible machine learning on chemical data | 5 | 2023 |
| 134 | A formally robust time series distance metric | 4 | 2020 |
| 135 | A semantic federated search engine for domain-specific document retrieval | 4 | 2017 |
| 136 | Cyber-Physical Systems as Enablers in Manufacturing Communication and Worker Support | 4 | 2019 |
| 137 | Efficient table annotation for digital articles | 4 | 2015 |
| 138 | Ensemble watermarks for large language models | 4 | 2025 |
| 139 | Flexible scheduling for human robot collaboration in intralogistics teams | 4 | 2018 |
| 140 | KOMPOS: Connecting causal knots in large nonlinear time series with non-parametric regression splines | 4 | 2021 |
| 141 | On the impact of communities on semi-supervised classification using graph neural networks | 4 | 2020 |
| 142 | Profiling microblog authors using concreteness and sentiment | 4 | 2016 |
| 143 | Towards a marketplace for the scientific community: accessing knowledge from the computer science domain | 4 | 2014 |