Citations for Roman Kern

Pos	Paper	Citations	Year
1	PySpark and RDKit: moving towards big data in cheminformatics	139	2019
2	How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing	127	2023
3	Machine learning in continuous casting of steel: A state-of-the-art survey	126	2022
4	External and intrinsic plagiarism detection using vector space models	118	2009
5	Understanding the true effects of the COVID-19 lockdown on air pollution by means of machine learning	116	2021
6	Why do users tag? Detecting users' motivation for tagging in social tagging systems	96	2010
7	Machine learning in prediction of intrinsic aqueous solubility of drug-like compounds: Generalization, complexity, or predictive ability?	90	2021
8	Of categorizers and describers: An evaluation of quantitative measures for tagging motivation	88	2010
9	Recent advances of differential privacy in centralized deep learning: A systematic survey	84	2025
10	Understanding why users tag: A survey of tagging motivation literature and results from an empirical study	71	2012
11	Authorship identification of documents with high content similarity	63	2018
12	External and intrinsic plagiarism detection using a cross-lingual retrieval and segmentation system	62	2010
13	Evaluation of folksonomy induction algorithms	55	2012
14	Mesh-free surrogate models for structural mechanic FEM simulation: A comparative study of approaches	55	2021
15	Aspects of broad folksonomies	54	2007
16	Deep learning – a first meta-survey of selected reviews across scientific disciplines, their commonalities, challenges and research impact	53	2021
17	Establishing and evaluating trustworthy AI: overview and research challenges	52	2024
18	Big data as a promoter of industry 4.0: Lessons of the semiconductor industry	49	2017
19	Constructing robust health indicators from complex engineered systems via anticausal learning	49	2022
20	Assessing trustworthy AI: Technical and legal perspectives of fairness in AI	48	2024
21	Teambeam-meta-data extraction from scientific literature	48	2012
22	Unsupervised document structure analysis of digital scientific articles	48	2014
23	Formula rl: Deep reinforcement learning for autonomous racing using telemetry data	47	2021
24	A Literature Survey of Early Time Series Classification and Deep Learning	46	2016
25	Daphne: An open and extensible system infrastructure for integrated data analysis pipelines	45	2022
26	Saga++: A scalable framework for optimizing data cleaning pipelines for machine learning applications	45	2026
27	Should we embed in chemistry? A comparison of unsupervised transfer learning with PCA, UMAP, and VAE on molecular fingerprints	44	2021
28	Polarity classification for target phrases in tweets: a Word2Vec approach	41	2016
29	Theory-inspired machine learning – towards a synergy between knowledge and data	41	2022
30	Ubiquitous access to digital cultural heritage	40	2017
31	Analysis of structural relationships for hierarchical cluster labeling	38	2010
32	Predicting treatment outcomes using explainable machine learning in children with asthma	38	2021
33	Recommending tags for pictures based on text, visual content and user context	38	2008
34	A comparison of two unsupervised table recognition methods from digital scientific articles	37	2014
35	An unsupervised machine learning approach to body text and table of contents extraction from digital scientific articles	35	2013
36	Efficient linear text segmentation based on information retrieval techniques	35	2009
37	Information extraction from German radiological reports for general clinical text and language understanding	35	2023
38	A historical perspective of biomedical explainable AI research	34	2023
39	Detection of Abusive Speech for Mixed Sociolects of Russian and Ukrainian Languages	33	2018
40	Reconsidering read and spontaneous speech: Causal perspectives on the generation of training data for automatic speech recognition	33	2023
41	QZTool – automatically generated origin-destination matrices from cell phone trajectories	32	2016
42	Adversarial inter-group link injection degrades the fairness of graph neural networks	31	2022
43	Feature extraction from analog wafermaps: A comparison of classical image processing and a deep generative model	30	2019
44	Using factual density to measure informativeness of web documents	30	2013
45	Deriving public transportation timetables with large-scale cell phone data	28	2015
46	Identifying referenced text in scientific publications by summarisation and classification techniques	26	2016
47	Map-matching cell phone trajectories of low spatial and temporal accuracy	26	2015
48	Structack: Structure-based adversarial attacks on graph neural networks	25	2021
49	Gaussian process surrogates for modeling uncertainties in a use case of forging superalloys	24	2022
50	SAZED: parameter-free domain-agnostic season length estimation in time series data	24	2019
51	A comparison of layout based bibliographic metadata extraction techniques	23	2012
52	Improving the consistency of the failure mode effect analysis (FMEA) documents in semiconductor manufacturing	23	2022
53	Enhancing OCR in historical documents with complex layouts through machine learning	22	2025
54	Knowledge discovery using the KnowMiner framework	22	2009
55	Extraction of references using layout and formatting information from scientific articles	21	2013
56	Predictive capability of QSAR models based on the CompTox zebrafish embryo assays: An imbalanced classification problem	21	2021
57	Towards a More Fine Grained Analysis of Scientific Authorship: Predicting the Number of Authors Using Stylometric Features	21	2016
58	Vote/veto meta-classifier for authorship identification notebook for PAN at CLEF 2011	20	2011
59	Machine learning techniques for automatically extracting contextual information from scientific publications	19	2015
60	Crowdsourcing fact extraction from scientific literature	18	2013
61	Ensemble machine learning, deep learning, and time series forecasting: improving prediction accuracy for hourly concentrations of ambient air pollutants	18	2024
62	Improving FMEA comprehensibility via common-sense knowledge graph completion techniques	18	2023
63	Towards Authorship Attribution for Bibliometrics using Stylometric Features	18	2015
64	Body mass index, body image dissatisfaction, and eating disorder symptoms in female aquatic sports: Comparison between artistic swimmers and female water polo players	17	2020
65	Extending folksonomies for image tagging	17	2008
66	Improving OCR quality in 19th century historical documents using a combined machine learning based approach	17	2024
67	Know-center at semeval-2019 task 5: multilingual hate speech detection on twitter using cnns	17	2019
68	Exploiting propositions for opinion mining	16	2016
69	Self- and cross-excitation in stack exchange question & answer communities	16	2019
70	A causality-inspired approach for anomaly detection in a water treatment testbed	15	2022
71	A comparison of supervised approaches for process pattern recognition in analog semiconductor wafer test data	15	2018
72	Astro- and geoinformatics – visually guided classification of time series data	15	2020
73	Distributed Web2.0 crawling for ontology evolution	15	2007
74	Lessons learned from the 1st Ariel Machine Learning Challenge: Correcting transiting exoplanet light curves for stellar spots	15	2023
75	Parasitic resistance as a predictor of faulty anodes in electro galvanizing: a comparison of machine learning, physical and hybrid models	15	2020
76	Effective use of BERT in graph embeddings for sparse knowledge graph completion	14	2022
77	GerIE-An Open Information Extraction System for the German Language	14	2018
78	KCDC: Word sense induction by using grammatical dependencies and sentence phrase structure	14	2010
79	Treatment outcome clustering patterns correspond to discrete asthma phenotypes in children	14	2021
80	Reconstructing the logical structure of a scientific publication using machine learning	13	2016
81	Unleashing semantics of research data	13	2012
82	Chatbots assisting German business management applications	12	2019
83	A generative semi-supervised classifier for datasets with unknown classes	11	2020
84	Is enterprise search useful at all? Lessons learned from studying user behavior	11	2014
85	Cluster purging: Efficient outlier detection based on rate-distortion theory	10	2021
86	Detecting outliers in non-iid data: A systematic literature review	10	2023
87	Driver's dashboard – using social media data as additional information for motorway operators	10	2018
88	Exploration of transfer learning techniques for the prediction of PM	10	2025
89	Extending Scientific Literature Search by Including the Author's Writing Style	10	2017
90	Activity archetypes in question-and-answer (q&a) websites – a study of 50 stack exchange instances	9	2019
91	An embedding approach for microblog polarity classification	9	2017
92	citation needed: Filling in Wikipedia's Citation Shaped Holes	9	2014
93	Large language models for fault detection in buildings' HVAC systems	9	2024
94	Long short-term memory networks for enhancing real-time flood forecasts: a case study for an underperforming hydrologic model	9	2025
95	Recommending scientific literature: Comparing use-cases and algorithms	9	2014
96	Text representation for efficient document annotation	9	2013
97	Vote/Veto Classification, Ensemble Clustering and Sequence Classification for Author Identification	9	2012
98	AI-Based Knowledge Management System for Risk Assessment and Root Cause Analysis in Semiconductor Industry	8	2022
99	Ein industrie 4.0-use case in der motorenproduktion	8	2018
100	Ensemble methods	8	2016
101	Large language models for electronic health record de-identification in english and german	8	2025
102	The interplay between communities and homophily in semi-supervised classification using graph neural networks	8	2021
103	Using the open meta kaggle dataset to evaluate tripartite recommendations in data markets	8	2019
104	Vote/veto classification, ensemble clustering and sequence classification for author identification-Notebook of PAN at CLEF 2012	8	2012
105	Exploring the capabilities of gpt4-vision as ocr engine	7	2024
106	Exploring the influence of tagging motivation on tagging behavior	7	2010
107	Graz University of Technology at CL-SciSumm 2017: Query Generation Strategies	7	2017
108	Interpretability of causal discovery in tracking deterioration in a highly dynamic process	7	2024
109	KnowMiner: Ein service orientiertes knowledge discovery framework	7	2006
110	ManEx: The visual analysis of measurements for the assessment of errors in electrical engines	7	2022
111	Privacy in open search: A review of challenges and solutions	7	2021
112	Solving multi-objective inverse problems of chained manufacturing processes	7	2023
113	Stylometric watermarks for large language models	7	2024
114	Understanding wafer patterns in semiconductor production with variational auto-encoders	7	2018
115	A health factor for process patterns enhancing semiconductor manufacturing by pattern recognition in analog wafermaps	6	2019
116	An Information Retrieval Based Approach for Multilingual Ontology Matching	6	2016
117	A study of scientific writing: Comparing theoretical guidelines with practical implementation	6	2014
118	Enhanced Active Learning of Convolutional Neural Networks: A Case Study for Defect Classification in the Semiconductor Industry	6	2020
119	Know-Center at PAN 2015 author identification	6	2015
120	Markov random fields for pattern extraction in analog wafer test data	6	2017
121	Model selection strategies for author disambiguation	6	2011
122	Source selection of long tail sources for federated search in an uncooperative setting	6	2018
123	Using ontologies for software documentation	6	2009
124	Addressing hallucination in causal q&a: The efficacy of fine-tuning over prompting in llms	5	2025
125	Crosslanguage retrieval based on wikipedia statistics	5	2008
126	Effects of class imbalance countermeasures on interpretability	5	2024
127	Evaluation of pseudo relevance feedback techniques for cross vertical aggregated search	5	2015
128	German encyclopedia alignment based on information retrieval techniques	5	2010
129	Grammar Checker Features for Author Identification and Author Profiling	5	2013
130	Impact of training instance selection on domain-specific entity extraction using BERT	5	2022
131	KnCe2013-CORE: Semantic Text Similarity by use of Knowledge Bases	5	2013
132	Opinion mining with a clause-based approach	5	2017
133	PyChemFlow: an automated pre-processing pipeline in Python for reproducible machine learning on chemical data	5	2023
134	A formally robust time series distance metric	4	2020
135	A semantic federated search engine for domain-specific document retrieval	4	2017
136	Cyber-Physical Systems as Enablers in Manufacturing Communication and Worker Support	4	2019
137	Efficient table annotation for digital articles	4	2015
138	Ensemble watermarks for large language models	4	2025
139	Flexible scheduling for human robot collaboration in intralogistics teams	4	2018
140	KOMPOS: Connecting causal knots in large nonlinear time series with non-parametric regression splines	4	2021
141	On the impact of communities on semi-supervised classification using graph neural networks	4	2020
142	Profiling microblog authors using concreteness and sentiment	4	2016
143	Towards a marketplace for the scientific community: accessing knowledge from the computer science domain	4	2014