PhD Club lecture: ChemPatentizer: Transforming Chemical Patents into Actionable Scientific Data by Riccardo Fusco

Dear colleagues,

The PhD Club season will start one week earlier, opening with a talk by Riccardo Fusco titled "ChemPatentizer: Transforming Chemical Patents into Actionable Scientific Data" on Wednesday, February 26, 2025, at 9 AM in room 1.09.

Abstract
Chemical patents are a rich yet challenging source of structure and activity data, with significant obstacles arising from their format and lack of standardization. Often, these documents are presented as scanned images, making data extraction through classical OCR methods unreliable. Furthermore, unstructured activity data and nonstandard layouts compound the difficulty, as identifiers are frequently poor in quality and patents can exceed hundreds of pages. Nevertheless, the potential for high-quality data within chemical patents, particularly through standardized assays and significant matched pair transformations, underscores the need for an effective extraction solution. In response, we present a semi-autonomous pipeline, ChemPatentizer, that combines human expertise with DECIMER, a tool for recognizing chemical structures, to segment and convert patent content into a usable format. Unlike fully automated approaches, our pipeline leverages chemist input to guide patent selection and initial data segmentation, addressing the nuanced and varied nature of patent data. Following manual segmentation, our pipeline automates the creation of structure-activity tables and facilitates downstream analyses, such as molecular matched pair studies and Deep QSAR modeling. ChemPatentizer’s modular design allows human verification at each step, enhancing accuracy and reliability. Our approach, validated with patents on the GLP-1 receptor, demonstrates the value of combining automation with expert oversight in extracting meaningful data from chemical patents.

We're looking forward to seeing you!