Leveraging Named Entity Recognition and Information Extraction to Support Human Trafficking Detection through Online Content Analysis
14 July 2025
Named Entity Recognition (NER) and Relation Extraction (RE) can be powerful tools for detecting and investigating human trafficking from online sources, such as websites of interest and social media platforms. Both extract structured information from unstructured text: NER identifies, for instance, persons and locations of interest, while RE uncovers links between persons of interest, such as recruiters and victims.
CEA, a partner in the VANGUARD project, explored ways to apply these methods to help police and border guard authorities detect human trafficking activities more effectively and efficiently. This work was also evaluated in well-known scientific challenges, in particular TextMine’25, organised by Airbus Defence and Space, and EvalLLM’25, co-organised by the DGA and AMIAD. There, the focus was on processing French-language texts in specialised domains, which provided an opportunity to evaluate various automated analysis methods on complex information extraction tasks.
The extraction of named entities and events can be challenging for several reasons, including the limited availability of ground truth data tailored to the domains of interest. The solution therefore focused on a few-shot learning setting. In brief, a pipeline was developed in which an encoder-based model performs entity recognition, followed by an encoder-decoder model for event extraction. Events were modelled as relations between entities and then reconstructed from those relations.
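The reconstruction step, in which events are rebuilt from pairwise relations between entities, can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the `Entity`, `Relation`, and `Event` classes, the role names, and the convention that each relation's head is the entity anchoring an event are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    text: str


@dataclass(frozen=True)
class Relation:
    head: str  # id of the entity anchoring the event (assumed convention)
    role: str  # hypothetical role label, e.g. "agent" or "place"
    tail: str  # id of the entity filling that role


@dataclass
class Event:
    anchor: Entity
    arguments: dict = field(default_factory=dict)  # role -> list of Entity


def reconstruct_events(entities, relations):
    """Group relations by their head entity to rebuild one event per anchor."""
    by_id = {e.id: e for e in entities}
    grouped = defaultdict(list)
    for rel in relations:
        # Skip relations that point at entities the recogniser did not produce.
        if rel.head in by_id and rel.tail in by_id:
            grouped[rel.head].append(rel)

    events = []
    for head_id, rels in grouped.items():
        event = Event(anchor=by_id[head_id])
        for rel in rels:
            event.arguments.setdefault(rel.role, []).append(by_id[rel.tail])
        events.append(event)
    return events
```

Under these assumptions, the entity model would supply the `entities` list and the relation model the `relations` list. Relations whose endpoints were never recognised are dropped rather than turned into incomplete events.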
In human trafficking detection from online sources, meaningful links between entities, for instance between a recruiter, a victim, and a location, can span several sentences. For relation extraction at the document level, the focus was therefore on models capable of handling dispersed and context-dependent information. From a technical standpoint, several encoder-based approaches were tested, including one variant that integrates a graph structure to model interdependencies between relations, and another that incorporates a proof mechanism to justify the extracted relations.
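A small Python sketch shows why sentence-level processing is not enough when links span sentences. It only enumerates candidate entity pairs and does not reproduce any of the models tested: the `candidate_pairs` function, the `(text, sentence_index)` mention format, and the placeholder names are assumptions for illustration.

```python
from itertools import combinations


def candidate_pairs(mentions, same_sentence_only=False):
    """Enumerate entity pairs a relation classifier could score.

    mentions: list of (entity_text, sentence_index) tuples.
    """
    pairs = []
    for (a, sent_a), (b, sent_b) in combinations(mentions, 2):
        # A sentence-level extractor never sees pairs split across sentences.
        if same_sentence_only and sent_a != sent_b:
            continue
        pairs.append((a, b))
    return pairs


# Three mentions, each in a different sentence of the same document.
mentions = [("Person A", 0), ("Person B", 1), ("City C", 2)]

print(candidate_pairs(mentions, same_sentence_only=True))  # []
print(len(candidate_pairs(mentions)))  # 3
```

In this toy document a sentence-level extractor has no pair to classify, while the document-level view exposes all three candidate links.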
Overall, the experiments highlight the relevance of targeted, lightweight architectures for tackling complex and weakly supervised tasks in sensitive domains such as human trafficking. Integrating the above-mentioned methods into the VANGUARD solution enables the timely identification of entities of interest and their potential relations, which can further enhance the ability of police and border guard authorities to identify critical information potentially related to human trafficking activities.