RESIN: Schema-guided Multi-document Event Extraction, Tracking, Prediction, and Visualization for News Articles

Contact: Zixuan Zhang (zixuan11@illinois.edu), Khanh Duy Nguyen (knguye71@illinois.edu), Heng Ji (hengji@illinois.edu)

Please email Zixuan Zhang if you experience any technical issues using our software or need further information.


About

Our RESIN system is an integrated event extraction, tracking, prediction, and analysis pipeline for news articles. The main goal of RESIN is to output a comprehensive representation and analysis of a news event, covering event detection, event argument extraction, event and entity coreference resolution, event temporal ordering, and schema-based future event prediction. The input to RESIN is a cluster of news articles (related to one major news event), and the output is a heterogeneous event graph that contains all the extracted events, entities, relations, and their temporal orderings. The event graph can be used to answer various questions about the event, such as “what happened”, “who did what”, “when did it happen”, and “what will happen next”. An example output for a complex event related to a disease outbreak is presented in our own visualizer.
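
To make this output more concrete, the sketch below shows roughly what a heterogeneous event graph might contain and how it can answer a question like “who did what”. The field names and event types used here (events, entities, temporal_edges, Disease.Outbreak, and so on) are illustrative assumptions for this sketch, not the exact schema emitted by the pipeline.

# Illustrative sketch only: the field names and event types below are
# assumptions for this example, not the exact schema produced by RESIN.
event_graph = {
    "events": [
        {"id": "ev1", "type": "Disease.Outbreak", "trigger": "outbreak",
         "arguments": [{"role": "Place", "entity": "ent1"}]},
        {"id": "ev2", "type": "Medical.Vaccinate", "trigger": "vaccinated",
         "arguments": [{"role": "Patient", "entity": "ent2"}]},
    ],
    "entities": [
        {"id": "ent1", "type": "GPE", "mentions": ["Wuhan"]},
        {"id": "ent2", "type": "PER", "mentions": ["residents"]},
    ],
    # Temporal ordering: ev1 happened before ev2.
    "temporal_edges": [("ev1", "before", "ev2")],
}

# Answering "who did what": list each event with its resolved arguments.
entities = {e["id"]: e for e in event_graph["entities"]}
for ev in event_graph["events"]:
    args = ", ".join(f'{a["role"]}={entities[a["entity"]]["mentions"][0]}'
                     for a in ev["arguments"])
    print(f'{ev["type"]} ({ev["trigger"]}): {args}')

Running this prints one line per event, e.g. Disease.Outbreak (outbreak): Place=Wuhan.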

Software

All code and resources for RESIN can be accessed and downloaded at https://github.com/blender-nlp/RESIN.


Installation

All components in the RESIN pipeline are dockerized and stored on Docker Hub. To install RESIN, please follow the instructions below to pull all Docker images and bring up the containers with docker-compose.

Install docker and docker-compose.

Please follow the official instructions for installing docker and docker-compose on your machine.

Create and configure the Python environment.

Run the following commands to create a conda environment and install dependencies:
conda create -n resin python=3.7
conda activate resin
pip install -r requirements.txt

Start the pipeline and run with your own data.

Data preprocessing.

As required by the DARPA KAIROS evaluation, the input document clusters should be represented in the LDC format. An example piece of data (two document clusters) in this format is available at this [link]. We also provide easy-to-use Python scripts that transform a document cluster from a much cleaner JSON format into the LDC format.

How to transform a document cluster into the LDC format?

Go to the preprocess folder and run the following command to generate an LDC-formatted dataset:
python transform_format.py --input_file [PATH_TO_A_DOC_CLUSTER] --name [CLUSTER_NAME] --output_dir [OUTPUT_DIR]
Here, the input document cluster should be a JSON file in which each key is a document ID and each value is the document text. An example:
{
    "doc_1": "hello world!",
    "doc_2": "This is a test document cluster."
}
For the other two command-line arguments, CLUSTER_NAME can be any name you choose, and the preprocessed dataset is written to OUTPUT_DIR.
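
If your documents are stored as plain-text files in a directory, a small helper like the one below can produce this JSON for you. This is a minimal sketch: the directory layout, the file-naming convention, and the helper itself are our assumptions for illustration, not part of transform_format.py.

# Minimal sketch: builds the input JSON above from a directory of .txt
# files (one document per file). The layout and naming convention here are
# assumptions for this example, not a requirement of transform_format.py.
import json
from pathlib import Path

def build_cluster_json(doc_dir, output_file):
    # Use each file's stem (e.g. "doc_1.txt" -> "doc_1") as the document ID.
    cluster = {p.stem: p.read_text(encoding="utf-8")
               for p in sorted(Path(doc_dir).glob("*.txt"))}
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(cluster, f, ensure_ascii=False, indent=4)

build_cluster_json("my_docs", "my_cluster.json")

The resulting my_cluster.json can then be passed to transform_format.py as PATH_TO_A_DOC_CLUSTER.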

Set up the APIs and run.

Please follow the README file at https://github.com/blender-nlp/RESIN.
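
For orientation only, the sketch below shows one generic way a dockerized NLP service can be driven over HTTP once it is running: post a pointer to the preprocessed cluster and read back the result. The host, port, endpoint path, and payload fields here are purely hypothetical assumptions; the actual APIs, ports, and request formats are defined in the README linked above.

# Purely hypothetical sketch: host, port, endpoint, and payload fields are
# assumptions for illustration; see the RESIN README for the real API setup.
import requests

resp = requests.post(
    "http://localhost:8000/pipeline",             # hypothetical endpoint
    json={"cluster_dir": "output/CLUSTER_NAME"},  # hypothetical payload
    timeout=600,
)
resp.raise_for_status()
print(resp.json())  # the pipeline's output for this cluster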

Visualizing and analyzing the results.

We have also developed RESIN-Editor, which lets users interactively visualize and edit the results. After running the pipeline, you can go to https://blender-nlp.github.io/RESIN-Editor/ to analyze the results. A demonstration video for this editor is available at https://www.youtube.com/watch?v=fmW-GwPMrw0.