This page provides companion data for the paper submitted to SANER'2018 - RENE Track. This data is intended to make our results reproducible.
Rodrigo Fernandes Gomes da Silva
Klérisson Vinícius Ribeiro Paixão
Marcelo de Almeida Maia
We provide two dumps, both containing the main tables. They differ only in the table "posts". In Dump 1, the content of this table has already been stemmed and had its stop words removed; the synonyms of tags and the code blocks have also already been extracted. In Dump 2, the table contains the original raw content. The fastest way to reproduce DupPredictor or Dupe is to use Dump 1. To run the entire process yourself, including stemming and stop-word removal, start from Dump 2 and follow the instructions in the preprocess step.
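To give a rough idea of what separates the "posts" content in the two dumps, the following is a minimal sketch of stop-word removal followed by stemming. It is not the preprocess step shipped with this data: the stop-word list, the suffix list, and the function names `naive_stem` and `preprocess` are all hypothetical, and the suffix stripping is only a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration);
# a real run would use a much fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "and", "how", "i", "do"}

# Suffixes removed by the toy stemmer, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip at most one common suffix, keeping at least three characters.

    This is NOT the Porter algorithm, only a simple illustrative substitute.
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


if __name__ == "__main__":
    # Raw text plays the role of Dump 2; the output resembles Dump 1 content.
    print(preprocess("How do I sort the lists in Python?"))
```

Applied to raw post text, a function like `preprocess` yields the kind of reduced token sequence that similarity-based duplicate detection typically works on. Starting from Dump 1 skips this step entirely.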