Laboratory of Software Comprehension, Analytics and Mining

Duplicate Question Detection in Stack Overflow: A Reproducibility Study

This page provides companion data for the paper submitted to SANER'2018 - RENE Track. This data is intended to make our results reproducible.

Authors

Rodrigo Fernandes Gomes da Silva
Klérisson Vinícius Ribeiro Paixão
Marcelo de Almeida Maia

Source Code

We make publicly available the source code for our reproductions of DupPredictor and Dupe, called DupPredictorRep and DupeRep, respectively. Follow the links above for the detailed reproduction steps.

Dataset

We provide two dumps, both containing the main tables. They differ only in the "posts" table. In Dump 1, the table data has been stemmed and the stop words have been removed; the synonyms of tags and the code blocks have also already been extracted. In Dump 2, the table contains the original raw content. The fastest way to reproduce DupPredictor or Dupe is to use Dump 1. If you want to run the entire process, including stemming and stop word removal, follow the instructions available in the preprocess step.
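As a rough illustration of the kind of preprocessing that separates Dump 1 from Dump 2, the sketch below lowercases a question title, drops stop words, and strips a few common suffixes. It is not the code used to build Dump 1: the stop-word list, the suffix rules, and the function names (naive_stem, preprocess) are assumptions made for illustration, and a real run would rely on the stemmer and stop-word list described in the preprocess step.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption);
# the list actually used to build Dump 1 is not reproduced here.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "in", "how", "do", "i"}

# Naive suffix rules standing in for a real stemmer (assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on alphanumerics, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("How do I sort the lists in Java?"))  # ['sort', 'list', 'java']
```

Working from Dump 1 skips this stage entirely, which is why it is the faster route to reproducing either tool.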

 

 



