Allamanis, Miltiadis, Earl T. Barr, Premkumar Devanbu, and Charles Sutton. 2018. "A Survey of Machine Learning for Big Code and Naturalness." ACM Computing Surveys 51 (4): 81:1–37. https://doi.org/10.1145/3212695.
Austin, Jacob, Augustus Odena, Maxwell Nye, et al. 2021. “Program Synthesis with Large Language Models.” arXiv Preprint arXiv:2108.07732.
Chen, Mark, Jerry Tworek, Heewoo Jun, et al. 2021. “Evaluating Large Language Models Trained on Code.” arXiv Preprint arXiv:2107.03374.
Feinerer, Ingo, and Kurt Hornik. 2015. tm: Text Mining Package. http://CRAN.R-project.org/package=tm.
Feinerer, Ingo, Kurt Hornik, and David Meyer. 2008. "Text Mining Infrastructure in R." Journal of Statistical Software 25 (5): 1–54. http://www.jstatsoft.org/v25/i05/.
Feng, Zhangyin, Daya Guo, Duyu Tang, et al. 2020. "CodeBERT: A Pre-Trained Model for Programming and Natural Languages." Findings of the Association for Computational Linguistics: EMNLP 2020, 1536–47. https://doi.org/10.18653/v1/2020.findings-emnlp.139.
Jimenez, Carlos E., John Yang, Alexander Wettig, et al. 2024. "SWE-bench: Can Language Models Resolve Real-World GitHub Issues?" Proceedings of the International Conference on Learning Representations (ICLR 2024).
Lin, Chin-Yew. 2004. “ROUGE: A Package for Automatic Evaluation of Summaries.” Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, 74–81.
Luo, Qingzhou, Farah Hariri, Lamyaa Eloussi, and Darko Marinov. 2014. "An Empirical Analysis of Flaky Tests." Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE), 643–53. https://doi.org/10.1145/2635868.2635920.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. “Distributed Representations of Words and Phrases and Their Compositionality.” Advances in Neural Information Processing Systems (NeurIPS) 26.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. "BLEU: A Method for Automatic Evaluation of Machine Translation." Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 311–18. https://doi.org/10.3115/1073083.1073135.
Ren, Shuo, Daya Guo, Shuai Lu, et al. 2020. “CodeBLEU: A Method for Automatic Evaluation of Code Synthesis.” arXiv Preprint arXiv:2009.10297.
Runeson, Per, Magnus Alexandersson, and Oskar Nyholm. 2007. "Detection of Duplicate Defect Reports Using Natural Language Processing." Proceedings of the 29th International Conference on Software Engineering (ICSE), 499–510. https://doi.org/10.1109/ICSE.2007.32.
Wang, Yue, Weishi Wang, Shafiq Joty, and Steven C. H. Hoi. 2021. "CodeT5: Identifier-Aware Unified Pre-Trained Encoder-Decoder Models for Code Understanding and Generation." Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), 8696–708. https://doi.org/10.18653/v1/2021.emnlp-main.685.
Zhang, Tianyi, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. 2020. “BERTScore: Evaluating Text Generation with BERT.” Proceedings of the International Conference on Learning Representations (ICLR 2020).