🐙 Tako Data Challenge

This edition of MAINT includes a dedicated Data Challenge which uses interaction histories recorded with Tako, an extension developed by CodeLounge.

Currently, Tako supports any language whose features are exposed through the Language Server Protocol (e.g., Scala through Metals), plus TypeScript/JavaScript. We plan to extend the set of supported languages, including Java and Python, in future versions of Tako.

Dataset and Model

For this challenge, we are sharing two weeks of development activity from three developers, all of whom were working in TypeScript. The data consists of a set of NDJSON files, each containing a list of timestamped events. Each event models a call to one of the language feature providers listed here.

For example, an event may track a call to provideDocumentSymbols from DocumentSymbolProvider. The event includes a serialized version of the document and a list of the returned DocumentSymbol objects, which represent the symbols in the document (classes, methods, variables, etc.) at that specific timestamp.
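As a rough illustration of working with such a file, the sketch below reads NDJSON lines and counts the symbols in one provideDocumentSymbols event by kind. The field names used here (`timestamp`, `method`, `symbols`, `kind`, `children`) are assumptions made for the example; the actual Tako event schema may differ and should be checked against the data.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event per non-empty NDJSON line (one JSON object per line)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_symbol_kinds(event: dict) -> Counter:
    """Count the symbols in a single event by kind, including nested children.

    Assumes (hypothetically) that symbols live under "symbols" and that each
    symbol may carry a "kind" and a list of "children".
    """
    counts: Counter = Counter()
    stack = list(event.get("symbols", []))
    while stack:
        symbol = stack.pop()
        counts[symbol.get("kind")] += 1
        stack.extend(symbol.get("children", []))
    return counts


# Two made-up events in the assumed shape; real files would be read with open().
sample_lines = [
    '{"timestamp": 1000, "method": "provideDocumentSymbols", "symbols": '
    '[{"name": "Foo", "kind": "Class", "children": [{"name": "bar", "kind": "Method"}]}]}',
    '{"timestamp": 2000, "method": "provideHover"}',
]

events = list(read_events(sample_lines))
symbol_events = [e for e in events if e.get("method") == "provideDocumentSymbols"]
kind_counts = count_symbol_kinds(symbol_events[0])
```

Counting per event rather than across the whole file matters here: each event is a snapshot of the document at one timestamp, so summing over events would count the same symbol repeatedly.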

Given the cancellation of the workshop, we removed the dataset. If you are interested in this data for your own research, please contact the organizers.

Questions

Possible submissions for the data challenge include, but are not limited to, trying to answer the following questions:

  • Are there periods where developers are highly focused on a task?
  • Are there long pauses that impact productivity once work resumes?
  • Can you find refactorings of any form?
  • How much code has been written and then deleted during a session?
  • Which types of symbols (classes, methods, variables, etc.) were introduced, or navigated to, in a session?
  • How complex - in terms of number of symbols - were the files manipulated during sessions?
  • How were symbols named? How many times did programmers change their minds while naming a new class or symbol?
  • What are the dynamics related to symbol evolution in a development session?
  • What are the dynamics related to errors, warnings, hints in a development session?

To answer these questions, even anecdotally, you can use any data analytics approach you like, including visualizations. If possible, make your scripts available, so that at the workshop we can try to re-run them on additional development data that we will collect in the meantime.
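
As one possible starting point for the questions about focus periods and long pauses, the sketch below splits a stream of event timestamps into sessions wherever the gap between consecutive events exceeds a threshold. It assumes timestamps are integers in milliseconds, and the five-minute default threshold is an arbitrary choice for illustration, not a value prescribed by the challenge.

```python
from typing import List, Tuple


def split_sessions(
    timestamps_ms: List[int], max_gap_ms: int = 5 * 60 * 1000
) -> List[Tuple[int, int]]:
    """Group timestamps into (start, end) sessions separated by long pauses.

    A new session starts whenever two consecutive events are more than
    max_gap_ms apart. Timestamps are assumed to be in milliseconds.
    """
    ordered = sorted(timestamps_ms)
    if not ordered:
        return []

    sessions: List[Tuple[int, int]] = []
    start = prev = ordered[0]
    for ts in ordered[1:]:
        if ts - prev > max_gap_ms:
            # The gap is a pause: close the current session and open a new one.
            sessions.append((start, prev))
            start = ts
        prev = ts
    sessions.append((start, prev))
    return sessions


# Made-up timestamps: three events close together, a long pause, then two more.
example = [0, 1_000, 2_000, 1_000_000, 1_001_000]
sessions = split_sessions(example, max_gap_ms=60_000)
```

The gaps between consecutive sessions then give the pause lengths, and the number of events per session gives a crude activity measure that could be compared before and after each pause.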