Saturday, May 6, 2017

Weeks 30 & 31: 4/19/2017 - 5/3/2017

Work Accomplished

These past two weeks have been extremely hectic because of YSU's last week of classes and finals. I worked on interpreting Alyssa's results and writing our final report. We found that novice developers assign fewer tags than expert developers. They also spend more time on text and titles as task complexity increases when assigning tags. Expert developers not only tend to assign more tags, but also spend more time on code as task complexity increases. They also assign tags more accurately than novice developers. These trends are important for informing tag prediction models, because expert developers show distinct trends from novices and predict tags more accurately. Therefore, using features from expert eye gaze patterns on Stack Overflow documents to predict tags may prove fruitful.

I also finished my senior capstone report and presented the work. This work was an extension of the research from last year's CREU project. We explored the same eye-tracking dataset using sequential analysis techniques to study/predict developer expertise with different measures. These techniques also put emphasis on the order in which developers navigate code. This work is a more mature version of my QUEST poster and the paper we submitted to MSR a few months back. Here is a picture from my presentation:


Finally, I graduated Summa Cum Laude with a BS in mathematics and computer science from Youngstown State University today. Here is a picture with Dr. Bonita Sharif and Dr. Alina Lazar:

Outcome(s)

  1. Finished the final report for CREU
  2. Presented my capstone project extending the eye-tracking research we performed in last year's CREU project.
  3. Graduated!!

Thursday, April 27, 2017

Week 30: 4/22/2017 - 4/26/2017

Work Accomplished: 
This week, I printed out the data from Databricks after running the small XML file through Apache Spark. I am in the process of writing the abstract. Next week, I will be presenting this project to the YSU CSIS department as part of my senior capstone. I wrote the draft of our final CREU report for Jenna to edit.

Outcomes
Finished final report, finished senior project and finished the study as a whole. 

Thursday, April 20, 2017

Week 29: 4/12/2017 - 4/19/2017

Work Accomplished:
This week, Jenna helped me import the Kaggle data into Databricks so that we could run Apache Spark on it for training. It was challenging to set up a cluster and get the data to import correctly into each column. The next step is to run the bigger data set and test it so that we can compare the results across the data sets.
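
One common reason CSV columns fail to line up on import is that a quoted field, such as a post body, contains commas or line breaks. The sketch below only illustrates that effect in plain Python. It is not the Spark code used in this project, and it may not be the cause of the trouble described above. The column names (Id, Title, Body, Tags) and the sample row are assumptions made up for the example.

```python
import csv
import io

# Hypothetical sample data (not from the real data set): one record whose
# Body field contains both a comma and a line break inside the quotes.
SAMPLE = (
    'Id,Title,Body,Tags\n'
    '1,"How to sort a list?","<p>I tried sort(),\nbut it fails</p>","python list"\n'
)


def naive_rows(text):
    """Split on line breaks and commas while ignoring quotes.

    This breaks the single record apart and misaligns its columns.
    """
    return [line.split(",") for line in text.splitlines()]


def quoted_rows(text):
    """Parse with the csv module, which keeps quoted fields together."""
    return list(csv.reader(io.StringIO(text)))


naive = naive_rows(SAMPLE)
parsed = quoted_rows(SAMPLE)

# Compare how many rows and columns each approach produces.
print("naive :", len(naive), "rows, columns per row:", [len(r) for r in naive])
print("parsed:", len(parsed), "rows, columns per row:", [len(r) for r in parsed])
```

A quote-aware parser keeps the header and the record at the same number of columns, while the naive split turns one record into several ragged rows.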

Weekly goal: Finish the dataset testing. 

Future goal: Finish testing the datasets and use the keyword results to inform our tags so that we might be able to predict tags on Stack Overflow. We also need to submit our final paper for this project by May 5th.

Wednesday, April 19, 2017

Week 29: 4/12/2017 - 4/19/2017

Work Accomplished

This past week, I worked with Alyssa on using Databricks to run Apache Spark on a small dataset from the Kaggle competition. We worked together to move past the trouble we had last week and finished running Spark on the small dataset. I also presented a short summary of our paper, "On Predicting Developer Expertise from Eye Gazes for Bug Fixing Tasks," at the kick-off luncheon for the YSU Honors College Journal, which accepted the paper. This paper was about the results from last year's CREU project.


Goal

Weekly goal(s) - In the next week, I will finish up the analyses we have left for the keyword predictions and start writing our end of the year report.
Long-term goal(s) - Predict keywords by modifying a method proposed for keyword prediction in a Kaggle competition so that it incorporates eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.
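
A minimal sketch of how two keyword sets could be compared, assuming each set is simply a collection of strings. The function name, the keywords, and the choice of Jaccard similarity as the overlap measure are all assumptions for illustration, not the method actually used in this project.

```python
def compare_keyword_sets(with_gaze, without_gaze):
    """Summarize the overlap between two keyword collections.

    Keywords are lowercased so that "Index" and "index" count as the same.
    """
    a = {k.lower() for k in with_gaze}
    b = {k.lower() for k in without_gaze}
    shared = a & b
    union = a | b
    # Jaccard similarity: shared keywords divided by all distinct keywords.
    jaccard = len(shared) / len(union) if union else 1.0
    return {
        "shared": sorted(shared),
        "only_with_gaze": sorted(a - b),
        "only_without_gaze": sorted(b - a),
        "jaccard": jaccard,
    }


# Placeholder keywords invented for the example.
result = compare_keyword_sets(
    ["array", "loop", "Index"],
    ["loop", "index", "exception"],
)
print(result)
```

Keywords that appear only in the eye-tracking set would be the ones worth a closer look, since they point to parts of a document that gaze data highlights but the text alone does not.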

Outcome(s)

  1. Finished running Spark jobs in Databricks on the Kaggle competition data
  2. Presented results from a paper we wrote during last year's CREU project at an event for the journal that accepted it

Thursday, April 13, 2017

Week 28: 4/5/2017 - 4/12/2017

Work Accomplished

This past week, I worked with Alyssa on using Databricks to run Apache Spark on a small dataset from the Kaggle competition. We walked through the Databricks User Guide (https://docs.databricks.com/user-guide/index.html) together. We were able to load the small dataset into Databricks and start up a cluster to run our Spark jobs on it. We ran into some trouble because the Community Edition of Databricks doesn't seem to allow us to run jobs on our cluster, so we will be working through this issue during the next week. There may be a way around this by using a specific type of cluster.


Goal

Weekly goal(s) - In the next week, I will work with Alyssa on figuring out how to run jobs on a cluster in Databricks so that we can run Apache Spark on a small dataset from the Kaggle competition. This will introduce me to using Databricks for keyword prediction, so that we may use it on our data.
Long-term goal(s) - Predict keywords by modifying a method proposed for keyword prediction in a Kaggle competition so that it incorporates eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.

Outcome(s)


  1. Made progress on running Spark jobs in Databricks by loading Kaggle competition data into it and reading over a user guide.

Wednesday, April 12, 2017

Week 28: 4/5/2017 - 4/12/2017

Work Accomplished: 
This week, Jenna helped me upload our training data into Databricks, the online platform for running Apache Spark. She was able to solve the issue with our file upload. The Train data from the Kaggle competition is now running. Jenna and I are going to figure out how to run it through a cluster. I am going to start writing the draft of our paper and gather the rest of the data for analysis.


Weekly Goal: Get the rough draft of our paper written.


Future Goal: Complete data analysis and machine learning. Submit our paper to be published. 

Thursday, April 6, 2017

Week 27: 3/29/17 - 4/5/2017

Work Accomplished

This past week, I presented my poster at QUEST (see the image below), a conference hosted at our university for its students to present their research. I also checked over Alyssa's generated keywords and generated my own keywords for our keyword predictions without eye-tracking. It was difficult to think of keywords that might be important for tag prediction but that Alyssa hadn't already thought of. Finally, I ran my pre-processing scripts (which I combined into one script) on our collected eye-tracking data.

Goal

Weekly goal(s) - In the next week, I will be using Databricks to run Apache Spark on a small dataset from the Kaggle competition. This will introduce me to using Databricks for keyword prediction, so that we may use it on our data.
Long-term goal(s) - Predict keywords by modifying a method proposed for keyword prediction in a Kaggle competition so that it incorporates eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.

Outcome(s)

  1. QUEST poster presentation completed
  2. Checked over Alyssa's keywords and added my own
  3. Ran pre-processing script on collected data