Saturday, May 6, 2017

Weeks 30 & 31: 4/19/2017 - 5/3/2017

Work Accomplished

These past two weeks have been extremely hectic because of YSU's last week of classes and finals. I worked on interpreting Alyssa's results and writing our final report. We found that novice developers assign fewer tags than expert developers. They also spend more time on text and titles as task complexity increases. Expert developers tend not only to assign more tags, but also to spend more time on code as task complexity increases. They also assign tags more accurately than novice developers. These trends are important for informing tag prediction models, because expert developers show distinct trends from novices and predict tags more accurately. Therefore, using expert eye gaze pattern features on Stack Overflow documents to predict tags may prove fruitful.

I also finished my senior capstone report and presented the work. This work was an extension of the research from last year's CREU project. We explored the same eye-tracking dataset using sequential analysis techniques to study/predict developer expertise with different measures. These techniques also put emphasis on the order in which developers navigate code. This work is a more mature version of my QUEST poster and the paper we submitted to MSR a few months back. Here is a picture from my presentation:


Finally, I graduated Summa Cum Laude with a BS in mathematics and computer science from Youngstown State University today. Here is a picture with Dr. Bonita Sharif and Dr. Alina Lazar:

Outcome(s)

  1. Finished the final report for CREU
  2. Presented my capstone project extending the eye-tracking research we performed in last year's CREU project.
  3. Graduated!!

Thursday, April 27, 2017

Week 30: 4/22/2017 - 4/26/2017

Work Accomplished: 
This week, I printed out the data from Databricks after running the small XML file through Apache Spark. I am in the process of writing the abstract. Next week, I will be presenting this project to the YSU CSIS department as part of my senior capstone. I wrote the draft of our final CREU report for Jenna to edit.

Outcomes
Finished final report, finished senior project and finished the study as a whole. 

Thursday, April 20, 2017

Week 29: 4/12/2017 - 4/19/2017

Work Accomplished:
This week, Jenna helped me import the Kaggle data into Databricks so that we can run Apache Spark on it for training. It was challenging to set up a cluster and get the data to import correctly into each column. The next step is to run the bigger data set and test it so that we can compare the two data sets.

Weekly goal: Finish the dataset testing. 

Future goal: Finish testing the datasets and use the keyword results to inform our tags so that we might be able to predict tags on Stack Overflow. We also need to submit our final paper for this project by May 5th.

Wednesday, April 19, 2017

Week 29: 4/12/2017 - 4/19/2017

Work Accomplished

This past week, I worked on using DataBricks to run Apache Spark on a small dataset from the Kaggle competition with Alyssa. Alyssa and I worked together to move past the trouble we had last week and complete running Spark on the small dataset. At the kick-off luncheon for the YSU Honors College Journal, I also presented a short summary of our paper, On Predicting Developer Expertise from Eye Gazes for Bug Fixing Tasks, which was accepted to the journal. This paper was about the results from last year's CREU project.


Goal

Weekly goal(s) - In the next week, I will finish up the analyses we have left for the keyword predictions and start writing our end of the year report.
Long-term goal(s) - Predict keywords by modifying the process of a proposed method for keyword prediction in a Kaggle competition by incorporating eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.

Outcome(s)

  1. Finished running Spark jobs in DataBricks on the Kaggle competition data
  2. Presented results from a paper we wrote during last year's CREU project at an event for the journal it was accepted at

Thursday, April 13, 2017

Week 28: 4/5/2017 - 4/12/2017

Work Accomplished

This past week, I worked on using DataBricks to run Apache Spark on a small dataset from the Kaggle competition with Alyssa. Alyssa and I worked together to walk through the DataBricks User Guide (https://docs.databricks.com/user-guide/index.html). We were able to get the small dataset loaded into DataBricks and start up a cluster to run our Spark jobs on it. We ran into some trouble because the Community Edition of DataBricks doesn't seem to allow us to run jobs on our cluster, so we will be working through this issue during the next week. There may be a way around this by using a specific type of cluster.


Goal

Weekly goal(s) - In the next week, I will be using DataBricks to run Apache Spark on a small dataset from the Kaggle competition with Alyssa by figuring out how to run jobs on a cluster in DataBricks. This will get me introduced to using DataBricks for keyword prediction, so that we may use it on our data.
Long-term goal(s) - Predict keywords by modifying the process of a proposed method for keyword prediction in a Kaggle competition by incorporating eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.

Outcome(s)


  1. Made progress on running Spark jobs in DataBricks by loading Kaggle competition data into it and reading over a user guide.

Wednesday, April 12, 2017

Week 28: 4/5/2017 - 4/12/2017

Work Accomplished: 
This week, Jenna helped me to upload our training data into DataBricks, the online platform for running Apache Spark. She was able to solve the issue of our file upload. The Train data from the Kaggle Competition is now running. Jenna and I are going to figure out how to run it through a cluster. I am going to start writing the draft of our paper and gather the rest of the data for analysis.


Weekly Goal: Get the rough draft of our paper written.


Future Goal: Complete data analysis and machine learning. Submit our paper to be published. 

Thursday, April 6, 2017

Week 27: 3/29/17 - 4/5/2017

Work Accomplished

This past week I presented my poster at QUEST (see the image below), a conference hosted at our university for our university's students to present their research. I also checked over Alyssa's generated keywords and generated my own keywords for our keyword predictions without eye-tracking. It was difficult to think of keywords that might be important for tag prediction but that Alyssa hadn't already thought of. Finally, I ran my pre-processing scripts (which I combined into one script) on our collected eye-tracking data.

Goal

Weekly goal(s) - In the next week, I will be using DataBricks to run Apache Spark on a small dataset from the Kaggle competition. This will get me introduced to using DataBricks for keyword prediction, so that we may use it on our data.
Long-term goal(s) - Predict keywords by modifying the process of a proposed method for keyword prediction in a Kaggle competition by incorporating eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.

Outcome(s)

  1. QUEST poster presentation completed
  2. Checked over Alyssa's keywords and added my own
  3. Ran pre-processing script on collected data

Week 27: 3/29/17 - 4/5/2017

Work Accomplished: 
This week, I finished the list of keyword AOIs from the 9 tasks in our Stack Overflow study. I sent them to Jenna to check, but I found it easier to do a manual analysis than to wait for the data to export. I also presented 'Improving Stack Overflow Tag Prediction', the first half of our study, at Youngstown State University's QUEST, a forum for undergraduate and graduate research. Jenna and I presented a poster. We are also using Databricks, a virtual analytics platform, to run Apache Spark. I uploaded the train data from the Kaggle competition to a cluster and I'm going to run a bigger file (also part of the Kaggle competition) to see how the two compare against one another. We are then going to take our participant fixation data and run it on the online platform.

Weekly Goal: Get Alex scheduled to participate in the study. 

Future Goal: Start writing the abstract for the second half of our study to be submitted. Finish the data analysis. 

Thursday, March 30, 2017

Week 26: 3/22/2017 - 3/29/2017

Work Accomplished

This past week I finalized my poster for QUEST titled, Towards Mining Eye-tracking Datasets for Expertise Prediction, and sent it to printing. I also finished the R script to complete step 3 of the data pre-processing steps that merges rows in each task's dataframe based on the fixation index. The hardest part about writing this R script was selecting the correct merging pattern for each data field across each dataset. I also completed a fellowship application to our Phi Kappa Phi chapter at YSU.
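The merge in step 3 can be pictured with a small Python sketch. This is not the R script from the project; the column names (`fixation_index`, `duration_ms`, `aoi`) and the per-column merge rules are assumptions made up for illustration, standing in for the "merging pattern" that had to be chosen for each data field.

```python
def merge_by_fixation(rows, key="fixation_index", rules=None):
    """Collapse rows that share a fixation index into a single row.

    `rules` maps a column name to one of "first", "last", "sum" or "max".
    Columns without a rule keep the value from the first row seen.
    All column names and rules here are hypothetical.
    """
    rules = rules or {}
    merged = {}  # insertion order is preserved, so output follows first appearance
    for row in rows:
        index = row[key]
        if index not in merged:
            merged[index] = dict(row)
            continue
        target = merged[index]
        for column, value in row.items():
            if column == key:
                continue
            rule = rules.get(column, "first")
            if rule == "sum":
                target[column] = target[column] + value
            elif rule == "max":
                target[column] = max(target[column], value)
            elif rule == "last":
                target[column] = value
            # "first": keep the value that is already stored
    return list(merged.values())


rows = [
    {"fixation_index": 1, "duration_ms": 100, "aoi": "title"},
    {"fixation_index": 1, "duration_ms": 50, "aoi": "title"},
    {"fixation_index": 2, "duration_ms": 80, "aoi": "code"},
]
merged_rows = merge_by_fixation(rows, rules={"duration_ms": "sum"})
```

Whether a duration should be summed, or simply taken once per fixation, depends on how the eye tracker exports it, so the rule table is the part that would need to match the real data.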

Goal

Weekly goal(s) - In the next week, I will be double checking keywords that Alyssa makes for the non-eyetracking keyword selection. I will also be working to gather statistics on the new datasets I created, such as the number of views of each AOI per task per participant and across participants. This should help us narrow down keywords.
Long-term goal(s) - Predict keywords by modifying the process of a proposed method for keyword prediction in a Kaggle competition by incorporating eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.
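The AOI statistics mentioned in the weekly goal could be gathered along these lines. This is a minimal Python sketch, assuming that a "view" means one fixation that landed on an AOI and that each fixation record carries hypothetical `task`, `participant` and `aoi` fields; the real datasets may define views differently.

```python
from collections import Counter


def aoi_view_counts(fixations):
    """Count AOI views per (task, participant, AOI) and per (task, AOI).

    Each fixation is a dict with hypothetical keys "task", "participant"
    and "aoi"; one fixation is treated as one view.
    """
    per_participant = Counter()
    across_participants = Counter()
    for fixation in fixations:
        task, person, aoi = fixation["task"], fixation["participant"], fixation["aoi"]
        per_participant[(task, person, aoi)] += 1
        across_participants[(task, aoi)] += 1
    return per_participant, across_participants


fixations = [
    {"task": "T1", "participant": "P01", "aoi": "title"},
    {"task": "T1", "participant": "P01", "aoi": "code"},
    {"task": "T1", "participant": "P02", "aoi": "code"},
    {"task": "T1", "participant": "P01", "aoi": "code"},
]
per_participant, across_participants = aoi_view_counts(fixations)
```

AOIs with high counts across participants would be natural keyword candidates, which is how these counts could help narrow the keyword list.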

Outcome(s)

  1. QUEST poster sent to printing and final draft completed
  2. R script to complete step 3 of the data pre-processing steps written (see above explanation of step 3)
  3. Phi Kappa Phi Fellowship application completed

Wednesday, March 29, 2017

Week 26: 3/22/2017 - 3/29/2017

Work Accomplished: 
This week, I started working on creating keyword AOIs using the data we've collected for our study. I manually examined similar keywords for all 16 participants and determined each one for the 9 tasks. I also uploaded my pre/post survey data to Dropbox under Survey Results. The next step is to take the Train and Test data from a Kaggle competition on Facebook and run it through Apache Spark to test for a keyword analysis. Then, I will take our data and run it through to see if we get similar results. After I export the data and create an Excel file with images corresponding to our AOIs, Jenna will run that data through her R script to separate each value. From there, we will analyze the data further and come up with our tag prediction system.


Weekly Goal: Get up to 20 participants for the study. I was able to get 1 more person scheduled, but they were unable to do the study last week.


Future Goal: Take the exported data and compare it against the train and test data from predicting keywords on Kaggle.



Thursday, March 23, 2017

Week 25: 3/15/2017 - 3/22/2017

Accomplishments: 
This week, I wrote up the abstract for my poster at QUEST, Improving Stack Overflow Tag Prediction Using Eye Tracking, and Jenna submitted it for me. I also gathered the new participant data and uploaded the pre/post responses to our Dropbox. Next week, I will be in charge of creating keyword AOIs in Tobii Studio and hopefully gathering the last of our remaining participants.

Weekly Goal:  Gather more participants for our study.

Future Goal: Run data through machine learning algorithm and present my poster at QUEST in April. 




Wednesday, March 22, 2017

Week 25: 3/14/2017 - 3/22/2017

Work Accomplished

This past week I created a draft of my poster for QUEST titled, Towards Mining Eye-tracking Datasets for Expertise Prediction. I also finished the R script to complete steps 1 & 2 of the data pre-processing steps listed below. The hardest part about writing this R script was splitting up the data into files with all participant data per task, because the data is currently per participant where each participant file has the eye-tracking data for all tasks.

Goal

Weekly goal(s) - In the next week, I will be working on step 3 of the data pre-processing steps below, which will be the most difficult of all of the steps, because aggregations are tricky. I will also be presenting the draft of my poster for QUEST as practice for the actual presentation.
Long-term goal(s) - In the next few weeks, I will be working to accomplish the pre-processing steps listed below (two of which are already complete).
  1. Create one excel file with the TobiiStudio Field name shown on the left for each participant with all tasks
  2. Create a file for each task with all participant data with relevant experiment field name data columns
  3. Create a file for each task with fixation and duration data merged. All other columns remain as in step 2.
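Step 2 above, going from one file per participant to one file per task, amounts to a regrouping. Here is a minimal Python sketch of that idea; it is not the R script from the project, and the `task` column and the participant IDs are hypothetical.

```python
from collections import defaultdict


def regroup_by_task(per_participant, task_column="task"):
    """Turn {participant: [rows for all tasks]} into {task: [rows for all participants]}.

    Each output row gains a "participant" field so that the source of the
    row is not lost once the data is pooled by task.
    """
    per_task = defaultdict(list)
    for participant_id, rows in per_participant.items():
        for row in rows:
            tagged = dict(row, participant=participant_id)
            per_task[row[task_column]].append(tagged)
    return dict(per_task)


per_participant = {
    "P01": [{"task": "T1", "fixation_index": 1}, {"task": "T2", "fixation_index": 1}],
    "P02": [{"task": "T1", "fixation_index": 1}],
}
per_task = regroup_by_task(per_participant)
```

Each value in `per_task` could then be written out as its own file, one per task.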

Outcome(s)

  1. QUEST poster draft completed
  2. R script to complete steps 1 & 2 of the data pre-processing steps written

Saturday, March 18, 2017

Week 24: 3/8/2017 - 3/14/2017

Work Accomplished

This past week I worked on writing and submitting my abstract for a poster to QUEST at YSU. My poster will be about mining eye-tracking data using sequential analysis techniques, a presentation of the paper we submitted to MSR a few weeks ago. I also spent time touching base with Dr. Sharif (outside of our regular meeting time) to discuss next steps for our StackOverflow data analysis, and I got a clearer picture of the data pre-processing steps we need to complete in the coming weeks:
  1. Create one excel file with the TobiiStudio Field name shown on the left for each participant with all tasks
  2. Create a file for each task with all participant data with relevant experiment field name data columns
  3. Create a file for each task with fixation and duration data merged. All other columns remain as in step 2.
I will be working with Alyssa in the next few weeks to accomplish these steps.

I also found out yesterday that I will be receiving the 2017 NSF Graduate Research Fellowship. So, that was exciting news!!

https://www.fastlane.nsf.gov/grfp/AwardeeList.do?method=loadAwardeeList

Goal

Weekly goal(s) - In the next week, I will be working on step 1 of the pre-processing steps listed above and on my poster for QUEST.
Long-term goal(s) - In the next few weeks, I will be working with Alyssa to accomplish the pre-processing steps listed above.

Outcome(s)

  1. Clear next steps for data pre-processing for the StackOverflow project
  2. QUEST poster abstract submission

Wednesday, March 15, 2017

Week 24: 3/8/2017 - 3/16/2017

Work Accomplished: 

This week, I worked on an R script with Jenna's help. I am still working on the script, which will process our Excel files once I export the data from Tobii Studio. I've collected pre and post questionnaire data for the users who participated in our Stack Overflow Eye Tracking study. I have those files stored on my computer and I have to upload them to our Dropbox. I also wrote an abstract for YSU's QUEST in April. I attended a digital media conference in New York City over spring break and it gave me perspective from software developers working in newsrooms. Digital media will continue to grow, and it was interesting to see the software that some of the outlets use to track data. Storyful, a news outlet that gathers data from social platforms around the world, uses a heatmap to store and track news articles/video from around the world. It's refreshing to know that even traditional media outlets are starting to realize the importance of online content and data technology.

Work Accomplished: Collected more data, submitted QUEST abstract for review. 

Future Work: Present poster at QUEST, finish R script to export user data and submit pre/post questionnaire data to Dropbox.

Thursday, March 9, 2017

Weeks 22 & 23: 2/22/2017 - 3/8/2017

Work Accomplished

I decided to combine this week's and last week's blog posts, because I spent last week at OCWiC 2017 and on a graduate school visit and this week is our spring break. So, I was able to make progress on the data analysis for the StackOverflow project this week, while last week I was busy networking. OCWiC 2017 was a lot of fun and a great networking opportunity. I met many female students and faculty from throughout Ohio who could be potential collaborators moving forward. I also attended a very informative talk about empowering future female programmers, which will inform our ACM-W chapter's outreach events. Here is a photo of me presenting my poster:


I also spent four days at Carnegie Mellon University on a graduate school visit. There I met many faculty and students in the Institute for Software Research. I really felt like it was the right fit, so I officially committed to CMU this week.

Finally, I wrote an R script to take the StackOverflow study data and sort it by column name for input into our data analysis.
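One possible reading of "sort it by column name" is reordering the columns of the exported file alphabetically before analysis. The Python sketch below shows that reading only; it is not the R script itself, and the column names in the example are made up.

```python
import csv
import io


def sort_columns(csv_text):
    """Return the same CSV data with its columns ordered alphabetically by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = sorted(reader.fieldnames or [])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical column names, used only for illustration.
example = "duration,aoi\n120,title\n80,code\n"
sorted_example = sort_columns(example)
```

A fixed column order makes it easier to line up files from different participants, whatever order the export tool happened to use.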

Goal
Weekly goal(s) - In the next week, I will be taking some time off from research on the StackOverflow project for the rest of spring break and until I can touch base with Dr. Lazar and Dr. Sharif on the next steps.
Long-term goal(s) - Perform select data analyses appropriate for answering the following research questions:
  1. To what degree do programmers focus on key words that extraction techniques generate?
  2. To what degree do the top n keywords from our approach and the standard approach match our Oracle generated keywords?
  3. What are the best machine learning algorithms (informed by eye gaze) that can be successfully used to make predictions?
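The second research question can be made concrete with a small Python sketch: given a ranked keyword list from either approach and the Oracle keyword set, compute the share of the top n keywords that the Oracle also contains. The function name, the case-insensitive matching, and the example keywords are assumptions for illustration, not the measure used in the study.

```python
def top_n_match(ranked_keywords, oracle_keywords, n):
    """Fraction of the top-n ranked keywords that also appear in the oracle set.

    Matching ignores case. Returns 0.0 when there are no keywords to check.
    """
    top = [keyword.lower() for keyword in ranked_keywords[:n]]
    oracle = {keyword.lower() for keyword in oracle_keywords}
    if not top:
        return 0.0
    hits = sum(1 for keyword in top if keyword in oracle)
    return hits / len(top)


# Hypothetical keyword lists for one task.
eye_tracking_ranked = ["pointer", "vector", "loop", "class"]
oracle = {"Vector", "pointer", "template"}
score = top_n_match(eye_tracking_ranked, oracle, n=3)
```

Running the same function on the keyword list produced without eye-tracking would give a second score, so the two approaches can be compared on equal terms.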
Outcome(s)
  1. Presented my poster at OCWiC 2017 and attended various talks and workshops there
  2. Visited Carnegie Mellon University and made my official decision to attend their PhD program in Software Engineering in the fall
  3. Wrote an R script to sort our StackOverflow data by column name and output to a new file

Week 23: 3/01/2017 - 3/08/2017

Accomplishments: This week, I spent most of my time gathering more data for the eye tracking study. I managed to squeeze in 2 more participants before our spring break. Jenna and I are collaborating on an R script to allow participant data to be searched by column name when given a file. She is working on writing the script and I'm going to review it after spring break. I attended a panel for the ACM-W about gender inclusion in entrepreneurship. I don't see myself opening up my own tech company, but start-ups are populating the area and it's good to receive any insight as to what challenges they may face, or are facing.


Weekly goal: Gather more participant data to reach at least 20 people

Future goal: Use the R script to loop through participant files and give an analysis of each combined file so that we can use this to inform our machine learning algorithm.


Work accomplished: Received more participant data to analyze, in the process of writing an R script. 

Wednesday, March 1, 2017

Week 22: 2/22/2017 - 3/01/2017

Accomplishments: 
This week, I only had one participant in my eye tracking study. On Friday, I attended the Ohio Celebration of Women in Computing Conference in Sandusky, Ohio with our 'YSU Team'. There, I presented my poster 'Improving Stack Overflow Tag Prediction Using Eye Tracking' to a panel of judges, along with female students from various schools across the United States. It was an opportunity to not only meet potential employers in the industry but develop connections with women in technology who are also conducting similar studies in eye tracking and multi-label classification. 




I attended several of the sessions at the Sawmill Creek Lodge, one of which I found interesting and relevant to my eye tracking study. Cindy Marling, an associate professor of computer science at Ohio University, gave a talk about her research in software that monitors blood glucose levels for those who are diabetic. Marling used a machine learning model for the blood levels, and when asked about her process for choosing the best algorithm, she mentioned Weka. I had never heard of this and after some light digging, I found Weka to be a useful tool. It is open source software that offers a collection of machine learning algorithms specifically for data mining. I will be following some of the tutorials later on once I finish data collection. OCWIC was a great experience and I will encourage all STEM women to get involved early on in their careers.


Weekly Goal: Collect more participants to reach goal of 20 in the next few weeks. 


Future Goal: Export participant data into 7 files and write an R script that will take a file and loop through each one to separate each individual column name of the AOI. 

Thursday, February 23, 2017

Week 21: 2/15/2017 - 2/22/2017

Work Accomplished:

This week, I continued to gather more participants for our study. Overall, we have 13 people who have completed the tasks, and we are looking to get at least 20. I practiced my lightning talk that I will be presenting on Friday at the OCWIC conference. I've been preparing my presentation on the first half of our research study for OCWIC as well. I have also printed out copies of my resume to bring to the conference for the review workshop. 

Weekly Goal: Keep gathering more computer science students for the study. 

Future Goal: Once all participants have been gathered, stream data into Apache Spark. 

Week 21: 2/14/2017 - 2/22/2017

Work Accomplished
This past week I helped edit Alyssa's lightning talk slide show for our poster presentation at OCWiC 2017 this weekend. Alyssa will be presenting the poster and slide about our Stack Overflow study for this year's CREU project (I will be presenting another poster/slide, see previous blog post).  I also made final edits, created an author bio for myself, and wrote an acknowledgements paragraph for the paper (On Predicting Developer Expertise from Eye Gazes for Bug Fix Tasks) that was accepted to The Honors College at YSU's Academic Journal called The Emperor. This paper describes the analysis and results of the prediction experiments we performed in last year's CREU project. Finally, I was able to re-run the ABB eye-tracking data on the new fixation filter producing much more accurate fixation results.

Goal
Weekly goal(s) - In the next week, I will be presenting a poster at OCWiC 2017 and transitioning from the ABB sequential analyses project back to the Stack Overflow project Alyssa has been working on. I plan to aid her and Dr. Lazar in selecting and running the correct data analyses for the data Ali collected last semester.  I hope to learn more about the inner workings of the data analyses we select and their appropriate uses.
Long-term goal(s) - Continue working on the ABB sequential analyses project goal (see previous blog post), but outside of CREU working time. For the Stack Overflow project, I plan to select and perform data analyses appropriate for answering the following research questions:
  1. To what degree do programmers focus on key words that extraction techniques generate?
  2. To what degree do the top n keywords from our approach and the standard approach match our Oracle generated keywords?
  3. What are the best machine learning algorithms (informed by eye gaze) that can be successfully used to make predictions?
Outcome(s)
  1. Alyssa's lightning talk for OCWiC 2017 was edited and submitted
  2. ABB eye-tracking data re-run on the new fixation filter; much more accurate results
  3. Final submission for publication of On Predicting Developer Expertise from Eye Gazes for Bug Fix Tasks in YSU's Honors College journal, The Emperor

Wednesday, February 15, 2017

Week 20: 2/8/2017 - 2/15/2017

Accomplishments: 
This week, I gathered three more students for our eye tracking study. I finished my poster and lightning talk for the upcoming OCWIC conference, and I uploaded all of the participant data into a file. I will be analyzing this data to look at any trends that I find regarding tag scores of novices vs. non-novices.  My only concern is that I will not have enough non-novice data to compare against, but it should be fine for now.

Weekly goal: I would like to get one more participant for my data, particularly someone who is a non-novice.  

Future goal:  I need to run my data through Apache Spark, and give my lightning talk on February 24. 

 

 

Week 20: 2/8/2017 - 2/14/2017

Work Accomplished

This past week I created a lightning talk slide show for my poster presentation at OCWiC 2017 (coming up on the 24th of February). I will be presenting the same poster I presented at Tapia (last semester) about the machine learning expertise prediction we did in last year's CREU project. We found that eye-tracking data on software developers solving bugs provide us with a feasible way to predict developer expertise using machine learning algorithms. I also traveled to the University of Notre Dame on a graduate school visit for their Ph.D. program in Computer Science. From this trip I learned that I want to do research in the area of source code summarization and generating source code from the English language.

I did not have time to re-run the ABB eye-tracking data and add support for method call and method use differentiation in our sequential analysis of this data. These goals will roll over to next week.

Goal
Weekly goal(s) - In the next week, I plan to re-run the ABB eye-tracking data we collected on the newly updated fixation filter in iTrace and include support in our sequential analysis for differentiating between a method call and a method use.
Long-term goal(s) - To perform many more (roughly 8) sequential analyses on the ABB eye-tracking data after adjustments to the fixation filter and STS data creation R scripts. This will be done to determine variance in expert and novice eye gaze patterns during bug fixes.

Outcome(s)
  1. Lightning talk slide show for OCWiC created
  2. Graduate school visit to the University of Notre Dame; decided my area of study in graduate school

Friday, February 10, 2017

Week 19: 2/1/2017 - 2/8/2017

Work Accomplished
This past week I wrote the Results section of the paper we are submitting to the Mining Software Repositories conference about mining eye-tracking data for software tasks. I finished generating all the entropy, turbulence, and similarity metrics results for all tasks. After including these results in the Results section, I added bullet points of information in the Discussion and Future Work sections (Dr. Sharif and Dr. Lazar will be finishing the Introduction/Related Work and the Conclusion/Discussion/Future Work sections). I went back and made minor edits to the sections I wrote. I also presented a more mathematically intense variation of the similarity metrics work at the 2017 Nebraska Conference for Undergraduate Women in Mathematics on Saturday.

Goal
Weekly goal(s) - In the next week, I plan to re-run the ABB eye-tracking data we collected on the newly updated fixation filter in iTrace (some bugs were fixed that may change our results) and include support in our sequential analysis for differentiating between a method call and a method use (which are distinct source code elements). I am also going on a graduate school visit to the University of Notre Dame. This may cause some of my weekly goals to roll over to the next week.
Long-term goal(s) - To perform many more (roughly 8) sequential analyses on the ABB eye-tracking data after adjustments to the fixation filter and STS data creation R scripts.

Outcome(s)
  1. Results for entropy, turbulence, and similarity metrics computations on our eye-tracking data
  2. Completed Results section and final touches of my portions of the MSR paper
  3. Presentation given at NCUWM

Wednesday, February 8, 2017

Week 19: 2/1/2017 - 2/8/2017

Accomplishments: 

This week, I gathered more participants for our Stack Overflow study. I'm at 5 participants so far, with more scheduled next week, so I will be able to meet my goal of at least 6. I finished my poster about the results of our eye tracking study regarding participant tag scores based on novices vs. non-novices. I also finished the lightning talk PowerPoint to accompany my poster. After I receive the last participant, I will be ready to start the data analysis.

Goals: 

Weekly Goal: Gather participants

Long Term Goal: Use the participant data gathered to conduct another analysis and train a machine learning algorithm to create a tag prediction system for Stack Overflow. We may or may not look into how to automatically predict keywords in the text as well.

I am also working on an Operating Systems project with a few other students in the department, and it is going to take up a lot of my time. Practicing the Unix commands will help me with Apache Spark so I'm excited to be able to do these simultaneously. 



Thursday, February 2, 2017

Week 18: 1/25/2017 - 2/1/2017

Accomplishments: This week, I had a student come into the Empirical studies lab and participate in our Stack Overflow study. I have two more students scheduled to come in next week. This is vital for fresh analysis as we continue with machine learning. 

I presented the rough draft of my OCWIC 2017 poster about the results of the first half of our data analysis to Jenna, Dr. Sharif and Dr. Lazar. Ali calculated tag accuracy scores (relevant tags compared to distractor tags chosen by participants), and scores were split up into three categories: simple, average and complex. Our results showed that simple level tag scores were 97.46%, average: 89.76%, and complex: 87.04%. From these results, we could conclude that users spent average time considering all 10 tags after viewing C++ post contents. After reviewing fixations, saccades and fixation durations, we split the data down further by participant experience level, novice vs. non-novice. Novices were all those with 1 year or less of programming experience and non-novices consisted of those with 3 years or more. We concluded that non-novice developers performed better than novices in tag assignment and depended more on code, and novices were less accurate in tag assignment as they assigned fewer tags on average.
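To make the scoring and grouping concrete, here is a minimal Python sketch. The experience cut-offs come from the description above. The scoring formula does not: the post does not spell out how the tag accuracy score was calculated, so the sketch assumes it is the percentage of a participant's chosen tags that are relevant rather than distractors. The tag names are made up.

```python
def experience_group(years_programming):
    """Classify a participant using the cut-offs described in the study."""
    if years_programming <= 1:
        return "novice"
    if years_programming >= 3:
        return "non-novice"
    return None  # between 1 and 3 years: not covered by either group


def tag_accuracy(chosen_tags, relevant_tags):
    """Percentage of chosen tags that are relevant.

    ASSUMED formula for illustration; the study's actual score may differ.
    """
    chosen = set(chosen_tags)
    if not chosen:
        return 0.0
    return 100.0 * len(chosen & set(relevant_tags)) / len(chosen)


# Hypothetical task: three relevant tags hidden among distractors.
relevant = {"c++", "pointers", "memory"}
score = tag_accuracy(["c++", "pointers", "java"], relevant)
```

Averaging such scores per complexity level and per experience group would give numbers in the same form as the percentages reported above.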

Weekly goal: My goal is to get at least 6 students in the department to participate in the study. If I can get at least 1 student a week, my goal will be met. My next move is to finish the poster and present these findings at OCWIC. Then, I will continue loading data into Apache Spark for training.

Long term goal: Create a successful tag prediction system using the tag scores of these individuals. 




Week 18: 1/25/2017 - 2/1/2017

Work Accomplished
This past week I wrote the Analysis and Data sections of the paper we are submitting to the Mining Software Repositories conference about mining eye-tracking data for software tasks (an extension of last year's CREU project). I wrote about the within-sequence turbulence and entropy analyses I am still gathering results for, and I wrote about the sequence similarity k-means clustering analyses I am also still gathering results for. I also prepared and practiced a presentation for the 2017 Nebraska Conference for Undergraduate Women in Mathematics.
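As a rough illustration of the within-sequence entropy idea, here is a Python sketch of Shannon entropy over the elements a developer looked at, in order. It is not the TraMineR code used for the paper, the element names are invented, and turbulence is left out because it also depends on subsequence counts and durations.

```python
import math
from collections import Counter


def sequence_entropy(sequence, alphabet_size=None):
    """Shannon entropy of the state distribution within one gaze sequence.

    If `alphabet_size` is given (and greater than 1), the result is divided
    by log(alphabet_size) so that it falls between 0 and 1.
    """
    if not sequence:
        return 0.0
    total = len(sequence)
    counts = Counter(sequence)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    if alphabet_size is not None and alphabet_size > 1:
        entropy /= math.log(alphabet_size)
    return entropy


# Hypothetical sequences of source code element types, one per fixation.
focused = ["method_call", "method_call", "method_call", "method_call"]
scattered = ["method_call", "variable", "comment", "method_signature"]

low = sequence_entropy(focused, alphabet_size=4)
high = sequence_entropy(scattered, alphabet_size=4)
```

A sequence that stays on one kind of element scores 0, while one spread evenly over every kind scores 1 after normalization, which is the sort of contrast that could separate different gaze patterns.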

Goal
Weekly goal - In the next week, I plan to finish gathering results for the sequential analyses I am performing and write the Results and Discussion sections of the paper. I am also leaving for NCUWM 2017 this weekend to present the sequence similarity analyses.
Long-term goal - To perform many more (roughly 8) sequential analyses on the ABB eye-tracking data we collected three years ago. From this we hope to gain insights into developer gaze patterns and the differences between novice and expert programmers. This analysis considers the order in which developers looked at source code elements, which is what sets it apart from last year's machine learning analyses.

Outcome(s)
  1. Large portions of our MSR paper are complete; we are on time to submit by Feb. 10
  2. R scripts implemented to perform entropy and turbulence sequential analyses using TraMineR
  3. Partial implementation of R scripts to perform the similarity analyses using TraMineR
  4. Learned more about the mathematics of the analysis techniques for my presentation
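
To give a sense of what a within-sequence entropy analysis measures, here is a minimal Python sketch. It is not the TraMineR code used in the project. It assumes the measure is the Shannon entropy of the distribution of states within a single gaze sequence, optionally normalized by the log of the alphabet size so the value falls between 0 and 1. The function name and the example state labels are hypothetical.

```python
import math
from collections import Counter


def within_sequence_entropy(sequence, alphabet_size=None):
    """Shannon entropy of the state distribution within one sequence.

    sequence: ordered list of states, e.g. the source code elements a
    developer looked at. If alphabet_size is given, the result is divided
    by log(alphabet_size) so that it lies between 0 and 1.
    """
    n = len(sequence)
    if n == 0:
        return 0.0
    counts = Counter(sequence)
    # Sum of p * log(1/p) over the states that actually occur.
    entropy = sum((c / n) * math.log(n / c) for c in counts.values())
    if alphabet_size is None:
        return entropy
    if alphabet_size < 2:
        return 0.0
    return entropy / math.log(alphabet_size)
```

A developer who stays on one kind of element the whole time gets an entropy of 0, while one who splits their time evenly across every kind of element gets the maximum value. Note that this measure only looks at how time is distributed across states, not at the order of the states.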

Thursday, January 26, 2017

Week 17: 1/18/2017 - 1/25/2017

This week, we discussed the best way to approach the data analysis. I will be using Apache Spark from the command line and running my data projects in the IntelliJ IDE. We weighed the advantages and disadvantages of running Spark through a virtual box, and found that using an IDE might be the best way to go.

Last Friday, one of the graduate CS students at YSU and I gave a demonstration of our eye tracking equipment and software to several groups of high school students. We showed them the interesting aspects of eye tracking as well as the advantages it brings to our field. The students enjoyed the Asteroids game, in which the player has to pop incoming asteroids using only their gaze. Using dual screens, we also demonstrated how the eye tracker records a user's fixations, and we asked some of the students to find different values on the YSU website.

I recruited two students in the CIS department to participate in our study. This Friday, my first volunteer will be coming in to the Empirical studies lab. I hope to recruit at least 4-6 more students throughout the semester so that we can continue to analyze fresh data.

Next week, I will be finishing up my poster for the Ohio Celebration of Women in Computing based on the analysis of our first set of data. I will be attending the conference on February 24-25 with Dr. Sharif, Dr. Lazar, Jenna and a few other students in the computer science department. I'm excited to attend the talks (as it's my first technical seminar that isn't local) and I can't wait to hear from women with experience in the software engineering field!  What an exciting time to be a woman!  I will be sure to post photos from the seminar upon my return. 

Weeks 16 & 17: 1/11/2017 - 1/25/2017

These first two weeks back at school have been hectic, so I decided to combine the posts for the last two weeks. In addition to beginning data analysis on our Stack Overflow project, I have been working on a separate data analysis to expand upon our expertise prediction analysis from CREU 15-16. This analysis looks at the ABB eye-tracking data we collected in a different way than machine learning prediction algorithms do. I used sequential analysis techniques from an R package to look at the data as sequences, so the order of what participants looked at is taken into consideration. Initial results show a significant difference in the gaze patterns of novices versus experts. The most challenging part of my analyses was dealing with how large the sequence data is for eye-tracking data. I had to modify the R package I was working with to make some of the analyses work with such large data.

We intend to submit these results to a journal and to the Mining Software Repositories conference (MSR). So, these past two weeks I prepared and gave a presentation about the analyses I performed to get an idea of what to submit to MSR. I also moved the beginnings of our journal paper to Overleaf in order to write the paper collaboratively.

This past weekend I participated in a four-day mathematical modeling competition called COMAP. I worked with two other students to model the merging of cars following a toll booth barrier. We wrote and submitted a report detailing our model.

Thursday, January 19, 2017

Week 16: 1/11/2017 - 1/18/2017

It's the start of the Spring semester and we are working on the second half of our project. We are continuing to collect data. This week, we discussed our upcoming trip in February to present at the conference for Ohio Celebration of Women in Computing in Huron, OH. I'm working on creating a poster about the results of our eye tracking data in reference to participant tag scores. I reviewed the results with Jenna, Dr. Sharif and Dr. Lazar, and I am working on a short presentation about the goal of our project. 

I am going to run the data from the research paper 'Predict Closed Questions on Stack Overflow' through Apache Spark and compare the original results to what I find. This will be my first time using Apache Spark from the command line, and I'm looking forward to learning more about machine learning algorithms.