Wednesday, September 28, 2016

Week 4: 9/20/2016 - 9/27/2016

This past week I read two papers, Gaze-tracked Crowdsourcing and Reading Without Words: Eye Movements in the Comprehension of Comic Strips.

The Gaze-tracked Crowdsourcing paper investigates whether tracking workers' gaze while they solve word disambiguation tasks can disclose useful information that improves the task output. The authors arrived at two interesting results: first, the majority of their participants read in order from the beginning until they found a word that distinguished the sense strongly enough, and second, the rest of the participants preferred to skim the text quickly. It would be interesting to determine whether this behavior holds when software developers view Stack Overflow documents while trying to determine the appropriate tags for the questions.

I am assigned to read and summarize the other paper, Reading Without Words: Eye Movements in the Comprehension of Comic Strips, for our 10/4/2016 meeting. I decided to do an initial read and create a PowerPoint presentation early, because the paper is longer and I know I will need to take another pass at it this week. I will explain my observations on this paper in my next blog post.

Tuesday, September 27, 2016

Week 4: 9/20/2016 - 9/27/2016

This week, I read 'Gaze-Tracked Crowdsourcing' by Jakub Simko and Maria Bielikova. The information is pertinent to our study because we have been using gaze-tracking with iTrace, and it's helpful to gain a different perspective on how this method is used in a study. In the study, participants were asked to read through snippets of text and, for a given word in the text, determine its sense from a set of possible senses.

We have been assembling a task (consisting of questions from StackOverflow) similar to the task designed in the article, but instead of text, we will use snippets of code to get our participants to predict tags based on the content of the question sets. 

I will be sifting through data sets provided by Kaggle this week to compare the tags assigned to each question from different users on the StackExchange network to gain a firmer understanding of tag prediction so that we can continue with our project. 

Week 4: 9/20/2016 - 9/27/2016

This week I finished up my review of the paper "Gaze-tracked Crowdsourcing". From the paper I realized the importance of gaze-tracking as an implicit feedback source that provides information other methods (e.g. cursor tracking, clicking, scrolling) cannot. While the main take-away from the paper was using eye-tracking to identify sense-distinguishing words to enrich data sets, there was also some insight into user confidence. The study was able to detect confidence in user answers and even get some indication as to which answers were going to be correct, based upon reading behavior in the gaze data. We discussed our own study and decided that user confidence is something we would also like to record and analyze for our tasks.

This week I also began some experiment design. We came up with our main task, which involves having users pick tags for various postings from Stack Overflow. We are going to use C++ questions that range from simple to complex. In the coming week I plan to continue building this list of candidate questions so we can decide on the final pool of tasks for our experiment.

Tuesday, September 20, 2016

Week 3: 9/13/2016 - 9/20/2016

This past week I attended the 2016 Tapia Diversity Conference and had a great time!

I had interviews with Bloomberg L.P., Northrop Grumman, IBM, and BNY Mellon for summer internships. Northrop Grumman wants to move forward with an official offer and IBM will be contacting me in the next week. I also visited a lot of graduate school booths. The UC Berkeley recruiter requested that I send her the names of three faculty members I would be interested in working with, and I met the chair of the CS department at Brown when I won their swag raffle. I learned how to write a good graduate school application essay from the graduate school recruiters.

My poster presentation was very well received, and many of the poster session attendees seemed very interested in our work for CREU. I learned about other avenues for extending our research such as investigating why we can predict developer expertise. Being able to predict developer expertise well implies that there are underlying differences among the eye gaze features of expert and novice software developers. What are those differences and can we utilize them for teaching techniques/strategy heuristics?

Finally, I read the paper titled Predict Closed Questions on StackOverflow. The goal of this paper is to build a classifier that predicts whether or not a question will be closed, given the question as submitted, along with the reason that the question was closed. The features used in this paper will be helpful in determining the features to use in our analyses.

Here are some pictures from Tapia:

Week 3: 9/13/2016 - 9/20/2016

This week, I analyzed and presented a dataset used by a group of researchers in a contest sponsored by Kaggle. I read through Galina Lezina and Artem Kuznetsov's published work, 'Predict Closed Questions on StackOverflow.' Their task involved examining public and private datasets of user, post, and tag features from StackOverflow and building a classifier that predicts whether a question will be closed by user vote and, if so, for what reason. They hoped their work would ease the task of moderating these posts on the StackExchange network through automation.

The results indicated that, for the method used (built on a tool called Vowpal Wabbit), user interaction features slightly worsened the outcome, while text features contributed more to the result. They also noted that some questions still have an open status but in reality should be closed.

I think that even though the baseline models for data provided by Kaggle excluded actual content of each post, each user that examines a question on StackOverflow relies on that content for context if he or she can't understand what is being asked. As the results of the study showed, text features were informative and this holds true in reality. How can one answer a question fully without some reference as to what is being asked on StackOverflow? 

As a StackOverflow user, I hope that we are able to predict these outcomes soon with these kinds of classifiers.

On Sunday, I attended the Silly Science Sunday event sponsored by OH WOW! in Youngstown. I helped Dr. Sharif and Ali gather participants to demo our eye tracking game as well as demonstrate a simple game that used code trace on small robots. We had a great turnout at the event, and as always, it's great to inspire young minds with the lighter side of our technology! 

Monday, September 19, 2016

Week 3: 9/13/2016 - 9/20/2016

This week I wrapped up my evaluation of the research paper, Synthesizing Image Representations of Linguistic and Topological Features for Predicting Areas of Attention. The conclusions we reached were that area prediction seemed to be more accurate when a subject was given more time to read the document or was presented with objectives/tasks before reading. We found that very little accurate information was gathered on the features important for comprehension during the tasks where subjects were given only seconds to read the text and gather whatever information they could.

Jenna also provided us with some information about what iTrace can provide with respect to Stack Overflow, which will be beneficial to our study. She found that the page is broken up into a certain number of parts, e.g. chunks of text, the title, the code excerpt, comments, votes, etc. iTrace then returns, based on these parts, which part of the page was focused on, so we can use these existing features when implementing our own study.

We are also currently evaluating data from a past Kaggle competition that we feel is relevant to our study. The competition was to predict the tags on postings across Stack Exchange sites (so the dataset includes technical and non-technical questions) given only the text and title of a posting.

Tuesday, September 13, 2016

Week 2: 9/6/2016 - 9/13/2016

This week, my contribution was to read a paper entitled 'Synthesizing Image Representations of Linguistic and Topological Features for Predicting Areas of Attention.' Ali presented the material at our weekly meeting, and I felt I gained a better grasp of the concept after her presentation. The study was designed to test students in our field on how they retain information after reading a document, and how they comprehend and remember reading material in a given time span. Subjects were asked to retain as much information as they could and then reiterate that information in a ten-second sprint. They were also asked to read a document and find the answers in the text. This is important to study because it shows how men and women retain information and how fast they are able to remember pertinent information, a needed strength in our field.

Next week, I will be presenting my analysis of Galina's study 'Predict Closed Questions on Stack Overflow' so I can relate to this week's presentation. 

Week 2: 9/6/2016 - 9/13/2016

This past week I spent most of my time getting ready for Tapia. I leave tomorrow morning and I am very excited! I have two interviews lined up, with Bloomberg L.P. and BNY Mellon, and I give my poster presentation about the work we did in CREU last year, all on Thursday. I printed out copies of my resume and CV to pass out at the career fair, along with talking points for my poster presentation. More importantly, I took the time to make substantial revisions to our poster.

I also read Synthesizing Image Representations of Linguistic and Topological Features for Predicting Areas of Attention in order to learn more about linguistic analysis techniques combined with eye-tracking data. The results of the paper show that, for a precise reading task and a question answering task, the linear combination of image representations of linguistic features helps to explain the gaze evidence of readers within the same document.

Finally, I looked into the StackOverflow support for iTrace. I determined that iTrace can only provide us with which different portions of a StackOverflow document a person looked at. It does not give you meta-data for those portions, although the capability is there.
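To make the idea of "which portion of the page was looked at" concrete, here is a minimal Python sketch of mapping gaze points to page regions. It is not iTrace's API or output format: the Region type, the region names, the pixel coordinates, and the gaze points are all made up for illustration.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Region(NamedTuple):
    """A rectangular page region (hypothetical; not iTrace's data model)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int


def region_at(x, y, regions) -> Optional[str]:
    """Return the name of the first region containing (x, y), or None."""
    for r in regions:
        if r.left <= x < r.right and r.top <= y < r.bottom:
            return r.name
    return None


def count_fixations(points, regions):
    """Count how many gaze points fall inside each named region."""
    counts = Counter()
    for x, y in points:
        name = region_at(x, y, regions)
        if name is not None:  # ignore points that land outside every region
            counts[name] += 1
    return counts


# Assumed layout of a question page, in screen pixels (illustrative only).
REGIONS = [
    Region("title", 0, 0, 800, 60),
    Region("question_text", 0, 60, 800, 300),
    Region("code", 0, 300, 800, 500),
    Region("tags", 0, 500, 800, 540),
]

# Assumed gaze samples as (x, y) pairs; the last one is off the page.
GAZE = [(100, 30), (120, 200), (130, 350), (140, 360), (900, 10)]

fixation_counts = count_fixations(GAZE, REGIONS)
print(dict(fixation_counts))
```

Counts per region like these are the kind of feature we could compute once we know which parts of a posting each participant looked at.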

Pictures from Tapia are to come!

Sunday, September 11, 2016

Week 2: 9/6/2016 - 9/13/2016

For me, week 2 involved finishing up some preparation work as well as beginning some research for our project. I completed the last of the required materials that needed to be submitted, which included our acceptance letter and a few other documents. I also worked through two of the CITI Program's short behavioral research courses. I wasn't sure what to expect, but they gave me a valuable idea of what working with a group of people to gather data is going to be like, along with some basic ethical practices, as I have never been involved in an official research project outside of the classroom. I also modified our website to include this year's information and completely updated the design. I added a page for this blog that includes a live RSS feed widget so that all of our information here can be viewed there as well. We also plan on using our website to post some experiments and detailed information about our research in order to display our progress as we move forward with our project.

Our research began this week with some preliminary review of past work on eye-tracking studies. Jenna presented a paper on an eye-tracking study that involved differences between C++ and Python in novices vs. non-novices. I found the results interesting, and the research paper also gave me an idea of the work we might be conducting in our own project. This week we also reviewed another paper on eye-tracking, this time on where the eye needs to focus in order to maximize comprehension while reading; specifically, which linguistic features are most valuable for a reader's full understanding. I will be putting together a presentation for the group this week on that paper. There were some specific methods/considerations discussed in this paper that I look forward to bringing up with the team, so we can think about whether they are something we may be able to apply in our own studies.

I look forward to keeping up with this blog, not only to exhibit my progress as we work through this project but also to keep myself organized. This will be a helpful source to look back upon as the project finishes up.

Friday, September 9, 2016

Week 1: 8/30/2016 - 9/6/2016

As a newcomer to this project, I started off the first week completing the IRB Training Courses, and then reading about Rachel Turner, Michael Falcone, Bonita Sharif and Alina Lazar's eye tracking study, entitled 'An Eye Tracking Study Assessing the Comprehension of C++ and Python Source Code'. 

It's important to examine which programming language novices in my field of study begin learning at the start of their course work, because it can be the deciding factor in whether or not they continue to pursue a degree in computer science. The results of the study showed that there was a significant difference in the fixation rate on lines of code that contained errors between C++ and Python, and I would have to agree with the conclusion. Learning a new language doesn't come without its frustrations, but it's retaining those programming concepts from the beginning that helps in the long run.

I also assisted Jenna Wise and Dr. Sharif with multiple demonstrations of our eye tracking program at the Canfield Fair on September 4th, and more than anything, it was exciting to show participants how our technology can be fun! 

I'm eager to continue developing my path as a student researcher as well as improving my analysis techniques.

Tuesday, September 6, 2016

Week 1: 8/30/2016 - 9/6/2016

It has been an exciting week back to school and back to working on this year's CREU project. I created our blog and added Alyssa, Ali, Dr. Lazar, and Dr. Sharif to it.

Tapia is right around the corner, Sept. 14-17, and I will be presenting our CREU project from last year there. So I spent part of this last week putting together a first draft of the poster. After going over the first draft of our Tapia poster with Dr. Sharif, I learned that I can put far fewer words on the poster and a lot more visual aids. The poster should be able to stand on its own, but not overwhelm readers to the point where they cannot get the complete idea of the poster in 30 seconds to 1 minute of looking at it. In the coming week, I will be revising the poster.

Finally, I read An Eye-tracking Study Assessing the Comprehension of C++ and Python Source Code in order to get an idea of a study and data analysis related to the ones we would like to perform for this year's CREU project. I learned more specifics about eye-tracking study designs and about Linear Mixed Effects Regression and Mann-Whitney non-parametric tests. I summarized this paper in a PowerPoint presentation for later reference. The paper found that there is a statistically significant difference between the C++ and Python participant groups with respect to the rate at which they looked at buggy lines of code, and a statistically significant difference between novices and non-novices in their code comprehension abilities for both the C++ and Python participant groups.
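As a note to myself on how the Mann-Whitney test compares two groups, here is a small Python sketch that computes the U statistic by ranking the pooled values. The two lists of fixation rates are invented for illustration and are not data from the paper; the function only returns U and does not compute a p-value. For a real analysis, a statistics library routine such as SciPy's scipy.stats.mannwhitneyu would be the sensible choice.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return the Mann-Whitney U statistic, min(U_a, U_b), for two samples."""
    # Pool both samples, remembering which group each value came from.
    combined = sorted(
        [(value, "a") for value in sample_a]
        + [(value, "b") for value in sample_b],
        key=lambda pair: pair[0],
    )

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    # U for group a is its rank sum minus the smallest possible rank sum.
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == "a")
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


# Made-up fixation rates on buggy lines for two participant groups.
cpp_rates = [0.42, 0.35, 0.51, 0.48]
python_rates = [0.30, 0.28, 0.33, 0.40]

u_statistic = mann_whitney_u(cpp_rates, python_rates)
print(u_statistic)
```

A small U relative to the product of the two group sizes means the groups' values barely overlap in rank, which is what the test then checks for significance.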