Work Accomplished - This past week, Alyssa and I worked on using DataBricks to run Apache Spark on a small dataset from the Kaggle competition. We walked through the DataBricks User Guide (https://docs.databricks.com/user-guide/index.html) together, loaded the small dataset into DataBricks, and started a cluster to run our Spark jobs on it. We ran into trouble because the Community Edition of DataBricks does not seem to allow us to run jobs on our cluster, so we will work through this issue over the next week. There may be a way around it by using a specific type of cluster.
Weekly goal(s) - In the next week, Alyssa and I will figure out how to run jobs on a cluster in DataBricks, so that we can run Apache Spark on the small dataset from the Kaggle competition. This will introduce me to using DataBricks for keyword prediction, so that we may use it on our data.
Long-term goal(s) - Predict keywords by modifying a proposed method for keyword prediction from a Kaggle competition so that it incorporates eye-tracking. We will also predict keywords without eye-tracking and compare the two keyword sets generated. These keywords inform our tags, so determining them will tell us which pieces of code and/or text in a StackOverflow document are pertinent to tag selection.
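
As a rough illustration of the comparison step, here is a minimal sketch of one way the two keyword sets could be compared. The overlap measure (Jaccard similarity), the function name compare_keyword_sets, and the sample keywords are all assumptions made for illustration; we have not yet settled on a comparison method.

```python
def compare_keyword_sets(with_eye_tracking, without_eye_tracking):
    """Compare two keyword collections (hypothetical helper, not our final method).

    Keywords are lowercased so that "Pandas" and "pandas" count as the same keyword.
    """
    a = {k.lower() for k in with_eye_tracking}
    b = {k.lower() for k in without_eye_tracking}

    shared = a & b
    union = a | b
    # Jaccard similarity: size of the overlap divided by size of the union.
    # Two empty sets are treated as identical (assumption).
    jaccard = len(shared) / len(union) if union else 1.0

    return {
        "shared": sorted(shared),
        "only_with_eye_tracking": sorted(a - b),
        "only_without_eye_tracking": sorted(b - a),
        "jaccard": jaccard,
    }


if __name__ == "__main__":
    # Made-up keywords, for illustration only.
    result = compare_keyword_sets(
        ["python", "Pandas", "dataframe"],
        ["python", "pandas", "merge"],
    )
    print(result)
```

The keywords that appear in only one of the two sets are probably the most interesting output here, since they would show where eye-tracking changes which parts of a document look pertinent to tag selection.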
- Made progress on running Spark jobs in DataBricks by loading Kaggle competition data into it and reading over a user guide.