This week, Jenna helped me to import the Kaggle data into Databricks to run Apache Spark on to train the data. It was challenging to set up a cluster and get the data to import correctly into each column. The next step is to run the bigger data set and test it so that we can compare against the data sets.
Weekly goal: Finish the dataset testing.
Future goal: Finish testing the datasets and use the keyword results to inform our tags so that we might be able to predict tags on Stack Overflow. We also need to submit our final paper for this project by May 5th