B.S. Management Information Systems
B.A. Spanish
M.S. Data Analytics
In this project, I am analyzing graduation data from Utah State University to determine which students should be targeted with initiatives that help them settle into their final majors more quickly, without taking classes that will become irrelevant to those majors. Changing majors and taking classes unrelated to a student's final major is called "degree swirl". The project is still ongoing, but it has involved pulling many data sets together and analyzing students at the individual course level, using a custom metric called "swirled credits" as the response variable.
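As a sketch of how such a metric could be computed, here is a minimal pandas example. The column names and the simple flag for whether a course counts toward the student's final major are my own assumptions; the real metric is built from several joined institutional data sets.

```python
import pandas as pd

# Hypothetical course-level records, one row per completed course.
# In the real project this flag would come from joined degree-audit data.
courses = pd.DataFrame({
    "student_id": [1, 1, 1, 2, 2],
    "credits": [3, 4, 3, 3, 3],
    "counts_toward_final_major": [True, False, False, True, True],
})

# "Swirled credits": credits earned in courses that do not apply
# to the student's final major.
swirled = (
    courses.loc[~courses["counts_toward_final_major"]]
    .groupby("student_id")["credits"]
    .sum()
    .reindex(courses["student_id"].unique(), fill_value=0)
)
print(swirled.to_dict())
```

Student 1 here swirled 7 credits while student 2 swirled none, which is the kind of per-student response variable the analysis runs on.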
In this project, I used audio processing techniques, data augmentation, and convolutional neural networks to classify speech recordings into one of eight emotion categories. The final model predicted the correct emotion of an audio recording with an accuracy of 51%. This project helped me explore the nuances of working with TensorFlow, Keras, and audio data, and it taught me how to creatively apply different kinds of deep learning models to unique and challenging problems.
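The audio front end of such a pipeline can be illustrated without any deep learning library. The sketch below, using plain NumPy on a synthetic tone, shows the short-time Fourier transform that turns a waveform into a spectrogram: the image-like input a CNN can classify. The frame size and hop length are illustrative choices, not the project's actual settings.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform:
    slice the signal into overlapping windowed frames, FFT each one."""
    frames = [
        signal[start:start + frame_len] * np.hanning(frame_len)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
    # One FFT per frame; keep only the non-negative frequencies.
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# A one-second synthetic "recording": a 440 Hz tone sampled at 16 kHz.
sr = 16_000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (time_frames, frequency_bins)
```

For a pure 440 Hz tone the energy lands in a single frequency bin; for real speech, the 2-D patterns in this representation are what the CNN learns from.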
Click here to view the full analysis.
In this project, I compared sequence modeling techniques against a bag of words approach to build a deep learning model that could detect whether a message is spam or "ham" (not spam). Surprisingly, the bag of words model outperformed the sequence model, although both performed very well (~98% validation accuracy).
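The bag of words representation at the heart of the winning model can be shown in a few lines of plain Python (the project itself used Keras; this minimal sketch just illustrates the core idea that word order is discarded and only counts survive). The tiny vocabulary is made up for the example.

```python
from collections import Counter

def bag_of_words(message, vocabulary):
    """Turn a message into a fixed-length vector of word counts,
    discarding word order entirely."""
    counts = Counter(message.lower().split())
    return [counts[word] for word in vocabulary]

vocab = ["free", "winner", "call", "now", "meeting", "tomorrow"]
print(bag_of_words("FREE entry call now call NOW", vocab))
```

Notice that "call" and "now" each appear twice but their positions are lost; count patterns like these are exactly what the downstream classifier weighs.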
Click here to view the full analysis.
This project uses convolutional neural networks to perform image classification. Specifically, the model attempts to classify images into one of six classes: buildings, forest, glacier, mountain, sea, or street. A pretrained model, VGG16, was also used for transfer learning, producing a classifier that could take advantage of the features learned by that enormous deep learning model and perform classification even better than I could with my data set alone.
The best model ended up being the one that extended the power of the pretrained VGG16 model, which gave a validation accuracy of 90.7%.
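A hedged sketch of that transfer-learning setup in Keras. This is not the project's exact code: the 150x150 input size, the dense-layer width, and the optimizer are assumptions, and `weights=None` stands in for the `weights="imagenet"` used in practice (swapped here only to avoid the weight download).

```python
from tensorflow import keras

# Load the VGG16 architecture without its ImageNet classification head.
# In the real project: weights="imagenet" to pull the pretrained features.
base = keras.applications.VGG16(include_top=False, weights=None,
                                input_shape=(150, 150, 3))
base.trainable = False  # freeze the pretrained convolutional layers

model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(6, activation="softmax"),  # six scene classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 6)
```

With the base frozen, only the small classification head trains, so the pretrained convolutional features stay fixed while the new layers adapt them to the six scene classes.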
Click here to view the full analysis.
As the Department Head of the Data Analytics department at Bridgerland Technical College, I created two Python courses, Introduction to Python and Advanced Python, that taught students the fundamentals of programming in Python. The courses were equivalent to five university credits and were built like a traditional programming class, with lots of theory built in.
As time went on, I realized I could build something even better, so I redesigned both courses a second time, using an innovative approach that teaches Python specifically for data analytics: students gain familiarity with important Python libraries like pandas, matplotlib, and numpy while learning programming concepts in the context of exploratory data analyses. All lecture materials were provided to students as interactive Google Colab files, which made them easy to access and use, and the variety of projects left students with a portfolio of EDA examples that they could take to job interviews.
The courses saw massive success, not only among students but in the community as well. The material was delivered to high schools around Cache Valley and to professors at Utah State University, and it was even used by other technical and community colleges in the United States. The final lecture materials were exported to PDF files and given a title appropriate for an entry-level data analytics practitioner: Python for Data Technicians.
The content was hosted in Google Colab notebooks that are available to the public:
I lived in Colombia for two years between 2015 and 2017 and absolutely loved it. Because of this, I wanted to investigate real estate prices throughout Colombia and try to identify areas that would be "good investments". This led me to perform a regression analysis on a data set that I scraped off of the internet and to flag several properties that seemed appealing.
The analysis led me to search out properties that had a high predicted price and a low actual price; in my mind, these were the properties most likely to be "undervalued" according to the regression. I found several properties that seemed worth investigating further, but the disparity in house pricing across different regions of the country made it difficult to be totally confident in the results. In the end, I picked out a few houses that I would like to view more closely to see whether they would actually be worth investing in.
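The "undervalued" screen can be sketched with a plain least-squares fit in NumPy. The listings below are made-up numbers purely to show the mechanic: fit price on the features, then rank listings by how far the prediction sits above the asking price.

```python
import numpy as np

# Toy listings: area in square meters and asking price (millions of COP).
area = np.array([50.0, 80.0, 120.0, 200.0, 65.0])
price = np.array([150.0, 260.0, 300.0, 640.0, 210.0])

# Fit price ~ area by ordinary least squares (intercept + slope).
X = np.column_stack([np.ones_like(area), area])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
predicted = X @ coef

# "Undervalued" candidates: predicted price well above the asking price.
residual = predicted - price
print(int(residual.argmax()))  # index of the most undervalued listing
```

The real analysis used more features, and regional price disparities are exactly what made these residuals hard to trust, but the ranking idea is the same.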
Click here to view the full analysis.
Back in 2017, I was living in Bogotá, Colombia, and wanted to learn more about how real estate ("finca raíz" in Spanish) prices varied throughout the country. Specifically, I was looking for investment opportunities and planned to perform a regression analysis on real estate listings to see if I could identify any "good deals" on properties.
To get the data, however, I needed to scrape it off of the Colombian real estate website www.fincaraiz.co.
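A minimal sketch of that kind of scrape with BeautifulSoup. The HTML snippet and its class names below are hypothetical; the real fincaraiz.co markup has to be inspected in the browser first, and the actual parsing rules will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup standing in for a downloaded results page.
html = """
<div class="listing"><span class="price">$250.000.000</span>
<span class="area">80 m2</span></div>
<div class="listing"><span class="price">$410.000.000</span>
<span class="area">120 m2</span></div>
"""

def parse_listings(page):
    rows = []
    for card in BeautifulSoup(page, "html.parser").select("div.listing"):
        # Colombian prices use '.' as the thousands separator.
        raw = card.select_one("span.price").text
        price = int(raw.strip("$").replace(".", ""))
        area = int(card.select_one("span.area").text.split()[0])
        rows.append({"price": price, "area": area})
    return rows

print(parse_listings(html))
```

From here the parsed rows drop straight into a pandas DataFrame for the regression step.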
In this project, I explored a data set of tweets to determine which words were most associated with fake versus real news. I also built several classifiers using different natural language processing methods to classify which tweets described real disasters and which did not. In the end, a model using pretrained word embeddings obtained a validation accuracy of about 78%; the baseline accuracy to beat was 57%.
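The pretrained-embedding step works by copying published word vectors (GloVe-style text files are one common format) into the model's embedding matrix before training. The sketch below uses a tiny in-memory stand-in for such a file; the words, dimensions, and vocabulary are all made up for illustration.

```python
import io
import numpy as np

# A tiny stand-in for a GloVe-style file: "word v1 v2 ..." per line.
fake_glove = io.StringIO("fire 0.1 0.9\nflood 0.2 0.8\nhello 0.9 0.1\n")

vocab = {"<pad>": 0, "fire": 1, "flood": 2, "party": 3}
dim = 2
matrix = np.zeros((len(vocab), dim))
for line in fake_glove:
    word, *vec = line.split()
    if word in vocab:  # copy pretrained vectors for known words
        matrix[vocab[word]] = np.array(vec, dtype=float)
# Words missing from the pretrained file ("party") stay at zero
# and get learned from scratch during training.
print(matrix[1].tolist())
```

In Keras this matrix would then initialize an Embedding layer, so the model starts from word meanings learned on a much larger corpus than the tweets alone.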
Click here to view the full analysis.
As part of the Center for Student Analytics at Utah State University, I pioneered the institution's study and implementation of curricular analytics, a process by which educational curriculum (especially in higher education) is methodically examined for "pain points" that may inhibit students' ability to graduate on time. As part of my research, I gathered curriculum complexity scores for different programs at Utah State University along with student graduation rates, and I found a negative relationship between program complexity (programs with many strict combinations of prerequisites) and graduation rates: as the complexity of a degree program increased, its 4-, 5-, and 6-year graduation rates tended to decrease. This warrants further study by program administrators, especially those involved in high-complexity programs like engineering.
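The core statistical check has a simple shape: a correlation (or regression) between a program's complexity score and its graduation rate. The numbers below are toy values, not the study's data; they only show the form of the computation.

```python
import numpy as np

# Toy program-level data: complexity score vs. 6-year graduation rate.
complexity = np.array([20, 45, 80, 150, 220], dtype=float)
grad_rate = np.array([0.72, 0.68, 0.61, 0.55, 0.48])

# Pearson correlation; a negative value matches the study's finding.
r = np.corrcoef(complexity, grad_rate)[0, 1]
print(r < 0)
```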
Click here to read the published report about curricular complexity. This work was also featured in an article by the Association of Public Land-Grant Universities.
I briefly became involved in an analysis project while working at the Center for Student Analytics that was going to work with the Admissions office at Utah State University to better target students in certain demographics for recruitment. As part of the project, my team needed a way to dynamically import Census data into cloud-hosted notebook files that would perform the analysis and return information to the Admissions users. To make this possible, I built a Census module in Python that would dynamically handle user requests and provide an easy-to-use, object-oriented interface for interacting with the Census API (which is not easy to understand, by the way) and making sense of the results. The module worked fairly well and was able to pull down many kinds of Census data; knowing what to look for became the really difficult part.
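To give a flavor of what such a wrapper looks like, here is a minimal sketch (not the actual module): the Census API's raw interface is a query string of comma-separated variable codes, and the class just assembles it and zips the returned header row onto each data row. The variable `B01003_001E` (ACS total population) is a real code, but the class design here is illustrative.

```python
import requests

class CensusClient:
    """Minimal object-oriented wrapper over the Census API's
    raw query-string interface (a sketch, not the original module)."""

    BASE = "https://api.census.gov/data"

    def __init__(self, year, dataset="acs/acs5"):
        self.year, self.dataset = year, dataset

    def build_url(self, variables, geography):
        # e.g. variables=["NAME", "B01003_001E"], geography="state:*"
        return (f"{self.BASE}/{self.year}/{self.dataset}"
                f"?get={','.join(variables)}&for={geography}")

    def fetch(self, variables, geography):
        # The API returns JSON: a header row followed by data rows.
        rows = requests.get(self.build_url(variables, geography)).json()
        header, *data = rows
        return [dict(zip(header, row)) for row in data]

client = CensusClient(2019)
print(client.build_url(["NAME", "B01003_001E"], "state:*"))
```

The `fetch` method is where the "easy comprehension" happens: raw positional rows come back as labeled dictionaries.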
For several reasons, the project was scrapped after a few months and I wasn't able to fully flesh out the Census module. However, it still stands as useful code for pulling Census data into usable formats.
You can see the code by clicking this link.
Dr. Chudoba at Utah State University, the instructor of my capstone course during the final year of my undergraduate degree in Management Information Systems, approached me one day with a request. She was the director of a group of researchers and needed a list of all the researchers and the research they had published during 2020. She asked me to build a web scraper that could automatically gather data about each of the publications that the members of her organization had produced.
However, the only resource given to me to retrieve the data was a list of the names of the organization's members. Because many researchers share the same name, it was difficult to determine which papers had been written by members of Dr. Chudoba's organization and which had been written by other researchers with the same name. It was also difficult to work around Google Scholar's scraping limitations, since Google would block my IP address if I sent more than 15 requests per minute.
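The throttling half of that problem has a simple shape: never send requests faster than the observed ceiling. Below is a minimal sketch; the limit and the fast demo spacing are illustrative, and in the real scraper each yielded URL would be fetched and parsed.

```python
import time

def throttled(urls, max_per_minute=12):
    """Yield URLs no faster than max_per_minute, staying safely
    under the ~15 requests/minute ceiling that triggered IP blocks."""
    delay = 60.0 / max_per_minute
    last = 0.0
    for url in urls:
        wait = delay - (time.monotonic() - last)
        if wait > 0:
            time.sleep(wait)  # pause until the next request is allowed
        last = time.monotonic()
        yield url  # in the real scraper: fetch and parse this URL here

start = time.monotonic()
# 600/minute = 0.1 s spacing, just so the demo finishes quickly.
urls_out = list(throttled(["u1", "u2", "u3"], max_per_minute=600))
elapsed = time.monotonic() - start
print(urls_out)
```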
In the end, I was able to obtain a reasonably complete list of research published by members of the organization that Dr. Chudoba had given me. There really wasn't any way to confirm whether the results were correct, so I simply filtered the data down to documents about technology topics.
Click here to see the code for this project.
As part of my capstone project for my undergraduate degree in Management Information Systems at Utah State University, I created a website for the local used book store Becky's Bookshelf. Becky's previous inventory management system ran in Microsoft Access on a very old computer, and Becky was not confident that the computer was going to live much longer. I suggested that she move her inventory management system to the cloud and offered to create a website that would help her keep track of her inventory and let her clients check online which books were in stock.
The website was created and functional but was never implemented due to pricing issues. The cloud system would have required a subscription service fee to host the website and database online, while the store's current system used an on-premise database connected to the only computer in the store. The store ultimately decided that the Access database would meet its needs perfectly fine.
Although my website was not implemented, I learned a lot about using Python to create web applications in Django and how databases can be hooked up to websites. This was a fun project that I gained a lot of experience from.