Leetcode questions analysis


About Dataset

Leetcode is very popular among programmers and offers many high-quality questions across a wide range of topics. For a content provider, it is crucial to understand question quality across difficulty levels, the most and least popular topics among programmers, the most liked and disliked questions, and so on. Such an analysis helps reveal overall trends, a pathway for improving skills, and which topics need more attention than others. Each question record also includes the titles and text of similar questions, which can be used to suggest related questions.

The dataset is available on Kaggle.

The project is divided into three major tasks:

Data Scraping
Data Analysis
Topic prediction

Data Scraping

The data is scraped from this webpage using this file.
Disclaimer: The data was scraped solely for the purpose of generating insights.

Data Analysis

At the time of this project, the Leetcode website offered a total of 2,239 questions spanning 72 distinct topics.

Among these, ‘Medium’ is the difficulty level with the most questions, at just under 1,200.

Number of questions for each difficulty level
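A minimal sketch of how the per-difficulty counts could be computed, assuming each scraped question is loaded as a Python dict with a hypothetical `difficulty` field. The records are toy placeholders, not rows from the dataset.

```python
from collections import Counter

# Toy placeholder records; the "difficulty" field name is an assumption,
# not necessarily the column name used in the Kaggle dataset.
questions = [
    {"title": "Question A", "difficulty": "Easy"},
    {"title": "Question B", "difficulty": "Medium"},
    {"title": "Question C", "difficulty": "Hard"},
    {"title": "Question D", "difficulty": "Medium"},
]


def count_by_difficulty(records):
    """Return a Counter mapping each difficulty level to its question count."""
    return Counter(r["difficulty"] for r in records)


counts = count_by_difficulty(questions)
print(counts.most_common())  # → [('Medium', 2), ('Easy', 1), ('Hard', 1)]
```

If the data is loaded into a pandas DataFrame instead, calling `value_counts()` on the difficulty column does the same job.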

The graph below shows the total number of questions for each topic. The topics with the most questions are ‘Array’, ‘String’, ‘Hash Table’, ‘Dynamic Programming’, and ‘Math’.

Number of questions in each topic
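Because a single question can carry several topic tags, per-topic counts add up to more than the number of questions. The sketch below assumes a hypothetical `topics` field holding a comma-separated string of tags; the records are toy placeholders, not dataset rows.

```python
from collections import Counter

# Toy placeholder records; the comma-separated "topics" format is an assumption.
questions = [
    {"title": "Question A", "topics": "Array, Hash Table"},
    {"title": "Question B", "topics": "String, Dynamic Programming"},
    {"title": "Question C", "topics": "Array, Dynamic Programming"},
]


def split_topics(raw):
    """Split a comma-separated topic string into trimmed, non-empty tag names."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def count_by_topic(records):
    """Count questions per topic; a question with several tags counts once per tag."""
    return Counter(tag for r in records for tag in split_topics(r["topics"]))


print(count_by_topic(questions).most_common(2))
# → [('Array', 2), ('Dynamic Programming', 2)]
```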

Breaking the per-topic question counts down by difficulty level shows that ‘Medium’ questions make up a substantial share of the questions across many topics.

Total questions by topic and difficulty level
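The topic-by-difficulty breakdown is a two-way count. A minimal sketch, assuming hypothetical `topics` (comma-separated) and `difficulty` fields on toy placeholder records:

```python
from collections import Counter, defaultdict

# Toy placeholder records; field names and formats are assumptions.
questions = [
    {"title": "Question A", "difficulty": "Easy", "topics": "Array, Hash Table"},
    {"title": "Question B", "difficulty": "Medium", "topics": "String, Dynamic Programming"},
    {"title": "Question C", "difficulty": "Medium", "topics": "Array, Dynamic Programming"},
]


def split_topics(raw):
    """Split a comma-separated topic string into trimmed, non-empty tag names."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def topic_difficulty_table(records):
    """Map each topic to a Counter of difficulty levels for its questions."""
    table = defaultdict(Counter)
    for r in records:
        for tag in split_topics(r["topics"]):
            table[tag][r["difficulty"]] += 1
    return table


table = topic_difficulty_table(questions)
for topic in sorted(table):
    print(topic, dict(table[topic]))
```

With pandas, exploding the topic column with `DataFrame.explode` and passing it to `pd.crosstab` together with the difficulty column would produce the same table.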

Based on the average accuracy per topic, we can infer that problem solvers found ‘Shell’ questions notably challenging, while most coders solved ‘Database’ questions with relative ease.

Topics with the minimum and maximum average accuracy
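A minimal sketch of how the topics with the lowest and highest average accuracy could be found. It assumes hypothetical `topics` (comma-separated) and `accuracy` (percent) fields; the accuracy values are made up purely to exercise the code and say nothing about the real dataset.

```python
from collections import defaultdict
from statistics import mean

# Toy placeholder records with made-up accuracy values (in percent);
# the "accuracy" field name is an assumption.
questions = [
    {"title": "Question A", "topics": "Array, Math", "accuracy": 60.0},
    {"title": "Question B", "topics": "String", "accuracy": 40.0},
    {"title": "Question C", "topics": "Array", "accuracy": 50.0},
    {"title": "Question D", "topics": "Math", "accuracy": 80.0},
]


def split_topics(raw):
    """Split a comma-separated topic string into trimmed, non-empty tag names."""
    return [tag.strip() for tag in raw.split(",") if tag.strip()]


def average_accuracy_by_topic(records):
    """Average the accuracy of all questions tagged with each topic."""
    per_topic = defaultdict(list)
    for r in records:
        for tag in split_topics(r["topics"]):
            per_topic[tag].append(r["accuracy"])
    return {tag: mean(values) for tag, values in per_topic.items()}


avg = average_accuracy_by_topic(questions)
lowest = min(avg, key=avg.get)   # hardest topic by this measure
highest = max(avg, key=avg.get)  # easiest topic by this measure
print(lowest, highest)  # → String Math
```

This is an unweighted mean over questions; weighting each question by its number of submissions would be another reasonable choice.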

The four most popular topics among problem solvers on Leetcode are:

Array
String
Hash Table
Dynamic Programming

Topic prediction

To predict a question's topic from its question and description text, the data is first processed: the text is cleaned, word frequencies are analyzed, and the records are prepared for modeling.
Next, the data is divided into training, validation, and test sets, and the topics are predicted with either a simple linear model (logistic regression) or a more advanced model such as BERT.
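The preprocessing steps above could look roughly like the following plain-Python sketch. The regex-based cleaning, the tiny stop-word list, and the 80/10/10 split ratios are illustrative assumptions, not the settings used in the project's notebooks; model training (logistic regression or BERT) is not shown.

```python
import random
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "given", "in", "is", "of", "return", "the", "to"}


def clean_text(text):
    """Lowercase, replace runs of non-letters with spaces, and drop stop words."""
    text = re.sub(r"[^a-z]+", " ", text.lower())
    return [word for word in text.split() if word not in STOP_WORDS]


def word_frequencies(texts):
    """Count how often each cleaned word appears across all texts."""
    return Counter(word for text in texts for word in clean_text(text))


def train_val_test_split(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle a copy of the records and cut it into train, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


print(clean_text("Given an array of integers, return the indices."))
# → ['array', 'integers', 'indices']
```

A plain random split ignores the fact that a question can have several topics; for multi-label targets, a stratified split may keep rare topics better represented in every set.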

Results

Method	F1-score
Logistic Regression	0.55
BERT	0.88

Both notebooks are available here