Home / Blog / Data Science / Classification of Music Genre Using KNN

Classification of Music Genre Using KNN

February 18, 2023
66

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. Bharani Kumar is an IIT and ISB alumni with more than 18+ years of experience, he held prominent positions in the IT elites like HSBC, ITC Infotech, Infosys, and Deloitte. He is a prevalent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG with more than Ten years of experience and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Learn the core concepts of Data Science Course video on Youtube:

The dataset we will use is named the GTZAN genre collection dataset which is a very popular audio collection dataset. It contains approximately 1000 audio files that belong to 10 different classes. Each audio file is in .wav format (extension). The classes to which audio files belong are Blues, Hip-hop, classical, pop, Disco, Country, Metal, Jazz, Reggae, and Rock. You can easily find the dataset on Kaggle and can download it from Kaggle

Required Libraries:

It's crucial to install specific libraries before moving on to load datasets and model construction. To extract features and experience something new, we will use the Python Speech Feature Library. We will use the scipy library as well as load the dataset in the WAV format, thus we must install these two libraries.

Import required libraries

!pip install python_speech_features

!pip install scipy

Make the necessary imports to play with the data in your freshly generated Kaggle notebook or Jupyter notebook first.

Create a function to detect neighbors and measure the distance between feature vectors:

KNN operates by determining the K number of neighbors and computing distance. We shall develop various functions to accomplish this exclusively for each capability. The first thing we'll do is put together a function that takes training data, existing instances, and the necessary number of neighbors. It will measure the separation between each point in the training set and every other point, locate the nearest K neighbors, and then return all neighbors. To make a project workflow clear and straightforward, we will build a function to determine the distance between two points.

Determine the category of nearest neighbors:

We now have a list of neighbors, and we need to identify the class with the maximum number of neighbors. To store the class and its associated count of neighbors, we declare a dictionary. We create a frequency map, sort it based on the number of neighbors, and then return the first class.

Model Evaluation:

To verify the precision and accuracy of the algorithm we develop, we also need a function that assesses a model. To calculate accuracy, we will create a function that is quite basic and that returns the total number of accurate forecasts divided by the total number of predictions.

Feature Extraction:

Therefore, we are applying the KNN Classifier, and we have just constructed the algorithm from scratch to give you a sense of how the project would operate at this point. Now that the data has been loaded from all folders of the appropriate categories, we will extract features from each audio file and store them in binary form with the DAT extension.

Mel Frequency Cepstral Coefficients

The technique of extracting significant features from data is known as feature extraction. It involves locating linguistic information and avoiding all noise. Three categories—high-level, mid-level, and low-level audio features—are used to categorize audio features.

⦁ High-level features are related to music lyrics like chords, rhythm, melody, etc.

⦁ Mid-level features include beat-level attributes, pitch-like fluctuation patterns, and MFCCs.

⦁ Low-level features include energy and a zero-crossing rate which are statistical measures that get extracted from audio during feature extraction.

To produce these features, we follow a series of procedures that are collectively known as MFCC, which aids in the extraction of mid-level and low-level audio features. The steps for how MFCCs function in feature extraction are discussed below.

Audio files can range in length from a few seconds to many minutes. And the pitch or frequency is always changing, therefore to comprehend this, we first split the audio stream into little, 20–40 ms long frames.

We attempt to recognize and extract various frequencies from each frame after breaking it into frames. Assume that one frame breaks down into a single frequency when we divide in such a little frame.

remove the noise from the language frequencies

Take a discrete cosine transform (DCT) of the frequencies to eliminate all noise. Students with engineering backgrounds may be familiar with and have studied the cosine transform in discrete mathematics courses.

Now that we have imported everything from the Python speech feature library, we no longer need to implement each of these stages individually because MFCC does it for us. Each category folder will be iterated through, the audio file read, the MFCC feature extracted, and the feature dumped in a binary file using the pickle module. To understand and manage any exceptions that may arise while importing large datasets, I usually advise using try-catch.

Train-test split the dataset:

From the audio file, which is stored in binary format and used as the filename for my dataset, we have retrieved some features. We will now put into practice a function that takes a filename and copies all the data as a data frame. The data will then be randomly divided into train and test sets based on a predetermined threshold because we want to include a variety of genres in both sets. There are various ways to conduct a train-test split. Here, I'm using a random module and looping till the length of the dataset to generate a random fractional integer between 0 and 1; if it's less than 66, a specific row is added to the train test set; otherwise, the test set is used.

Calculate the distance between two instance:

To compute the distance between two points, we must implement this function at the start. However, to fully describe the project's workflow and technique, I will discuss the supporting function after the primary phases. However, you must layer the function on top. To determine the real distance between the two data points, the function requires two inputs (X and Y coordinates). We employ the low-level implementation of conventional linear algebra provided by the Numpy linear algebra module. To determine the real distance, we must first calculate the dot product between the X-X and Y-Y coordinates of both points. Then, we may extract the determinant from the resultant array of both points and determine the distance.

Training the Model and making predictions:

You were all waiting for the stage where you fed the data into the KNN algorithms, made all your predictions, and got accurate results on the test dataset. This step code appears to be a lot, but it is extremely small because we are using a stepwise functional programming method, thus all we need to do is call the functions. The initial step is to gather neighbors, extract classes, and assess the model's correctness.

Use the new Audio File to test the Classifier:

Now that the model has been deployed and trained, it is time to check for fresh data to see how well our model predicts fresh audio files. Since we have all of the labels (classes) in numerical form, we will first create a dictionary with the key being the numerical label and the value being the category name.