Oil Prices and Stock Market Analysis using K-Means Clustering
This article introduces oil and gas analytics so that readers can see how data science and machine learning are applied in the oil and gas business. The share prices of several oil companies are examined and segmented against the oil price over various time periods. Combining data science with business analysis can yield three main benefits: cost savings, time savings, and increased safety.
Data analysis and machine learning have several other uses in the upstream oil and gas sector. These include optimising valve settings in smart wells to increase net present value, machine-learning-based rock type classification, big data analysis of well downtime, data-driven production monitoring, pattern identification across multiple variables during the exploration phase, reservoir modelling and history matching through pattern recognition, machine-learning-based proxy models, increased intelligence at the wellhead, and optimised integrated asset modelling.
In this experiment, share market data was obtained and a machine learning model was used to uncover hidden patterns. Because data science offers many methods for evaluating and forecasting patterns, it helps to understand the tools used in the analysis first. The analysis was written in Python, installed through the Anaconda distribution. Several Python libraries were loaded: matplotlib and seaborn for plotting, and scikit-learn for k-means clustering and linear regression. We also imported NumPy, a library for mathematical operations on multidimensional arrays, and pandas, which supports data manipulation and analysis.
Data Collection:
The data for this analysis comes from Yahoo! Finance, a media property that is part of the Yahoo! network. Besides financial news, it offers an API for financial and stock market data, including stock quotes, press releases, financial reports, expert commentary, and original content. We used the Yahoo API to retrieve daily share prices for the following companies: Shell (RDSB.L), BP (BP.L), Cairn Energy (CNE.L), Premier Oil (PMO.L), Statoil (STL.OL), TOTAL (FP.PA), ENGIE (ENGI.PA), Schlumberger (SLB.PA) and REPSOL (REP.MC). The oil price dataset was taken from the U.S. Energy Information Administration.
Fig 1: Packages and Datasets import in Python
The code sample in Fig 1 imports the Yahoo Finance package, yfinance, and the datetime package. The goal is to collect data over a specified period of years and to request 10,000 records (days) of that data for the analysis. The company names were stored in a variable called shares. The initial dataset from Yahoo Finance contains the columns Date, Open, High, Low, Close, Adj Close, and Volume. The oil price dataset contains the columns Date and oil_price.
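As a rough sketch of the collection step, the snippet below shows one way the daily closing prices might be pulled with yfinance and reshaped into one row per date and company. The helper names `fetch_closes` and `to_long`, the date range in the usage comment, and the column names `date`, `name` and `share_price` are assumptions made for illustration; this is not the code shown in Fig 1. Some of the tickers listed above may have been renamed or delisted on Yahoo Finance since the analysis was done, in which case a download can return empty columns for them.

```python
import pandas as pd

# Ticker -> company name, taken from the list in the Data Collection section.
shares = {
    "RDSB.L": "Shell",
    "BP.L": "BP",
    "CNE.L": "Cairn Energy",
    "PMO.L": "Premier Oil",
    "STL.OL": "Statoil",
    "FP.PA": "TOTAL",
    "ENGI.PA": "ENGIE",
    "SLB.PA": "Schlumberger",
    "REP.MC": "REPSOL",
}


def fetch_closes(tickers, start, end):
    """Download daily prices and return one Close column per ticker."""
    import yfinance as yf  # imported lazily: only this step needs network access

    data = yf.download(list(tickers), start=start, end=end)
    return data["Close"]


def to_long(closes, names):
    """Reshape a wide table (one column per ticker) into date/name/share_price rows."""
    wide = closes.rename(columns=names).rename_axis("date").reset_index()
    long = wide.melt(id_vars="date", var_name="name", value_name="share_price")
    # Days on which a company has no quote are dropped.
    return long.dropna(subset=["share_price"]).reset_index(drop=True)


# Example (requires network access; dates are illustrative):
# closes = fetch_closes(shares, "2000-01-01", "2017-12-31")
# share_prices = to_long(closes, shares)
```

Keeping the download separate from the reshaping means the reshaping can be checked on a small hand-made table before any real data is fetched.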
Fig 2: Yahoo finance dataset with share market details related to Oil Company
Fig 3: Dataset with oil price details related to Oil Company
Data Preprocessing:
Each share price record was mapped to its company name by date. This data was then combined with the oil price dataset to create a new dataset, all_data, containing the share price, oil price, and company details. The share price was also scaled and added to the same dataset as the column share_price_scaled.
Fig 4: all_data dataset with added column share_price_scaled.
Fig 5: Data concatenation and transformation to create a new dataset (all_data)
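The preprocessing described above can be sketched roughly as follows. The function name `build_all_data`, the join on a `date` column, and the tiny hand-made input tables are assumptions for illustration. The scaling is taken to be min-max scaling to the range 0 to 1 computed separately for each company, which matches the later description of the scaled prices but may differ in detail from the code in Fig 5.

```python
import pandas as pd


def build_all_data(share_prices, oil_prices):
    """Join share prices to the oil price by date and add a 0-1 scaled price per company."""
    all_data = share_prices.merge(oil_prices, on="date", how="inner")
    grouped = all_data.groupby("name")["share_price"]
    lowest = grouped.transform("min")   # per-company minimum, aligned to each row
    highest = grouped.transform("max")  # per-company maximum, aligned to each row
    all_data["share_price_scaled"] = (all_data["share_price"] - lowest) / (highest - lowest)
    return all_data


# Tiny made-up example: two companies, three trading days.
days = ["2017-01-02", "2017-01-03", "2017-01-04"]
share_prices = pd.DataFrame({
    "date": pd.to_datetime(days * 2),
    "name": ["BP"] * 3 + ["Shell"] * 3,
    "share_price": [450.0, 500.0, 475.0, 2000.0, 2200.0, 2100.0],
})
oil_prices = pd.DataFrame({
    "date": pd.to_datetime(days),
    "oil_price": [55.0, 56.0, 54.0],
})

all_data = build_all_data(share_prices, oil_prices)
print(all_data)
```

Scaling each company on its own range makes shares with very different price levels comparable on one axis, at the cost of hiding the absolute price.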
Exploratory Data Analysis:
After data preparation we have the final dataset, which is processed further for clustering. Exploratory data analysis (EDA) sheds light on hidden patterns in the dataset and on the relationships between its features, and several data visualisation techniques are useful for it. We use a simple line plot of the oil price, a pairplot of BP's share price from 2000 to 2017, a pairplot of BP's share price over the last five years, a violin plot of the oil price, violin plots of the share prices of several oil and gas companies, a joint plot comparing Premier Oil and Statoil, and plots of the share prices of several companies against the oil price using various templates.
Fig 6 shows how the oil price has fluctuated over time. Around 1988 the price was about $20 per barrel, and the forecast out to 2024 shows about $120. The highest prices occurred in 2008, when a barrel cost more than $140. Oil prices are influenced by many factors, including decisions made by independent oil-producing countries such as Russia and by private oil producers such as ExxonMobil. According to the Organization of the Petroleum Exporting Countries (OPEC), oil-exporting countries play a major role in oil price fluctuations. Supply and demand influence prices worldwide, as do natural disasters that disrupt production and political conflicts in oil-producing countries.
Fig 6: Simple Line Plot between date and oil_price (on left) and date and share price (on right)
Fig 7: Pairplot on BP share price from years 2000 to 2017 using a color gradient for different years
The pairplot in Fig 7 compares share price and oil price, with the year shown as a color gradient from 2000 to 2017. A pairplot shows the pairwise relationships between the features of a dataset along with the univariate distribution of each variable. We used it to examine how these variables relate to one another.
Fig 8: Pairplot on BP share price for last 5 years using a color gradient
This analysis was performed for every oil company in the dataset, looking at the oil price and share price over 18 years and over the last five years. Over the last five years, the distribution shows the oil price mostly in two bands, 30 to 60 USD/bbl and 100 to 120 USD/bbl, with little data between 60 and 100 USD/bbl. For 2016/17 there is a positive correlation, with high confidence, between share price and oil price. For this company, 2014 was the year in which everything changed; we will see later whether that holds for other companies. In 2014 the correlation pattern changed and the data became highly variable. The data before and after look like two different worlds, with two distinct market behaviors and a change in management, regardless of company-specific events. Several insights can be drawn from this plot.
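One way to put numbers on the year-by-year relationship discussed above is to compute the correlation between oil price and share price separately for each calendar year. The sketch below does this with the Pearson correlation on made-up data; the function name `yearly_correlation` and the column names `date`, `oil_price` and `share_price` are assumptions, and the original analysis read these relationships from the pairplots rather than from a table like this.

```python
import numpy as np
import pandas as pd


def yearly_correlation(df):
    """Pearson correlation between oil_price and share_price for each calendar year."""
    result = {}
    for year, group in df.groupby(df["date"].dt.year):
        result[year] = group["oil_price"].corr(group["share_price"])
    return pd.Series(result, name="corr")


# Made-up data: the share falls as oil rises in 2013 and rises with oil in 2016.
dates = pd.to_datetime([
    "2013-01-01", "2013-06-01", "2013-12-01",
    "2016-01-01", "2016-06-01", "2016-12-01",
])
oil = np.array([100.0, 110.0, 105.0, 30.0, 45.0, 50.0])
share = np.concatenate([500.0 - 2.0 * oil[:3], 300.0 + 3.0 * oil[3:]])
example = pd.DataFrame({"date": dates, "oil_price": oil, "share_price": share})

corr_by_year = yearly_correlation(example)
print(corr_by_year)
```

A change of sign or a sharp drop in this value from one year to the next would be one indication of the kind of regime change described for 2014.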
Fig 9: Violin plot of the share price of several oil & gas companies
The analysis above shows how each company's share price responds to the oil price. To compare companies, we construct a few violin plots. A violin plot is similar to a box plot but also shows the probability density of the data. The oil price covered a wide range in 2014, while the range and distribution of the share price differed from company to company. Most of the companies listed show ordinary variation, and only a few are particularly sensitive. Note that share prices are scaled between 0 and 1 using their maximum and minimum values over roughly the past 20 years, which might lead to incorrect interpretations.
Unsupervised Learning - Cluster analysis on oil companies:
Unsupervised learning works with input data only; there are no corresponding output variables. The algorithm's main job is to model the structure or distribution of the data. Clustering is commonly used in unsupervised learning problems to discover groupings or patterns in the data. One possible application here is to assess the relative value of a share with respect to the oil price. The analysis could then give an indication of whether the share is overpriced or undervalued.
The following oil firms were chosen for analysis: REPSOL (REP.MC), BP (BP.L), Cairn Energy (CNE.L), Premier Oil (PMO.L), Statoil (STL.OL), TOTAL (FP.PA), ENGIE (ENGI.PA), and Schlumberger (SLB.PA). The main focus of the investigation was to establish a connection between the price of oil and share prices. To determine the number of clusters, we created a scree plot. Based on it and on the needs of the analysis, we chose 6 clusters and moved on.
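A scree (elbow) plot of this kind is usually built from the within-cluster sum of squares, which scikit-learn exposes as `inertia_` on a fitted `KMeans` model. The snippet below is a minimal sketch on synthetic data standing in for pairs of oil price and scaled share price; the function name `inertia_curve`, the random seed, and the synthetic centres are assumptions, and the article's own features and settings may differ.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(X, max_k=10, seed=0):
    """Within-cluster sum of squares for k = 1..max_k (the values a scree plot shows)."""
    return [
        KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in range(1, max_k + 1)
    ]


# Synthetic stand-in for (oil_price, share_price_scaled) pairs.
rng = np.random.default_rng(0)
centres = np.array([[30.0, 0.2], [50.0, 0.5], [110.0, 0.8]])
X = np.vstack([c + rng.normal(scale=[3.0, 0.03], size=(50, 2)) for c in centres])

inertias = inertia_curve(X, max_k=8)
for k, value in enumerate(inertias, start=1):
    print(k, round(value, 1))

# The article settles on 6 clusters; here that choice is simply applied.
model = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
labels = model.labels_
```

Plotting `inertias` against k and looking for the point where the curve flattens gives the scree plot. Because the oil price is in USD/bbl and the scaled share price lies between 0 and 1, distances are dominated by the oil price unless the features are put on a common scale first.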
Fig 10: K-Means clustering for all companies, comparing oil price and share price.
The clusters were compared in terms of oil price and share price. In cluster 0 (light blue), the data points start from very low prices, but the share price is not much influenced by the oil price. Cluster 1 (yellow) has high oil prices but a moderate share price.
Fig 11: Scree plot for finding number of clusters and Scatter plot for representing the clusters of BP Company, comparing oil price and share price.
Cluster 3 also mostly covers higher oil prices, although these have little effect on the share price. Cluster 4, where share prices were quite high, is perhaps the best period to sell shares; during this period the oil price had no noticeable effect on this company's share price. Cluster 5 is in a neutral position in the overall study, where neither the oil price nor the share price changed significantly. The same analysis performed above on the BP data can be carried out for all the companies.
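Reading clusters off a scatter plot can be complemented with a small table of per-cluster averages. The sketch below labels each row with its k-means cluster and reports the mean oil price, mean share price, and number of days in each cluster. The function name `summarise_clusters`, the column names, and the randomly generated data are assumptions for illustration; cluster numbers produced this way are arbitrary and will not match the cluster numbers discussed above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def summarise_clusters(df, n_clusters=6, seed=0):
    """Label each row with a k-means cluster and report per-cluster averages."""
    features = df[["oil_price", "share_price"]].to_numpy()
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    labelled = df.assign(cluster=model.labels_)
    summary = labelled.groupby("cluster").agg(
        oil_mean=("oil_price", "mean"),
        share_mean=("share_price", "mean"),
        n_days=("oil_price", "size"),
    )
    return labelled, summary


# Randomly generated stand-in for one company's daily data.
rng = np.random.default_rng(1)
one_company = pd.DataFrame({
    "oil_price": rng.uniform(25.0, 130.0, size=120),
    "share_price": rng.uniform(300.0, 600.0, size=120),
})

labelled, summary = summarise_clusters(one_company)
print(summary)
```

A cluster with a high `share_mean` across a wide range of oil prices would correspond to the kind of period described above as a good time to sell.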