Time-Series Analysis and Forecasting with Meta’s Prophet Library


In the era of big data, predicting future trends and patterns has become an increasingly important task for data scientists and data analysts. As businesses rely more heavily on data-driven decisions, tools such as Meta’s Prophet library for Python have gained attention for time-series forecasting. Prophet’s intuitive interface and robust algorithm enable users to harness historical data to make informed predictions about the future.

In this article, I will highlight some key points in Prophet that will help you customize your time-series forecasts.

Getting Started with Prophet

Before diving into time-series forecasting with Prophet, it’s essential to understand the prerequisites. Prophet expects the input data to contain two specific columns: ‘ds’, holding the dates, and ‘y’, holding the numeric variable to be forecast. Strictly speaking, Prophet only refuses to fit when there are fewer than two non-missing rows, but with less than two years of history it disables yearly seasonality by default, so a long history is needed to capture seasonal patterns reliably. As a rule of thumb, plan for at least 18 months of historical data, and ideally two years or more if yearly patterns matter. Let’s explore how to forecast traffic on an e-commerce website as an example.
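The requirements above can be checked before fitting. The sketch below uses a hypothetical helper, `check_prophet_input`, that verifies the ‘ds’ and ‘y’ columns, enforces the two-row minimum, and reports whether the history is long enough for Prophet's default yearly seasonality; the 730-day threshold is an assumption that mirrors Prophet's default "auto" rule.

```python
import pandas as pd


def check_prophet_input(df: pd.DataFrame) -> dict:
    """Hypothetical pre-flight check for a Prophet input dataframe.

    Verifies the 'ds'/'y' columns and reports how much history is available.
    The 730-day threshold is an assumption mirroring Prophet's default
    'auto' rule for enabling yearly seasonality.
    """
    missing = {"ds", "y"} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

    ds = pd.to_datetime(df["ds"])  # dates must be parseable
    valid = df["y"].notna()        # Prophet ignores rows with a missing y
    if valid.sum() < 2:
        # Prophet's fit() also refuses fewer than two non-missing rows
        raise ValueError("need at least two non-missing 'y' values")

    span_days = (ds[valid].max() - ds[valid].min()).days
    return {
        "rows": int(valid.sum()),
        "span_days": span_days,
        "yearly_seasonality_auto": span_days >= 730,
    }


long_history = pd.DataFrame({
    "ds": pd.date_range("2022-01-01", periods=800, freq="D"),
    "y": range(800),
})
print(check_prophet_input(long_history))
```

A check like this fails fast with a readable message instead of letting a short or mislabelled dataset silently produce a forecast without yearly seasonality.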

1- Importing Libraries

To begin, import the pandas and Prophet libraries.

import pandas as pd
from prophet import Prophet

2- Exploring the Data

As a next step, let’s take a look at our dataset and make sure it contains the required date and sessions columns.

df = pd.read_csv('SessionsData.csv')
df.head()

Prophet requires these columns to be named ‘ds’ and ‘y’, so we rename them before fitting the model.

df.rename(columns={'Date': 'ds', 'Sessions': 'y'}, inplace=True) # Renaming the columns
df
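Renaming is usually only part of the preparation. The sketch below assumes a small in-memory stand-in for `SessionsData.csv` (the values are hypothetical) and a hypothetical helper, `prepare_for_prophet`, that also parses the dates, keeps one row per day, and sorts chronologically.

```python
import pandas as pd

# Hypothetical stand-in for SessionsData.csv: the real file is not included here.
raw = pd.DataFrame({
    "Date": ["2023-01-03", "2023-01-01", "2023-01-02", "2023-01-02"],
    "Sessions": [21500, 20100, 19800, 19800],
})


def prepare_for_prophet(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename to Prophet's column names and tidy the date column."""
    df = raw.rename(columns={"Date": "ds", "Sessions": "y"})  # names Prophet expects
    df["ds"] = pd.to_datetime(df["ds"])                        # ensure a datetime dtype
    df = df.drop_duplicates(subset="ds", keep="last")          # one row per date
    return df.sort_values("ds").reset_index(drop=True)         # chronological order


df = prepare_for_prophet(raw)
print(df)
```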

3- Customising the Forecast

By default, Prophet models the trend with a piecewise linear function. Linear trends have a significant drawback: by construction, they have no lower or upper bound. Without customization, your forecast may show website traffic skyrocketing to unrealistic levels without ever saturating, or falling below zero, which is impossible. To address this, we switch to logistic growth and supply ‘cap’ and ‘floor’ columns.

In our example scenario, suppose we will launch Google Ads campaigns from the 1st of February 2024 and expect our traffic to increase. Based on past experience and our budget, we expect at least 20,000 sessions per day during the campaign. To enforce this minimum, we set the ‘floor’ column to 15,000 for the historical data and raise it to 20,000 in the future dataframe from February onward. Additionally, a saturating minimum only works with logistic growth, which in turn requires a ‘cap’ column indicating the maximum achievable level. For this case, we set ‘cap’ to 50,000.
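To see why the bounds matter, the sketch below compares a straight-line trend with a saturating logistic curve. It is a simplified, single-segment form of a logistic trend with a floor (Prophet's own trend also has changepoints where the growth rate shifts), and the growth rates and starting level are hypothetical numbers chosen only for illustration.

```python
import numpy as np


def linear_trend(t, k, b):
    """Unbounded straight line: nothing stops it from going negative."""
    return k * t + b


def logistic_trend(t, k, m, cap, floor):
    """Simplified logistic curve that saturates between floor and cap.

    Single segment only; no changepoints (an assumption for illustration).
    """
    return floor + (cap - floor) / (1.0 + np.exp(-k * (t - m)))


t = np.arange(0, 1000)  # days

# Hypothetical declining traffic: 30,000 sessions/day, losing 40 per day.
lin = linear_trend(t, k=-40.0, b=30000.0)
saturating = logistic_trend(t, k=-0.01, m=250.0, cap=50000.0, floor=20000.0)

print(lin.min())                          # -9960.0: the line drops below zero
print(saturating.min(), saturating.max())  # stays between 20,000 and 50,000
```

The straight line keeps falling forever, while the logistic curve flattens as it approaches the floor, which is exactly the behavior ‘cap’ and ‘floor’ buy us.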

df['cap'] = 50000  # Maximum achievable daily sessions
df['floor'] = 15000  # Minimum daily sessions for the historical period
m = Prophet(growth='logistic')  # Logistic growth makes the trend saturate between 'floor' and 'cap'
m.fit(df)  # Fitting the model to our historical data

With the code above, we created and fitted our first Prophet model. Next, we need a dataframe of future dates on which the fitted model will produce its forecasts.

4- Creating the Future Dataset

future = m.make_future_dataframe(periods=365)  # Extends the dates 365 periods (days, with the default daily frequency) past the history
future.tail()  # Let's take a look at the end of our future dataframe

Inspecting the future dataframe, everything appears in order: it contains the historical dates plus the new dates we want to predict. The last date in our historical data is January 13, 2024, so the future dataframe extends 365 days further, to January 12, 2025.
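The date arithmetic behind this can be reproduced with plain pandas. The sketch below is a rough, hypothetical re-creation of what `make_future_dataframe(periods=365)` returns with its defaults (historical dates followed by the new ones); the start of the history is an assumed date, and only its end, January 13, 2024, comes from the text.

```python
import pandas as pd


def future_dates(history_ds: pd.Series, periods: int, freq: str = "D") -> pd.DataFrame:
    """Hypothetical re-creation of make_future_dataframe's default output:
    the historical dates followed by `periods` new dates."""
    last = history_ds.max()
    # periods + 1 dates starting at `last`, then drop `last` itself
    new = pd.date_range(start=last, periods=periods + 1, freq=freq)[1:]
    all_ds = pd.concat([history_ds, pd.Series(new)], ignore_index=True)
    return pd.DataFrame({"ds": all_ds})


# Assumed history start; the end date matches the article.
history = pd.Series(pd.date_range("2022-01-14", "2024-01-13", freq="D"))
future = future_dates(history, periods=365)
print(future["ds"].iloc[-1].date())  # 2025-01-12 (2024 is a leap year)
```

Because 2024 contains February 29, adding 365 days to January 13, 2024 lands on January 12, 2025 rather than January 13.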

Now, we need to add the ‘cap’ and ‘floor’ columns to the future dataframe as well, raising the floor to 20,000 from the campaign start onward. Then we call predict() on the future dataframe and look at the resulting forecast values.

future['cap'] = 50000  # Setting the cap on the future dataframe
future['floor'] = 15000  # Setting the default floor on the future dataframe
future.loc[(future['ds'] >= '2024-02-01') & (future['ds'] <= '2025-12-01'), 'floor'] = 20000  # Raising the floor for the campaign period
forecast = m.predict(future)  # Predicting the sessions on our future dataframe
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
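Setting the saturation columns can be wrapped in a small helper so the cap and floor schedule stays consistent. The sketch below uses a hypothetical function, `add_saturation_columns`, with an open-ended campaign period; since the 365-day horizon ends on 2025-01-12, the original upper bound of 2025-12-01 has no effect here. It also checks that the cap stays above the floor on every row, because Prophet rejects data where that does not hold.

```python
import pandas as pd


def add_saturation_columns(future, cap, base_floor, campaign_start, campaign_floor):
    """Hypothetical helper: constant cap, floor raised from campaign_start onward."""
    out = future.copy()
    out["cap"] = cap
    out["floor"] = base_floor
    out.loc[out["ds"] >= pd.Timestamp(campaign_start), "floor"] = campaign_floor
    # Logistic growth needs the cap strictly above the floor on every row.
    if not (out["cap"] > out["floor"]).all():
        raise ValueError("cap must be greater than floor on every row")
    return out


future = pd.DataFrame({"ds": pd.date_range("2024-01-01", "2025-01-12", freq="D")})
future = add_saturation_columns(
    future, cap=50000, base_floor=15000,
    campaign_start="2024-02-01", campaign_floor=20000,
)
print(future.iloc[30:32])  # the last pre-campaign day and the first campaign day
```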

After calling predict(), Prophet returns a dataframe with around twenty columns, covering the trend, each seasonal component, and their uncertainty bounds. For our analysis, we will focus on three key columns: ‘yhat’, the predicted value, and ‘yhat_lower’ and ‘yhat_upper’, the bounds of its uncertainty interval (80% by default). For instance, the model predicts that our website will receive 20,222 sessions on 2025-01-12, with an interval ranging from 16,415 to 23,746 sessions.
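Looking up a single day in the forecast is a common follow-up. The sketch below uses a hypothetical helper, `interval_for`, and a one-row stand-in for the output of `m.predict(future)` built from the values quoted above; it assumes only the ‘ds’, ‘yhat’, ‘yhat_lower’ and ‘yhat_upper’ columns.

```python
import pandas as pd


def interval_for(forecast: pd.DataFrame, date: str) -> dict:
    """Return the point forecast and uncertainty interval for one date.

    Assumes Prophet-style ds/yhat/yhat_lower/yhat_upper columns.
    """
    row = forecast.loc[forecast["ds"] == pd.Timestamp(date)]
    if row.empty:
        raise KeyError(f"{date} is not in the forecast")
    r = row.iloc[0]
    return {
        "yhat": r["yhat"],
        "lower": r["yhat_lower"],
        "upper": r["yhat_upper"],
        "width": r["yhat_upper"] - r["yhat_lower"],  # size of the interval
    }


# Stand-in for m.predict(future): only the row quoted in the text.
forecast = pd.DataFrame({
    "ds": [pd.Timestamp("2025-01-12")],
    "yhat": [20222.0],
    "yhat_lower": [16415.0],
    "yhat_upper": [23746.0],
})
print(interval_for(forecast, "2025-01-12"))
```

The interval width (here 7,331 sessions) is a quick way to judge how much to trust a single day's point forecast.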

To gain a deeper understanding of the forecast, we should visualize these values. I also add the changepoints to the plot, since they highlight where the trend shifts. By default, Prophet places its potential changepoints in the first 80% of the time series; this share can be changed with the ‘changepoint_range’ parameter when creating the model with Prophet(). Seeing where the trend changes direction is very helpful when extracting insights from the data.
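The placement rule can be sketched in a few lines. The hypothetical function below spreads potential changepoints evenly over the first `changepoint_range` share of the historical rows, skipping the first row; the defaults of 25 changepoints and a 0.8 range match Prophet's documented defaults, but the code is an approximation of the library's behavior, not its actual source.

```python
import numpy as np
import pandas as pd


def potential_changepoints(ds: pd.Series, n_changepoints: int = 25,
                           changepoint_range: float = 0.8) -> pd.Series:
    """Sketch: evenly spaced candidate changepoints in the first
    `changepoint_range` share of the history (the first row is excluded)."""
    hist_size = int(np.floor(len(ds) * changepoint_range))
    n = min(n_changepoints, hist_size - 1)  # not more candidates than rows allow
    idx = np.linspace(0, hist_size - 1, n + 1).round().astype(int)
    return ds.iloc[idx].iloc[1:].reset_index(drop=True)


ds = pd.Series(pd.date_range("2022-01-14", periods=730, freq="D"))
cps = potential_changepoints(ds)
print(len(cps), cps.iloc[-1].date())
```

With Prophet itself, passing for example `changepoint_range=0.9` to `Prophet()` would let candidates reach further into the recent data, at the cost of a trend that reacts more to the last few months.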

5- Visualizing Data

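from prophet.plot import add_changepoints_to_plot
fig1 = m.plot(forecast, figsize=(16, 9))  # Forecast with its uncertainty interval; figsize in inches
a = add_changepoints_to_plot(fig1.gca(), m, forecast)  # Overlay the trend line and the changepoints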
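Not every potential changepoint appears on this plot: `add_changepoints_to_plot` draws only those whose rate change is at least its `threshold` argument (0.01 by default) in magnitude. The sketch below reproduces that filtering with a hypothetical helper on toy values; with a fitted model, the assumed inputs would be `m.changepoints` and `m.params['delta']`, and the threshold applies to Prophet's internally scaled trend.

```python
import numpy as np
import pandas as pd


def significant_changepoints(changepoints: pd.Series, delta: np.ndarray,
                             threshold: float = 0.01) -> pd.Series:
    """Keep changepoints whose average rate change is at least `threshold` in size.

    `delta` has shape (n_samples, n_changepoints).
    """
    mean_delta = np.nanmean(delta, axis=0)
    mask = np.abs(mean_delta) >= threshold
    return changepoints[mask].reset_index(drop=True)


# Toy inputs standing in for m.changepoints and m.params['delta'].
cps = pd.Series(pd.to_datetime(["2023-01-01", "2023-03-01", "2023-05-01", "2023-07-01"]))
delta = np.array([[0.0, 0.05, -0.002, -0.03]])
print(significant_changepoints(cps, delta))
```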

If you want to compare the predicted values with the actual values interactively, you can use the Plotly-based version of the plot.

from prophet.plot import plot_plotly, plot_components_plotly
plot_plotly(m, forecast, figsize=(1200, 900))  # Interactive plot; here figsize is in pixels, not inches

To see a more detailed decomposition of the time series into its trend and seasonal components, you can run the code below.

fig2 = m.plot_components(forecast)

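From the graphs above, several insights can be derived, including the decreasing traffic of the website. The trend appears to reverse after something that happened in April 2023 (a Google core algorithm update), which suggests that the SEO department needs to revise some of its strategies. The seasonal components are also valuable when planning marketing campaigns. Note that Prophet fits yearly and weekly seasonality by default for daily data, so month-to-month patterns show up in the yearly component unless a monthly seasonality is added explicitly.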
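Prophet represents each seasonality as a Fourier series, that is, sine and cosine terms of increasing frequency. The sketch below builds such features for a roughly monthly period; the helper name is hypothetical, and measuring time in days since 1970-01-01 is an assumption about how the time axis is set up.

```python
import numpy as np
import pandas as pd


def fourier_features(ds: pd.Series, period: float, order: int) -> np.ndarray:
    """Fourier-series columns for one seasonality.

    Returns 2 * order columns: sin and cos of 2*pi*n*t/period for
    n = 1..order, with t measured in days since 1970-01-01.
    """
    t = (pd.to_datetime(ds) - pd.Timestamp("1970-01-01")).dt.total_seconds().to_numpy() / 86400.0
    cols = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


ds = pd.Series(pd.date_range("2023-01-01", periods=90, freq="D"))
X = fourier_features(ds, period=30.5, order=5)
print(X.shape)  # (90, 10)
```

In Prophet, the equivalent built-in step would be calling `m.add_seasonality(name='monthly', period=30.5, fourier_order=5)` on the model before `m.fit(df)`; a higher `fourier_order` allows a more wiggly seasonal shape but increases the risk of overfitting.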

Conclusion

In data-driven decision-making, time-series analysis and forecasting play a significant role, especially in the era of big data, where anticipating future trends matters greatly to data scientists and analysts. Meta’s Prophet library stands out here thanks to its intuitive interface and robust algorithms, which let users leverage historical data for informed predictions. Throughout this article, we’ve covered key functionalities of Prophet, from its input requirements to a practical implementation. By customizing forecasts with ‘cap’ and ‘floor’, we keep predictions realistic and tailored to specific business constraints. By forecasting website traffic and visualizing the results, we uncovered insights that can guide strategic decisions.

Orphex lets you import your cookieless data and creates several dashboards to help you interpret it accurately. It also offers a customized time-series forecasting dashboard for your traffic data and its subdivisions.