<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Time series | Modeling with R and Python</title>
    <link>https://www.metalesaek.com/tag/time-series/</link>
      <atom:link href="https://www.metalesaek.com/tag/time-series/index.xml" rel="self" type="application/rss+xml" />
    <description>Time series</description>
    <generator>Source Themes Academic (https://sourcethemes.com/academic/)</generator><language>en-us</language><lastBuildDate>Tue, 05 May 2020 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://www.metalesaek.com/images/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_2.png</url>
      <title>Time series</title>
      <link>https://www.metalesaek.com/tag/time-series/</link>
    </image>
    
    <item>
      <title>Time series with ARIMA and RNN models</title>
      <link>https://www.metalesaek.com/courses/rnn/time-series-with-recurrent-neaural-network-rnn-lstm-model/</link>
      <pubDate>Tue, 05 May 2020 00:00:00 +0000</pubDate>
      <guid>https://www.metalesaek.com/courses/rnn/time-series-with-recurrent-neaural-network-rnn-lstm-model/</guid>
      <description>
&lt;script src=&#34;https://www.metalesaek.com/rmarkdown-libs/header-attrs/header-attrs.js&#34;&gt;&lt;/script&gt;

&lt;div id=&#34;TOC&#34;&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#introduction&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;1&lt;/span&gt; Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#data-preparation&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;2&lt;/span&gt; Data preparation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#arima-model&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;3&lt;/span&gt; ARIMA model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#rnn-model&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;4&lt;/span&gt; RNN model&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#reshape-the-time-series&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;4.0.1&lt;/span&gt; Reshape the time series&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#model-architecture&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;4.1&lt;/span&gt; Model architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#model-training&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;4.2&lt;/span&gt; Model training&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#prediction&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;4.3&lt;/span&gt; Prediction&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#results-comparison&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;5&lt;/span&gt; Results comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#conclusion&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;6&lt;/span&gt; Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#further-reading&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;7&lt;/span&gt; Further reading&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#session-info&#34;&gt;&lt;span class=&#34;toc-section-number&#34;&gt;8&lt;/span&gt; Session info&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;

&lt;style type=&#34;text/css&#34;&gt;
strong {
  color: Navy;
}

h1,h2, h3, h4 {
  font-size:28px;
  color:DarkBlue;
}
&lt;/style&gt;
&lt;div id=&#34;introduction&#34; class=&#34;section level1&#34; number=&#34;1&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;1&lt;/span&gt; Introduction&lt;/h1&gt;
&lt;p&gt;The classical methods for predicting univariate time series are &lt;a href=&#34;https://otexts.com/fpp2/arima.html&#34;&gt;ARIMA&lt;/a&gt; models. Under a linearity assumption, and provided that the non-stationarity is of the difference-stationary (DS) type, they use the autocorrelation function (up to some order) to predict the target variable as a linear function of its own past values (the autoregressive part) and the past values of the errors (the moving average part). However, the hardest step in ARIMA modeling is deriving a stationary series from a non-stationary one whose trend (deterministic or stochastic) or seasonality is not well defined. The RNN model, whose origins are often traced back to John Hopfield (1982), is a deep learning model that does not need the above requirements (linearity and a specific type of non-stationarity) and can capture and model the memory of the time series. Memory is the main characteristic of sequence data in general, which includes, in addition to time series, &lt;strong&gt;text data&lt;/strong&gt;, &lt;strong&gt;image captioning&lt;/strong&gt;, &lt;strong&gt;speech recognition&lt;/strong&gt;, etc.&lt;/p&gt;
&lt;p&gt;The basic idea behind the RNN is very simple (as described in the plot below). At each time step &lt;strong&gt;t&lt;/strong&gt; the model computes a state value &lt;span class=&#34;math inline&#34;&gt;\(h_t\)&lt;/span&gt; that linearly combines the previous state &lt;span class=&#34;math inline&#34;&gt;\(h_{t-1}\)&lt;/span&gt; (which contains all the memory available at time &lt;strong&gt;t-1&lt;/strong&gt;) and the current input &lt;span class=&#34;math inline&#34;&gt;\(x_t\)&lt;/span&gt; (the current value of the time series), and then passes the result to the activation function &lt;strong&gt;tanh&lt;/strong&gt; (to capture any nonlinear relations). The state at each time step t can thus be expressed formally as follows:&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[h_t=\tanh(W_h h_{t-1}+W_x x_t+b)\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;We then leave it to gradient descent to decide how much memory to keep, by computing the optimal weights &lt;span class=&#34;math inline&#34;&gt;\(W_h\)&lt;/span&gt;.
Similarly, the output &lt;span class=&#34;math inline&#34;&gt;\(y_t\)&lt;/span&gt; is computed as follows:
&lt;span class=&#34;math display&#34;&gt;\[y_t=W_y h_t\]&lt;/span&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;img &amp;lt;- EBImage::readImage(&amp;quot;C://Users/dell/Documents/new-blog/content/courses/rnn/rnn_plot.jpg&amp;quot;)
plot(img)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-2-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
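&lt;p&gt;To make the recurrence concrete, here is a minimal base R sketch of the state and output equations given earlier in this section. The number of hidden units, the random weights and the three input values are hypothetical and chosen only for illustration; this is not how keras implements the layer internally.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(1)
n_hidden &amp;lt;- 3                                   # hypothetical number of hidden units
W_h &amp;lt;- matrix(rnorm(n_hidden * n_hidden, sd = 0.5), n_hidden, n_hidden)  # state weights
W_x &amp;lt;- matrix(rnorm(n_hidden, sd = 0.5), n_hidden, 1)                    # input weights
b   &amp;lt;- rep(0, n_hidden)                         # bias
W_y &amp;lt;- matrix(rnorm(n_hidden, sd = 0.5), 1, n_hidden)                    # output weights

x &amp;lt;- c(1.1930, 1.1941, 1.1933)                  # a short illustrative input sequence
h &amp;lt;- rep(0, n_hidden)                           # initial state h_0
for (t in seq_along(x)) {
  # h_t = tanh(W_h h_{t-1} + W_x x_t + b)
  h &amp;lt;- tanh(W_h %*% h + W_x %*% x[t] + b)
}
y &amp;lt;- W_y %*% h                                  # y_t = W_y h_t at the last time step&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The same weights are reused at every time step, so the state h carries information from earlier inputs forward; in a real model the weights are learned by gradient descent instead of being drawn at random.&lt;/p&gt;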
&lt;/div&gt;
&lt;div id=&#34;data-preparation&#34; class=&#34;section level1&#34; number=&#34;2&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;2&lt;/span&gt; Data preparation&lt;/h1&gt;
&lt;p&gt;First, let’s load the packages needed for our analysis.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ssh &amp;lt;- suppressPackageStartupMessages
ssh(library(timeSeries))
ssh(library(tseries))
ssh(library(aTSA))
ssh(library(forecast))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;forecast&amp;#39; was built under R version 4.0.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ssh(library(rugarch))&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning: package &amp;#39;rugarch&amp;#39; was built under R version 4.0.2&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;ssh(library(ModelMetrics))
ssh(library(keras))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In this article we will use the &lt;strong&gt;USDCHF&lt;/strong&gt; data from the &lt;strong&gt;timeSeries&lt;/strong&gt; package, a univariate series of intraday foreign exchange rates between the US dollar and the Swiss franc with &lt;strong&gt;62496&lt;/strong&gt; observations.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(USDCHF)
length(USDCHF)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 62496&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Let’s look at this data by plotting it after converting it to a ts object.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data(USDCHF)
data &amp;lt;- ts(USDCHF, frequency = 365)
plot(data)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-5-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;p&gt;This series seems to have a trend and to be non-stationary, but let’s verify this with the &lt;a href=&#34;https://faculty.washington.edu/ezivot/econ584/notes/unitroot.pdf&#34;&gt;Dickey-Fuller&lt;/a&gt; and &lt;a href=&#34;https://faculty.washington.edu/ezivot/econ584/notes/unitroot.pdf&#34;&gt;Phillips-Perron&lt;/a&gt; tests:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;adf.test(data)
pp.test(data )&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Both tests indicate that the data has a unit root (high p-value: we do not reject the null hypothesis). We can also check the correlogram of the autocorrelation function
&lt;a href=&#34;https://towardsdatascience.com/significance-of-acf-and-pacf-plots-in-time-series-analysis-2fa11a5d10a8&#34;&gt;acf&lt;/a&gt; and the partial autocorrelation function &lt;a href=&#34;https://towardsdatascience.com/significance-of-acf-and-pacf-plots-in-time-series-analysis-2fa11a5d10a8&#34;&gt;pacf&lt;/a&gt; as follows:&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;acf(data)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-7-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pacf(data)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-7-2.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;p&gt;As you know, the ACF is related to the MA part and the PACF to the AR part. Since the PACF has one bar that goes far beyond the confidence interval, we are confident that our data has a unit root, and we can get rid of it by differencing the data once. In ARIMA terms the data should be integrated of order 1 (d=1), and this is the &lt;strong&gt;I&lt;/strong&gt; part of ARIMA. In addition, since the bars in the PACF do not decay, the model would not include any lag in the AR part.
In the ACF plot, by contrast, all the bars are far outside the confidence interval, so the model would have many lags in the MA part.&lt;/p&gt;
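&lt;p&gt;The effect of differencing can be illustrated on simulated data. The sketch below uses a simulated random walk (not the USDCHF series), which has a unit root by construction, and compares the lag-1 autocorrelation before and after taking the first difference. The seed and the series length are arbitrary choices.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(42)
rw   &amp;lt;- cumsum(rnorm(500))   # simulated random walk: the levels have a unit root
d_rw &amp;lt;- diff(rw)             # first difference (d = 1)

# acf() stores lag 0 in the first position, so element 2 is the lag-1 autocorrelation
acf_rw  &amp;lt;- acf(rw, plot = FALSE)$acf[2]     # close to 1 for the levels
acf_drw &amp;lt;- acf(d_rw, plot = FALSE)$acf[2]   # close to 0 for the differences&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A lag-1 autocorrelation near one that drops to near zero after one difference is the typical signature of a series that is integrated of order 1.&lt;/p&gt;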
&lt;/div&gt;
&lt;div id=&#34;arima-model&#34; class=&#34;section level1&#34; number=&#34;3&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;3&lt;/span&gt; ARIMA model&lt;/h1&gt;
&lt;p&gt;To fit an ARIMA model we have to determine the lags of the AR (p) and MA (q) components and how many times we must difference the series to make it stationary (d). Fortunately, we do not have to worry about these issues: we leave everything to the &lt;strong&gt;forecast&lt;/strong&gt; package, which provides a fast way to get the best model by calling the function &lt;strong&gt;auto.arima&lt;/strong&gt;. But before that, let’s hold out the last 100 observations to be used as testing data in order to compare the quality of this model with that of the RNN model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;data_test &amp;lt;- data[(length(data)-99):length(data)]
data_train &amp;lt;- data[1:(length(data)-99-1)]&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_arima &amp;lt;- auto.arima(data_train)
summary(model_arima)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Series: data_train 
## ARIMA(0,1,2) with drift 
## 
## Coefficients:
##           ma1     ma2  drift
##       -0.0193  0.0113      0
## s.e.   0.0040  0.0040      0
## 
## sigma^2 estimated as 2.29e-06:  log likelihood=316634.5
## AIC=-633260.9   AICc=-633260.9   BIC=-633224.8
## 
## Training set error measures:
##                        ME        RMSE          MAE           MPE       MAPE
## Training set 1.900607e-08 0.001513064 0.0009922846 -3.671242e-05 0.06627114
##                  MASE          ACF1
## Training set 0.999585 -3.921999e-05&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As expected, this model is an ARIMA(0,1,2): it is integrated of order 1 (the differenced series is now stationary) and has two MA lags, with a &lt;strong&gt;drift&lt;/strong&gt; (constant) estimated as practically zero. The output also reports some metric values such as the root mean square error &lt;strong&gt;RMSE&lt;/strong&gt; and the mean absolute error &lt;strong&gt;MAE&lt;/strong&gt;, which are the most popular ones. We will use these metrics later on to compare this model with the RNN model.
To validate this model we have to make sure that the residuals are white noise without any problems such as autocorrelation or &lt;a href=&#34;https://www.investopedia.com/terms/h/heteroskedasticity.asp&#34;&gt;heteroskedasticity&lt;/a&gt;. Thanks to the &lt;strong&gt;forecast&lt;/strong&gt; package we can check the residuals straightforwardly by calling the function &lt;strong&gt;checkresiduals&lt;/strong&gt;.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;checkresiduals(model_arima)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-10-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2) with drift
## Q* = 8.6631, df = 7, p-value = 0.2778
## 
## Model df: 3.   Total lags used: 10&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since the p-value is far larger than the 5% significance level, we do not reject the null hypothesis that the errors are not autocorrelated. Looking at the ACF plot, some bars do go outside the confidence interval, but with a significance level of 5% a few such false positives are to be expected. So at the 5% level we find no evidence of autocorrelation in the residuals.
For possible heteroskedasticity we use the &lt;a href=&#34;https://hal.archives-ouvertes.fr/hal-00588680/document&#34;&gt;ARCH_LM&lt;/a&gt; statistic from the &lt;strong&gt;aTSA&lt;/strong&gt; package.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;arch.test(arima(data_train, order = c(0,1,2)))&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-11-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;p&gt;We see that both tests are highly significant (we reject the null hypothesis of homoskedasticity), so the above ARIMA model is not able to capture this pattern. That is why we should combine the above model with another model that keeps track of this type of pattern, called the &lt;a href=&#34;https://medium.com/auquan/time-series-analysis-for-finance-arch-garch-models-822f87f1d755&#34;&gt;GARCH&lt;/a&gt; model.
The GARCH model attempts to model the residuals of the ARIMA model with the following general formula:
&lt;span class=&#34;math display&#34;&gt;\[\epsilon_t=w_t\sqrt{h_t}\]&lt;/span&gt;
&lt;span class=&#34;math display&#34;&gt;\[h_t=a_0+\sum_{i=1}^{p}a_i\epsilon_{t-i}^2+\sum_{j=1}^{q}b_j h_{t-j}\]&lt;/span&gt;&lt;/p&gt;
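&lt;p&gt;Here &lt;span class=&#34;math inline&#34;&gt;\(w_t\)&lt;/span&gt; is a white noise error (usually taken with unit variance), so that &lt;span class=&#34;math inline&#34;&gt;\(h_t\)&lt;/span&gt; is the conditional variance of &lt;span class=&#34;math inline&#34;&gt;\(\epsilon_t\)&lt;/span&gt;.&lt;/p&gt;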
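&lt;p&gt;As an illustration, the following base R sketch simulates a GARCH(1,1) process in its standard form, where the conditional variance at time t equals a constant plus a weighted past squared error plus a weighted past variance. The coefficient values a0, a1 and b1 are hypothetical and are not estimates from the USDCHF data.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;set.seed(123)
n  &amp;lt;- 1000
a0 &amp;lt;- 1e-6; a1 &amp;lt;- 0.1; b1 &amp;lt;- 0.8     # hypothetical coefficients with a1 + b1 below 1
w   &amp;lt;- rnorm(n)                          # white noise with unit variance
h   &amp;lt;- numeric(n)                        # conditional variances
eps &amp;lt;- numeric(n)                        # simulated errors

h[1]   &amp;lt;- a0 / (1 - a1 - b1)             # start at the unconditional variance
eps[1] &amp;lt;- w[1] * sqrt(h[1])
for (t in 2:n) {
  h[t]   &amp;lt;- a0 + a1 * eps[t - 1]^2 + b1 * h[t - 1]   # variance recursion
  eps[t] &amp;lt;- w[t] * sqrt(h[t])                        # error scaled by its volatility
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Plotting eps typically shows calm and turbulent periods alternating (volatility clustering), which is the pattern a plain ARIMA model cannot reproduce.&lt;/p&gt;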
&lt;p&gt;So we fit this model for different lags by calling the function &lt;strong&gt;garch&lt;/strong&gt; from the package &lt;strong&gt;tseries&lt;/strong&gt;, and we use the &lt;strong&gt;AIC&lt;/strong&gt; criterion to get the best model.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model &amp;lt;- character()
AIC &amp;lt;- numeric()
for (p in 1:5){
  for(q in 1:5){
    model_g &amp;lt;- tseries::garch(model_arima$residuals, order = c(p,q), trace=F)
    model&amp;lt;-c(model,paste(&amp;quot;mod_&amp;quot;, p, q))
    AIC &amp;lt;- c(AIC, AIC(model_g))
    def &amp;lt;- tibble::tibble(model,AIC)
  }
}&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in tseries::garch(model_arima$residuals, order = c(p, q), trace = F):
## singular information&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;def %&amp;gt;% dplyr::arrange(AIC)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 25 x 2
##    model         AIC
##    &amp;lt;chr&amp;gt;       &amp;lt;dbl&amp;gt;
##  1 mod_ 1 1 -647018.
##  2 mod_ 2 1 -647005.
##  3 mod_ 1 2 -647005.
##  4 mod_ 2 3 -646986.
##  5 mod_ 1 3 -646971.
##  6 mod_ 1 4 -646967.
##  7 mod_ 2 2 -646900.
##  8 mod_ 3 3 -646885.
##  9 mod_ 3 1 -646859.
## 10 mod_ 1 5 -646859.
## # ... with 15 more rows&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we see, the simplest model, with one lag for each component, has the lowest AIC and fits the residuals well.
We can check the residuals of this model with the Box test.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model_garch &amp;lt;- tseries::garch(model_arima$residuals, order = c(1,1), trace=F)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Warning in tseries::garch(model_arima$residuals, order = c(1, 1), trace = F):
## singular information&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Box.test(model_garch$residuals)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## 
##  Box-Pierce test
## 
## data:  model_garch$residuals
## X-squared = 3.1269, df = 1, p-value = 0.07701&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;At the 5% significance level we do not reject the null hypothesis of independence.
As an alternative we can inspect the ACF of the residuals.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;acf(model_garch$residuals[-1])&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;img src=&#34;https://www.metalesaek.com/courses/rnn/2020-05-05-time-series-with-recurrent-neaural-network-rnn-lstm-model_files/figure-html/unnamed-chunk-14-1.svg&#34; width=&#34;576&#34; /&gt;&lt;/p&gt;
&lt;p&gt;The easiest way to get predictions from our model is to make use of the &lt;strong&gt;rugarch&lt;/strong&gt; package. First, we specify the model with the parameters obtained above (the different lags):&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# garch1 &amp;lt;- ugarchspec(mean.model = list(armaOrder = c(0,2), include.mean = FALSE), 
# variance.model = list(garchOrder = c(1,1))) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Then we use the function &lt;strong&gt;ugarchfit&lt;/strong&gt; to fit the model to our data_train. However, you might have noticed that we supplied only the lags of the AR and MA parts of our ARIMA model (the d value for integration is not available in this function), so we should provide the differenced series of &lt;strong&gt;data_train&lt;/strong&gt; instead of the original series.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;Ddata_train &amp;lt;- diff(data_train)
# garchfit &amp;lt;- ugarchfit(data=Ddata_train, spec = garch1, solver = &amp;quot;gosolnp&amp;quot;,trace=F)
# coef(garchfit)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Our final model will be written as follows.&lt;/p&gt;
&lt;p&gt;&lt;span class=&#34;math display&#34;&gt;\[\begin{gathered}
y_t=e_t-4.296\times 10^{-2}\,e_{t-1}+5.687\times 10^{-3}\,e_{t-2} \\
e_t\sim N(0,\hat\sigma_t^2) \\
\hat\sigma_t^2=1.950\times 10^{-7}+2.565\times 10^{-1}\,e_{t-1}^2+6.940\times 10^{-1}\,\hat\sigma_{t-1}^2
\end{gathered}\]&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;: when running the above model we get different results each time due to the internal randomization process. That is why I commented out the above code, to prevent it from being rerun when rendering this document.&lt;/p&gt;
&lt;p&gt;Now we use this model to forecast 100 future values, which are then compared with the data_test values.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# fitted &amp;lt;- ugarchforecast(garchfit, n.ahead=100)
#yh_test&amp;lt;-numeric()
#for (i in 2:100){
#  yh_test[1] &amp;lt;- data_train[length(data_train)]+fitted(fitted)[1]
#  yh_test[i] &amp;lt;- yh_test[i-1]+fitted(fitted)[i]
#}
#df_eval &amp;lt;- tibble::tibble(y_test=data_test, yh_test=yh_test)
#df_eval&lt;/code&gt;&lt;/pre&gt;
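&lt;p&gt;Because the GARCH model is fitted on the differenced series, its forecasts are forecasts of changes, and the levels are recovered by adding the cumulative sum of those changes to the last observed training value. A minimal sketch of that step with made-up numbers follows; last_level and diff_forecast are hypothetical stand-ins for the last value of data_train and for the forecasted differences.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;last_level    &amp;lt;- 1.20                        # hypothetical last observed level
diff_forecast &amp;lt;- c(0.001, -0.002, 0.0005)    # hypothetical forecasted differences

# level at step i = last observed level + sum of the first i forecasted differences
level_forecast &amp;lt;- last_level + cumsum(diff_forecast)
level_forecast&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This gives the same values as building the forecasts one step at a time in a loop, in a single vectorized call.&lt;/p&gt;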
&lt;p&gt;Finally, we save the &lt;strong&gt;df_eval&lt;/strong&gt; table, which holds the original and the forecasted values of data_test, for further use.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#write.csv(df_eval, &amp;quot;df_eval.csv&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;rnn-model&#34; class=&#34;section level1&#34; number=&#34;4&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;4&lt;/span&gt; RNN model&lt;/h1&gt;
&lt;p&gt;As an alternative to the ARIMA prediction method discussed above, the deep learning RNN method can also take into account the memory of the time series. Unlike classical feedforward networks, which process each input independently, the RNN takes a group of inputs that are supposed to form one sequence and processes them together, as shown in the first plot. In keras this step can be achieved with &lt;strong&gt;layer_simple_rnn&lt;/strong&gt; (Chollet, 2017, p. 167).
This means we have to decide the length of the sequence, in other words how far back we think the current value depends on its past (the memory of the time series). In our case we assume that the 7 previous values are sufficient to predict the current value.&lt;/p&gt;
&lt;div id=&#34;reshape-the-time-series&#34; class=&#34;section level3&#34; number=&#34;4.0.1&#34;&gt;
&lt;h3&gt;&lt;span class=&#34;header-section-number&#34;&gt;4.0.1&lt;/span&gt; Reshape the time series&lt;/h3&gt;
&lt;p&gt;The first thing we do is organize the data in such a way that the model knows which part is the sequence to be processed by the RNN layer and which part is the target variable. To do so we reorganize the time series into a matrix where each row is a single example: the first 7 columns contain the lagged values of the target variable and the last column contains the target itself. Consequently, the total number of rows will be &lt;strong&gt;length(data_train)-maxlen-1&lt;/strong&gt;, where maxlen refers to the (constant) length of each sequence, which here is equal to 7.&lt;/p&gt;
&lt;p&gt;Let’s first create an empty matrix&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maxlen &amp;lt;- 7
exch_matrix&amp;lt;- matrix(0, nrow = length(data_train)-maxlen-1, ncol = maxlen+1) &lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now let’s move our time series into this matrix and display some rows to make sure that the output is as expected.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;for(i in 1:(length(data_train)-maxlen-1)){
  exch_matrix[i,] &amp;lt;- data_train[i:(i+maxlen)]
}
head(exch_matrix)  &lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;##        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]
## [1,] 1.1930 1.1941 1.1933 1.1931 1.1924 1.1926 1.1926 1.1932
## [2,] 1.1941 1.1933 1.1931 1.1924 1.1926 1.1926 1.1932 1.1933
## [3,] 1.1933 1.1931 1.1924 1.1926 1.1926 1.1932 1.1933 1.1932
## [4,] 1.1931 1.1924 1.1926 1.1926 1.1932 1.1933 1.1932 1.1933
## [5,] 1.1924 1.1926 1.1926 1.1932 1.1933 1.1932 1.1933 1.1934
## [6,] 1.1926 1.1926 1.1932 1.1933 1.1932 1.1933 1.1934 1.1940&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we separate the inputs from the target.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;x_train &amp;lt;- exch_matrix[, -ncol(exch_matrix)]
y_train &amp;lt;- exch_matrix[, ncol(exch_matrix)]&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The RNN layer in keras expects the inputs to have the shape (examples, maxlen, number of features). Since we have only one feature (our single time series, processed sequentially), the shape of the inputs should be c(examples, 7, 1). When declaring the input shape in the model, however, the first dimension is omitted and only the last two are provided.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;dim(x_train)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 62388     7&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we see this shape does not include the number of features, so we can correct it as follows.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;x_train &amp;lt;- array_reshape(x_train, dim = c((length(data_train)-maxlen-1), maxlen, 1))
dim(x_train)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## [1] 62388     7     1&lt;/code&gt;&lt;/pre&gt;
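&lt;p&gt;The same windowing can be sketched without a loop using &lt;strong&gt;embed&lt;/strong&gt; from base R. The toy series and the value maxlen = 3 below are hypothetical. Note that this version keeps every possible window, that is length(x) - maxlen rows, which is one row more than the loop above produces.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;x &amp;lt;- c(1, 2, 3, 4, 5, 6)      # toy series
maxlen &amp;lt;- 3                   # hypothetical sequence length

# embed() returns the lags in reverse time order, so the columns are flipped back
windows &amp;lt;- embed(x, maxlen + 1)[, (maxlen + 1):1]

x_in  &amp;lt;- windows[, 1:maxlen]      # lagged inputs
y_out &amp;lt;- windows[, maxlen + 1]    # target: the value that follows each window

# add the trailing feature dimension: (examples, maxlen, 1)
x_in &amp;lt;- array(x_in, dim = c(nrow(x_in), maxlen, 1))
dim(x_in)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here the first row of windows is 1 2 3 4, so the first example uses 1, 2, 3 as inputs and 4 as its target.&lt;/p&gt;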
&lt;/div&gt;
&lt;div id=&#34;model-architecture&#34; class=&#34;section level2&#34; number=&#34;4.1&#34;&gt;
&lt;h2&gt;&lt;span class=&#34;header-section-number&#34;&gt;4.1&lt;/span&gt; Model architecture&lt;/h2&gt;
&lt;p&gt;When it comes to deep learning models, there is a large space of hyperparameters to be defined, and the results depend heavily on them: the number of layers, the number of nodes in each layer, the activation function, the loss function, the optimizer, the regularization techniques, the random initialization, etc. Unfortunately, we do not yet have an exact rule for choosing these hyperparameters; they depend on the problem under study, the data at hand, and the experience of the modeler. In our case, for instance, the data is very simple and does not actually require a complex architecture. We will thus use a dense layer that maps each time step to 7 units, followed by a single hidden RNN layer with 10 nodes and a dense output layer with one unit. The loss function will be the mean squared error &lt;strong&gt;mse&lt;/strong&gt;, the optimizer will be &lt;strong&gt;adam&lt;/strong&gt;, and the metric will be the mean absolute error &lt;strong&gt;mae&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: with large and complex time series it might be necessary to stack several RNN layers.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model &amp;lt;- keras_model_sequential()
model %&amp;gt;% 
  layer_dense(input_shape = dim(x_train)[-1], units=maxlen) %&amp;gt;% 
  layer_simple_rnn(units=10) %&amp;gt;% 
  layer_dense(units = 1)
summary(model)&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## Model: &amp;quot;sequential&amp;quot;
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense (Dense)                       (None, 7, 7)                    14          
## ________________________________________________________________________________
## simple_rnn (SimpleRNN)              (None, 10)                      180         
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 1)                       11          
## ================================================================================
## Total params: 205
## Trainable params: 205
## Non-trainable params: 0
## ________________________________________________________________________________&lt;/code&gt;&lt;/pre&gt;
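&lt;p&gt;The parameter counts in this summary can be checked by hand. The helper functions below are hypothetical and written only for this check: a dense layer has one weight per input-output pair plus one bias per output, and a simple RNN layer has input weights, recurrent weights and one bias per unit.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# hypothetical helpers that count trainable parameters
dense_params &amp;lt;- function(n_in, n_out) n_in * n_out + n_out
rnn_params   &amp;lt;- function(n_in, units) n_in * units + units * units + units

p_dense1 &amp;lt;- dense_params(1, 7)     # first dense layer: 1 feature to 7 units
p_rnn    &amp;lt;- rnn_params(7, 10)      # simple RNN: 7 inputs, 10 units
p_dense2 &amp;lt;- dense_params(10, 1)    # output layer: 10 units to 1 value
total    &amp;lt;- p_dense1 + p_rnn + p_dense2
c(p_dense1, p_rnn, p_dense2, total)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The result is 14, 180, 11 and 205, which matches the Param # column and the total reported by summary(model).&lt;/p&gt;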
&lt;/div&gt;
&lt;div id=&#34;model-training&#34; class=&#34;section level2&#34; number=&#34;4.2&#34;&gt;
&lt;h2&gt;&lt;span class=&#34;header-section-number&#34;&gt;4.2&lt;/span&gt; Model training&lt;/h2&gt;
&lt;p&gt;Now let’s compile the model and train it for 5 epochs, with a batch_size of 32 instances at a time to update the weights. To keep track of the model performance we hold out 10% of the training data as a validation set.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;model %&amp;gt;% compile(
  loss = &amp;quot;mse&amp;quot;,
  optimizer = &amp;quot;adam&amp;quot;,
  metrics = &amp;quot;mae&amp;quot;
)&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#history &amp;lt;- model %&amp;gt;% 
#  fit(x_train, y_train, epochs = 5, batch_size = 32, validation_split=0.1)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since we get different results each time we rerun the model, we should save the model (or only the model weights) and reload it. This way we will not be surprised by different outputs when rendering the document.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;#save_model_hdf5(model, &amp;quot;rnn_model.h5&amp;quot;)
rnn_model &amp;lt;- load_model_hdf5(&amp;quot;rnn_model.h5&amp;quot;)&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;div id=&#34;prediction&#34; class=&#34;section level2&#34; number=&#34;4.3&#34;&gt;
&lt;h2&gt;&lt;span class=&#34;header-section-number&#34;&gt;4.3&lt;/span&gt; Prediction&lt;/h2&gt;
&lt;p&gt;In order to get the predictions for the last 100 data points, we predict the entire data and then compute the &lt;strong&gt;rmse&lt;/strong&gt; for the last 100 predictions.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;maxlen &amp;lt;- 7
exch_matrix2&amp;lt;- matrix(0, nrow = length(data)-maxlen-1, ncol = maxlen+1) 

for(i in 1:(length(data)-maxlen-1)){
  exch_matrix2[i,] &amp;lt;- data[i:(i+maxlen)]
}

x_train2 &amp;lt;- exch_matrix2[, -ncol(exch_matrix2)]
y_train2 &amp;lt;- exch_matrix2[, ncol(exch_matrix2)]

x_train2 &amp;lt;- array_reshape(x_train2, dim = c((length(data)-maxlen-1), maxlen, 1))&lt;/code&gt;&lt;/pre&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;pred &amp;lt;- rnn_model %&amp;gt;% predict(x_train2)
df_eval_rnn &amp;lt;- tibble::tibble(y_rnn=y_train2[(length(y_train2)-99):length(y_train2)],
                          yhat_rnn=as.vector(pred)[(length(y_train2)-99):length(y_train2)])&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div id=&#34;results-comparison&#34; class=&#34;section level1&#34; number=&#34;5&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;5&lt;/span&gt; Results comparison&lt;/h1&gt;
&lt;p&gt;We can now compare the predictions of the last 100 data points from this model with the predicted values for the same data points from the ARIMA model. We first load the values predicted above with the ARIMA model and join everything in one data frame, then we use two metrics for the comparison, &lt;strong&gt;rmse&lt;/strong&gt; and &lt;strong&gt;mae&lt;/strong&gt;, which are readily available in the &lt;strong&gt;ModelMetrics&lt;/strong&gt; package.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: You might ask why we use only 100 data points for prediction when in machine learning we usually use a much larger number, sometimes 20% of the entire data. The reason is the nature of ARIMA models, which are short-term prediction models, especially with financial data that is characterized by high and unstable volatility (that is why we used the GARCH model above).&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;df_eval &amp;lt;- read.csv(&amp;quot;df_eval.csv&amp;quot;)
rmse &amp;lt;- c(rmse(df_eval$y_test, df_eval$yh_test), 
          rmse(df_eval_rnn$y_rnn, df_eval_rnn$yhat_rnn) )
mae &amp;lt;- c(mae(df_eval$y_test, df_eval$yh_test), 
          mae(df_eval_rnn$y_rnn, df_eval_rnn$yhat_rnn) )
df &amp;lt;- tibble::tibble(model=c(&amp;quot;ARIMA&amp;quot;, &amp;quot;RNN&amp;quot;), rmse, mae)
df&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## # A tibble: 2 x 3
##   model    rmse     mae
##   &amp;lt;chr&amp;gt;   &amp;lt;dbl&amp;gt;   &amp;lt;dbl&amp;gt;
## 1 ARIMA 0.00563 0.00388
## 2 RNN   0.00442 0.00401&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As we can see, the two models are close to each other. Judged by &lt;strong&gt;rmse&lt;/strong&gt;, a popular metric for continuous variables, the RNN model is better, whereas with &lt;strong&gt;mae&lt;/strong&gt; the two are approximately the same (ARIMA is marginally lower).&lt;/p&gt;
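&lt;p&gt;The two metrics can rank models differently because &lt;strong&gt;rmse&lt;/strong&gt; squares the errors and therefore penalizes large misses more heavily than &lt;strong&gt;mae&lt;/strong&gt;. The sketch below illustrates this with made-up numbers; the helpers &lt;code&gt;rmse_manual&lt;/code&gt; and &lt;code&gt;mae_manual&lt;/code&gt; are hypothetical names written out from the textbook definitions, not the &lt;strong&gt;ModelMetrics&lt;/strong&gt; functions used above.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;# Textbook definitions (hypothetical helper names)
rmse_manual &amp;lt;- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae_manual  &amp;lt;- function(actual, predicted) mean(abs(actual - predicted))

# Made-up values, purely for illustration
actual     &amp;lt;- c(0, 0, 0, 0)
even_pred  &amp;lt;- c(2, 2, 2, 2)  # four errors of equal size
spiky_pred &amp;lt;- c(0, 0, 0, 6)  # a single large error

mae_manual(actual, even_pred)    # 2
rmse_manual(actual, even_pred)   # 2
mae_manual(actual, spiky_pred)   # 1.5
rmse_manual(actual, spiky_pred)  # 3&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The second set of predictions has the lower &lt;strong&gt;mae&lt;/strong&gt; but the higher &lt;strong&gt;rmse&lt;/strong&gt;. One possible reading of the table above, not verified here, is that the ARIMA errors include a few larger misses.&lt;/p&gt;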
&lt;/div&gt;
&lt;div id=&#34;conclusion&#34; class=&#34;section level1&#34; number=&#34;6&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;6&lt;/span&gt; Conclusion&lt;/h1&gt;
&lt;p&gt;Although this data is very simple and can be predicted with classical ARIMA models without needing an RNN, it is used here for pedagogical purposes, to understand how an RNN works and how the data should be processed to be ingested by &lt;strong&gt;keras&lt;/strong&gt;. However, the simple RNN model suffers from a major problem on long sequences, known as the &lt;strong&gt;vanishing gradient&lt;/strong&gt; and &lt;strong&gt;exploding gradient&lt;/strong&gt; problems. With the former, when the chain rule is used to compute the gradients, if the derivatives have small values then multiplying a large number of them (as many as the length of the sequence) yields tiny values that make the network slow to train or even untrainable. The opposite happens with the latter: we get very large values and the network may never converge.&lt;br /&gt;
Soon I will post an article on multivariate time series that implements a Long Short-Term Memory (&lt;strong&gt;LSTM&lt;/strong&gt;) model, which is designed to overcome the above problems faced by the simple RNN model.&lt;/p&gt;
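&lt;p&gt;The gradient problem described above can be illustrated with plain arithmetic. In this rough sketch the sequence length and the per-step derivative magnitudes are hypothetical numbers chosen only to show the effect of a long product; it does not compute gradients of an actual network.&lt;/p&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;steps &amp;lt;- 50          # hypothetical sequence length
small_factor &amp;lt;- 0.5  # hypothetical per-step derivative magnitude below 1
large_factor &amp;lt;- 1.5  # hypothetical per-step derivative magnitude above 1

# The chain rule multiplies one factor per time step
vanishing &amp;lt;- prod(rep(small_factor, steps))
exploding &amp;lt;- prod(rep(large_factor, steps))

vanishing  # about 8.9e-16
exploding  # about 6.4e+08&lt;/code&gt;&lt;/pre&gt;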
&lt;/div&gt;
&lt;div id=&#34;further-reading&#34; class=&#34;section level1&#34; number=&#34;7&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;7&lt;/span&gt; Further reading&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;François Chollet, Deep Learning with R, MEAP edition, 2017, p. 167&lt;/li&gt;
&lt;li&gt;Ian Goodfellow et al., Deep Learning, &lt;a href=&#34;http://www.deeplearningbook.org/&#34; class=&#34;uri&#34;&gt;http://www.deeplearningbook.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div id=&#34;session-info&#34; class=&#34;section level1&#34; number=&#34;8&#34;&gt;
&lt;h1&gt;&lt;span class=&#34;header-section-number&#34;&gt;8&lt;/span&gt; Session info&lt;/h1&gt;
&lt;pre class=&#34;r&#34;&gt;&lt;code&gt;sessionInfo()&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;## R version 4.0.1 (2020-06-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] keras_2.3.0.0        ModelMetrics_1.2.2.2 rugarch_1.4-4       
## [4] forecast_8.13        aTSA_3.1.2           tseries_0.10-47     
## [7] timeSeries_3062.100  timeDate_3043.102   
## 
## loaded via a namespace (and not attached):
##  [1] jsonlite_1.7.1              assertthat_0.2.1           
##  [3] TTR_0.24.2                  tiff_0.1-5                 
##  [5] yaml_2.2.1                  GeneralizedHyperbolic_0.8-4
##  [7] numDeriv_2016.8-1.1         pillar_1.4.6               
##  [9] lattice_0.20-41             reticulate_1.16            
## [11] glue_1.4.2                  quadprog_1.5-8             
## [13] DistributionUtils_0.6-0     digest_0.6.25              
## [15] colorspace_1.4-1            htmltools_0.5.0            
## [17] Matrix_1.2-18               pkgconfig_2.0.3            
## [19] bookdown_0.20               purrr_0.3.4                
## [21] fftwtools_0.9-9             mvtnorm_1.1-1              
## [23] scales_1.1.1                whisker_0.4                
## [25] jpeg_0.1-8.1                tibble_3.0.3               
## [27] farver_2.0.3                EBImage_4.30.0             
## [29] generics_0.0.2              ggplot2_3.3.2              
## [31] ellipsis_0.3.1              urca_1.3-0                 
## [33] nnet_7.3-14                 BiocGenerics_0.34.0        
## [35] cli_2.0.2                   quantmod_0.4.17            
## [37] magrittr_1.5                crayon_1.3.4               
## [39] mclust_5.4.6                evaluate_0.14              
## [41] ks_1.11.7                   fansi_0.4.1                
## [43] nlme_3.1-149                MASS_7.3-53                
## [45] xts_0.12.1                  truncnorm_1.0-8            
## [47] blogdown_0.20               tools_4.0.1                
## [49] data.table_1.13.0           lifecycle_0.2.0            
## [51] stringr_1.4.0               munsell_0.5.0              
## [53] locfit_1.5-9.4              compiler_4.0.1             
## [55] SkewHyperbolic_0.4-0        rlang_0.4.7                
## [57] grid_4.0.1                  RCurl_1.98-1.2             
## [59] nloptr_1.2.2.2              rappdirs_0.3.1             
## [61] htmlwidgets_1.5.2           Rsolnp_1.16                
## [63] labeling_0.3                base64enc_0.1-3            
## [65] spd_2.0-1                   bitops_1.0-6               
## [67] rmarkdown_2.4               gtable_0.3.0               
## [69] fracdiff_1.5-1              abind_1.4-5                
## [71] curl_4.3                    R6_2.4.1                   
## [73] tfruns_1.4                  zoo_1.8-8                  
## [75] tensorflow_2.2.0            knitr_1.30                 
## [77] dplyr_1.0.2                 utf8_1.1.4                 
## [79] zeallot_0.1.0               KernSmooth_2.23-17         
## [81] stringi_1.5.3               Rcpp_1.0.5                 
## [83] vctrs_0.3.4                 png_0.1-7                  
## [85] tidyselect_1.1.0            xfun_0.18                  
## [87] lmtest_0.9-38&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
