Take a look at some of the more advanced, modern ways of utilizing backend caching with hosted applications or workflows.
The R ecosystem provides an assortment of options and packages for implementing caching in one's data-related workflows.
However, support for caching in cloud-hosted workflows and web applications seems comparatively sparse.
As an example, take a typical web application architecture in which the process is segmented into three separate layers. This generalized model is known as the three-tier architecture and is commonly used by app developers to create flexible, reusable applications. Broken down, the three layers are:
- Presentation Layer - the interface the end user interacts with. Its primary duty is to translate tasks and results from the servers into output the end user can understand. In short, this tier is the user interface, or client.
- Application Layer - the primary coordinator of the application's back end. This tier processes commands, makes logical decisions, and moves data between the end-user interface and the back-end database. In short, this tier is the business logic.
- Data Layer - simply put, this tier is where data is stored and accessed by the application layer. It comprises the various data stores the application needs to store and retrieve the data used by the previous two layers, e.g. databases and cloud storage.
A benefit of the three-tier architecture is that each layer can be developed, maintained, and tested in isolation, independent of the other layers. The trade-off is that the layers communicate over the network, so every time the application queries for data, retrieval speed is limited by network performance.
Data retrieval time plays a key role in overall user experience (UX) and is a critical requirement for most large applications meant for production.
Caching is a buffering technique that stores frequently queried data in temporary, fast-access memory, making the data quicker to retrieve and reducing the workload on the database. For example, retrieving a user's profile from the database might initially require a round trip from server to server; after the first request, the profile is stored next to (or much nearer to) the application, greatly reducing the read time when it is needed again.
The cache can be set up within a given tier or on its own, depending on the use case, and it works with any type of database, both relational and NoSQL.
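As a minimal illustration of the idea, here is a sketch using the memoise package (mentioned later in this post); `slow_lookup()` is a hypothetical stand-in for a database query:

```r
library(memoise)

# Hypothetical slow lookup standing in for a database query
slow_lookup <- function(id) {
  Sys.sleep(1)                      # simulate a slow network round trip
  sprintf("profile-%s", id)
}

# Memoising wraps the function with an in-memory cache
fast_lookup <- memoise(slow_lookup)

fast_lookup(42)  # first call: ~1 second, result gets cached
fast_lookup(42)  # second call: near-instant, served from the cache
```

The second call skips `Sys.sleep()` entirely because the result for `id = 42` is already cached.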
- Performance - data is served from the cache, making it quicker to access and reducing the workload on the database.
- Scalability - back-end query workload is offloaded to the cache system, which lowers costs and allows more flexibility in processing data.
- Availability - if the back-end database server is unavailable, the cache can still provide continuous service to the application, making the system more resilient to failures.
Overall, caching is a minimally invasive strategy for improving application performance, with the additional benefits of scalability and availability.
There are many implementations of caching in modern application frameworks, including common strategies such as cache-aside, read-through, write-through, and write-behind. In the cache-aside workflow, for example, the application checks the cache first and, on a miss, reads from the database and populates the cache itself.
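The cache-aside read path can be sketched in base R as follows; `query_database()` is a hypothetical stub standing in for a real database call, and a plain environment plays the role of the cache:

```r
# Plain environment acting as the cache
cache <- new.env(parent = emptyenv())

# Hypothetical stub standing in for a real database call
query_database <- function(id) {
  Sys.sleep(0.5)                       # simulate network latency
  list(id = id, name = paste0("user-", id))
}

get_user <- function(id) {
  key <- as.character(id)
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(cache[[key]])               # cache hit: skip the database
  }
  value <- query_database(id)          # cache miss: read from the database
  cache[[key]] <- value                # ...then populate the cache
  value
}

get_user(1)  # slow: goes to the "database"
get_user(1)  # fast: served from the cache
```

Note that in this pattern the database remains the source of truth; the application alone is responsible for keeping the cache populated.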
Ok, so now moving on to caching in relation to R.
Recently, the shiny package introduced new functions that aid in caching: bindCache() and renderCachedPlot(). Note that renderCachedPlot() requires shiny version 1.5.0 or higher, and bindCache() requires version 1.6.0 or higher.
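For instance, bindCache() can be attached to a render function and keyed on the inputs it depends on. A minimal sketch (the input names and the expensive computation here are illustrative):

```r
library(shiny)

ui <- fluidPage(
  numericInput("n", "Sample size", value = 100),
  textOutput("result")
)

server <- function(input, output, session) {
  output$result <- renderText({
    Sys.sleep(2)                  # pretend this is expensive
    mean(rnorm(input$n))
  }) |>
    bindCache(input$n)            # cache keyed on the value of input$n
}

shinyApp(ui, server)
```

After the first computation for a given value of `input$n`, re-selecting that value returns the cached result immediately instead of re-running the expensive expression.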
Redis is the most commonly used back-end for implementing a caching layer between the application and database in production, and it is pretty awesome.
One can easily spin up a local Redis backend using Docker as follows:

```shell
docker run --rm --name redisbank -d -p 6379:6379 redis:5.0.5 --requirepass bebopalula
```

This runs a Redis container on localhost, exposed on port 6379 (Redis' default port).
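To verify the container is accepting connections, you can ping it via redis-cli inside the container (assuming the container above is running):

```shell
docker exec redisbank redis-cli -a bebopalula ping
# Expected reply: PONG
```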
Next, you can add a Redis cache to your R workflow using a generated R6 object representing the redis_cache, created with the bank package in conjunction with the memoise package, like so. Note: the redux package is required to implement the redis_cache R6 object here.
```r
library(bank)
library(memoise)

# Redis-backed cache (requires the redux package under the hood)
redis_cache <- bank::cache_redis$new(password = "bebopalula")

# An example function to memoise
f <- function(x) {
  sample(1:1000, x)
}

# Memoise f against the Redis cache
mf <- memoise::memoise(f, cache = redis_cache)

mf(5)
mf(10)
```
and inside Shiny:
```r
library(shiny)

ui <- fluidPage(
  # Slider input that will be used as a cache key
  sliderInput("nrow", "NROW", 1, 32, 32),
  # Plotting a piece of mtcars
  plotOutput("plot")
)

server <- function(input, output, session) {
  output$plot <- renderCachedPlot(
    {
      # Pretending this takes a long time
      Sys.sleep(2)
      plot(mtcars[1:input$nrow, ])
    },
    # Defining the cache key
    cacheKeyExpr = list(
      input$nrow
    ),
    # Using our redis cache
    cache = redis_cache
  )
}

shinyApp(ui, server)
```
For a more involved shiny example, you could use:

```r
generate_app(redis_cache)
```

Try it out!
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Briggs (2021, Dec. 11). Jim's Docs: Experimenting with Caching Backends in R. Retrieved from https://jimsdocs.jimbrig.com/posts/2021-12-11-experimenting-with-caching-backends-in-r/
BibTeX citation
```
@misc{briggs2021experimenting,
  author = {Briggs, Jimmy},
  title = {Jim's Docs: Experimenting with Caching Backends in R},
  url = {https://jimsdocs.jimbrig.com/posts/2021-12-11-experimenting-with-caching-backends-in-r/},
  year = {2021}
}
```