Do you want to have some fun scraping COVID-19 data from the web and working with R on an Apache Spark cluster with Delta Lake support? Then read this short article.
The main objective of this article is to demonstrate the integration of R with Apache Spark and Delta Lake operations for web scraping, data refinement, and transactional storage.
At the time of writing, the technology stack used consisted of
Git Repo with the Jupyter Notebook.
The public COVID-19 data used in this article was scraped from Worldometer. In particular, we used…
Data Architect, Data Engineer, R user, Developer