Performance Optimization in R: Parallel Computing and Rcpp

From: https://tutorial.guidotti.dev/pa78y/ The ‘parallel’ package Reference: https://bookdown.org/rdpeng/rprogdatascience/parallel-computation.html Many computations in R can be made faster with parallel computing. Generally, parallel computing means the simultaneous execution of different pieces of a larger computation across multiple processors or cores. The parallel package can be used to send tasks (encoded as function calls) to each of the … Continue reading Performance Optimization in R: Parallel Computing and Rcpp
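The pattern the excerpt describes can be sketched with `mclapply()`, the forked-process analogue of `lapply()` from the parallel package (a minimal sketch with a toy task; `mc.cores > 1` requires a Unix-like OS, and a PSOCK cluster is the portable alternative on Windows):

```r
library(parallel)

# A toy task: each call is independent, so iterations can run on separate cores.
slow_mean <- function(seed) {
  set.seed(seed)
  mean(rnorm(1e5))
}

# Forked workers (Unix-like systems); set mc.cores = 1 to fall back to serial
res <- mclapply(1:4, slow_mean, mc.cores = 2)

# On Windows, use a PSOCK cluster instead:
# cl <- makeCluster(2); res <- parLapply(cl, 1:4, slow_mean); stopCluster(cl)

str(unlist(res))
```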

Tableau-like Drag and Drop GUI Visualization in R

From: https://towardsdatascience.com/tableau-esque-drag-and-drop-gui-visualization-in-r-901ee9f2fe3f One of the few things that self-service data visualization tools like Tableau and Qlik offer, and that sophisticated data science languages like R and Python do not, is a drag-and-drop GUI for creating visualizations. The flexibility with which you can simply drag and drop your dimensions and metrics is so … Continue reading Tableau-like Drag and Drop GUI Visualization in R

Product Price Prediction: A Tidy Hyperparameter Tuning and Cross Validation Tutorial

From: https://www.business-science.io/code-tools/2020/01/21/hyperparamater-tune-product-price-prediction.html Product price estimation and prediction is one of the skills I teach frequently. It's a great way to analyze competitor product information and your own company's product data, and to develop key insights into which product features influence product prices. Learn how to model car prices and calculate depreciation curves using the brand-new tune package for hyperparameter tuning of machine learning models. This is … Continue reading Product Price Prediction: A Tidy Hyperparameter Tuning and Cross Validation Tutorial
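The post uses the tidymodels tune package; as a dependency-free illustration of the underlying idea, here is a base-R sketch of grid search with k-fold cross-validation, where the "hyperparameter" is a polynomial degree and the data are simulated stand-ins (not the post's car data):

```r
# Grid search + 5-fold cross-validation in base R -- the loop that
# tune::tune_grid() automates. Data and grid are illustrative only.
set.seed(42)
df <- data.frame(x = runif(200, 0, 10))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)

k <- 5
folds <- sample(rep(1:k, length.out = nrow(df)))
degrees <- 1:6  # the candidate hyperparameter grid

cv_rmse <- sapply(degrees, function(d) {
  errs <- sapply(1:k, function(i) {
    fit  <- lm(y ~ poly(x, d), data = df[folds != i, ])
    pred <- predict(fit, newdata = df[folds == i, ])
    sqrt(mean((df$y[folds == i] - pred)^2))  # held-out RMSE
  })
  mean(errs)
})

best <- degrees[which.min(cv_rmse)]
best  # degree with the lowest cross-validated RMSE
```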

Calculating the Required Sample Size for a Binomial Test in R

From: http://www.sastibe.de/2020/01/sample_size_r/ A Standard Problem: Determining Sample Size Recently, I was tasked with a straightforward question: "In an A/B test setting, how many samples do I have to collect in order to obtain significant results?" As usual in statistics, the answer is not quite as straightforward as the question, and it depends quite a bit … Continue reading Calculating the Required Sample Size for a Binomial Test in R
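In base R this question maps onto `power.prop.test()` from the stats package (a sketch with illustrative conversion rates; the post works through the assumptions behind inputs like these):

```r
# How many observations per group to detect a lift from a 10% to a 12%
# conversion rate, at the usual 5% significance level with 80% power?
pw <- power.prop.test(p1 = 0.10, p2 = 0.12,
                      sig.level = 0.05, power = 0.80)
ceiling(pw$n)  # required sample size per group
```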

Access the free economic database DBnomics with R

From: https://macro.cepremap.fr/article/2019-10/rdbnomics-tutorial/ DBnomics: the world’s economic database Explore all the economic data from different providers (national and international statistical institutes, central banks, etc.), for free, following the link db.nomics.world. You can also retrieve all the economic data through the rdbnomics package here. This blog post describes the different ways to do so. Fetch time series by ids First, let’s assume that … Continue reading Access the free economic database DBnomics with R

Implementation of 17 classification algorithms in R using car evaluation data

From: https://www.datascience-zing.com/blog/implemetation-of-17-classification-algorithms-in-r-using-car-ev This data is obtained from the UCI Machine Learning Repository. The purpose of the analysis is to evaluate the safety standard of cars based on certain parameters and classify them. A detailed description of the dataset is provided below, as given on the website. For detailed code visit my GitHub repository  1. Title: Car Evaluation … Continue reading Implementation of 17 classification algorithms in R using car evaluation data
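All 17 models share the same evaluation pattern: split, fit, predict, score. A dependency-free sketch of that loop with a single classifier (logistic regression on the built-in `mtcars` data; the post applies the same pattern to the UCI car data with 17 algorithms):

```r
# Train/test split, fit one classifier, measure accuracy on held-out data.
set.seed(1)
idx   <- sample(nrow(mtcars), floor(0.7 * nrow(mtcars)))
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

# One of many interchangeable classifiers: logistic regression
fit  <- glm(am ~ wt + hp, data = train, family = binomial)
pred <- ifelse(predict(fit, test, type = "response") > 0.5, 1, 0)

accuracy <- mean(pred == test$am)
accuracy
```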

Simulating data with Bayesian networks

From: http://gradientdescending.com/simulating-data-with-bayesian-networks/ Bayesian networks are useful for many applications, and one of them is simulating new data. Bayes nets represent data as a probabilistic graph, and from this structure it is easy to simulate new data. This post will demonstrate how to do this with bnlearn. Fit a Bayesian network Before simulating new … Continue reading Simulating data with Bayesian networks
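The learn-then-simulate workflow can be sketched with bnlearn's own bundled dataset (a minimal sketch, assuming the bnlearn package is installed; `rbn()` draws new observations from a fitted network):

```r
library(bnlearn)  # provides hc(), bn.fit(), rbn() and the learning.test data

# Learn a structure from bundled discrete data, fit its conditional
# probability tables, then simulate new rows from the fitted network.
data(learning.test)
dag <- hc(learning.test)           # hill-climbing structure learning
fit <- bn.fit(dag, learning.test)  # parameter estimation (CPTs)
sim <- rbn(fit, n = 500)           # 500 simulated observations

dim(sim)  # 500 rows, same variables as the original data
```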

A Collection of 10 Data Visualizations You Must See

From: https://www.analyticsvidhya.com/blog/2018/01/collection-data-visualizations-you-must-see/ Introduction Writing code is fun. Creating models with it is even more intriguing. But things start getting tricky when it comes to presenting our work to a non-technical person. This is where visualizations come in. They are one of the best ways of telling a story with data. In this article, we look … Continue reading A Collection of 10 Data Visualizations You Must See

Top 20 National Football Teams by Goals Scored Animated Plot

From: https://github.com/basarabam/Top20AnimatedPlot Animated plot of the top 20 national football teams by goals scored, made using the ggplot2, gganimate and ggflags packages (and others listed). Follow the comments in the code for instructions. This is how the animated plot looks when finished: Dataset: https://drive.google.com/open?id=1sTzf1ehIDBAUuRoUhQCUk_-S9YbXjCi93qMGoqAs7ec

# Loading packages
library(tidyverse)
library(lubridate)
library(countrycode)
library(gganimate)
library(ggflags)
library(tidyr)
library(viridis)

# Reading data
data_1 <- read_csv("results.csv") … Continue reading Top 20 National Football Teams by Goals Scored Animated Plot

Building APIs

From: http://material.curso-r.com/api/ The data science workflow ends with communication. This stage is responsible for conveying all the knowledge generated during the data analysis. And that is not easy at all!! knitr::include_graphics("figures/data-science-communicate.png") How you build the communication depends heavily on the audience. For example, if the person who will receive the … Continue reading Building APIs
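In R, serving an analysis as an API is typically done with the plumber package (a minimal route-file sketch, assuming plumber is the framework the post uses; the route names and parameters are illustrative):

```r
# plumber.R -- a minimal API definition. Assumes the plumber package;
# serve it with: plumber::pr_run(plumber::pr("plumber.R"), port = 8000)

#* Echo back a message
#* @param msg The message to echo
#* @get /echo
echo <- function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Sum two numbers (query parameters arrive as strings)
#* @param a First number
#* @param b Second number
#* @get /sum
sum_route <- function(a, b) {
  as.numeric(a) + as.numeric(b)
}
```

Requesting `GET /sum?a=2&b=3` would then return 5; the `#*` annotations map each function to a route.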