Keeping track of your actual internet speed over time, using R and Speedtest-CLI
I thought this only happened in my country (Perú), but it turns out it happens to people around the globe: you contract an internet service plan that offers up to X Mbps, where X is an impressive value (relative to each country’s context), but the catch is in the words "up to".
If it makes you feel better, it could be worse if you lived in another country, trust me, I know... pic.twitter.com/Rm3J2J2Cdz
— Andrés Castro Socolich (@Andresrcs) January 16, 2020
In reality, you end up getting far slower download speeds than advertised. ISPs (Internet Service Providers) get away with this because they only test the download speed with you once, when they install the service (suspiciously, it works great at that moment), and most people don’t bother to keep track of their download speeds over time, because manually running a speed test every so often would be impractical. Luckily, there is a better way: Ookla’s Speedtest service provides a Command Line Interface (CLI) that lets you run tests from your system terminal, so you can set up automated scripts to collect connection performance data.
In the next lines, I’m going to show you how to do this from R and get the results in the form of nice plots you can tweet to your ISP to annoy them a little bit. 😆
Getting Your Setup Ready
We have to set up our working environment for this project. First, we need a machine that is constantly powered on and connected to the same internet connection you want to test, so it can perform the tests and store the data. It could be any computer connected to your local network, but the most practical (and cost-effective) solution I have found is a Raspberry Pi SBC, so I’m going to use one for this project.
Since you are reading an R-related article, I’m going to assume you already have your basic R environment set up, so we are going to focus on the additional stuff. We need to install the Speedtest Command Line Interface (CLI); if you are on a Linux machine, you can do it with these commands in a system terminal:
✏️ There are installation instructions for other OSs on the Speedtest web site
sudo apt-get install curl
curl -s https://packagecloud.io/install/repositories/ookla/speedtest-cli/script.deb.sh | sudo bash
sudo apt-get install speedtest
✏️ Check whether an older Python version of the CLI is installed on your system and, if so, uninstall it with the command below, since it could cause conflicts with the newer official version. This happened to me and I didn’t notice 😳, thanks to @jasongrahn for the heads up.
sudo pip uninstall speedtest-cli
Gathering Data
Now we can write an R script that retrieves the test output, parses the content and stores the data. In this example I’m going to store it in a PostgreSQL database, but you could choose other options like another SQL server, Google Sheets, a CSV file, etc. I have called this script file speedtest_job.R.
#! /usr/bin/env Rscript
# Data acquisition
# Run the speed test and parse its CSV output (header included) into a data frame
output <- read.csv(
  text = system(
    command = "speedtest -f csv --output-header --accept-license",
    intern = TRUE
  )
)
results <- data.frame(
  stringsAsFactors = FALSE,
  time = as.POSIXct(Sys.time()),
  ip = system("curl ifconfig.me", intern = TRUE), # This field is no longer provided on the new CLI version
  ping = output$latency,
  download = output$download * 7.629395E-6, # Convert to Mbps (factor is 8 / 2^20)
  upload = output$upload * 7.629395E-6, # Convert to Mbps (factor is 8 / 2^20)
  isp = "movistar" # This field is not provided on CSV format
)
# Data storing
con <- DBI::dbConnect(drv = odbc::odbc(),
                      driver = 'PostgreSQL ANSI',
                      server = 'localhost',
                      database = 'internet', # Name of the database on the sql server
                      port = 5432,
                      uid = Sys.getenv('MY_UID'),
                      pwd = Sys.getenv('MY_PWD'),
                      encoding = 'utf8')
DBI::dbAppendTable(
  conn = con,
  name = "speed_test", # Table name on the sql server
  value = results
)
DBI::dbDisconnect(conn = con)
# Cleaning log files
unlink(x = "*.log", force = TRUE)
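If you prefer the CSV file option mentioned earlier over a database, the storing step could look roughly like the sketch below. It is only an illustration under some assumptions: the sample CLI output is invented and trimmed to the three columns the script uses, the helper names to_mbps and append_result are hypothetical, and the conversion assumes the CLI reports bytes per second, which is what the 7.629395E-6 factor (8 / 2^20) in the script implies.

```r
# Hypothetical sample of the CLI's CSV output (values invented for
# illustration; only the columns used by the script are included).
sample_output <- c(
  '"latency","download","upload"',
  '"12.5","2621440","655360"'
)

# Same conversion as in the script: 8 bits per byte divided by 2^20,
# which is where the 7.629395E-6 factor comes from.
to_mbps <- function(bytes_per_second) {
  bytes_per_second * 8 / 2^20
}

# Hypothetical helper: parse one test result and append it to a CSV file,
# writing the header only when the file does not exist yet.
append_result <- function(csv_lines, file, time = Sys.time()) {
  output <- read.csv(text = csv_lines)
  results <- data.frame(
    time = format(time, "%Y-%m-%d %H:%M:%S"),
    ping = output$latency,
    download = to_mbps(output$download),
    upload = to_mbps(output$upload),
    stringsAsFactors = FALSE
  )
  new_file <- !file.exists(file)
  write.table(results, file = file, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
  invisible(results)
}

log_file <- tempfile(fileext = ".csv") # Replace with a permanent path
append_result(sample_output, log_file)
append_result(sample_output, log_file)
stored <- read.csv(log_file)
```

In the real job, the csv_lines argument would be the output of the system() call from the script instead of the invented sample.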
Now we can schedule a cron job to run this script at regular intervals. I’m going to set it to run once every hour by running these commands in a system terminal:
✏️ This can also be done from R using the cronR package.
env EDITOR=nano crontab -e
# Add this line
0 * * * * /usr/local/lib/R/bin/Rscript '/home/pi/speedtest_job.R' >> '/dev/null' 2>&1 # Change the file path as needed
sudo service cron reload
After letting some time pass for records to accumulate, we can fetch the data from the server with a SQL query:
library(DBI)
library(odbc)
con <- dbConnect(drv = odbc::odbc(),
                 driver = 'PostgreSQL ANSI',
                 server = Sys.getenv('MY_REMOTE'),
                 database = 'internet',
                 port = 5432,
                 uid = Sys.getenv('MY_UID'),
                 pwd = Sys.getenv('MY_PWD'),
                 encoding = 'utf8')
query <- "
SELECT *
FROM public.speed_test
ORDER BY time
"
raw_data <- dbGetQuery(
  conn = con,
  statement = query
)
dbDisconnect(con)
Visualizing the Data
Now that we have some data to work with, we can start making some plots. Since this is not an article about ggplot2, I’m not going to go into much detail about this part; I’m just going to show you some interesting plots you can get about your internet connection speed.
In Figure 1 we can see how the observed download speeds are distributed. Leaving aside the fact that the advertised speed is already slow (by developed countries’ standards), this reveals a very common pattern: most of the time you get download speeds that are much slower than advertised by the ISP, and sometimes so slow that they breach the terms of your contract, as in my case (I actually used this image to file a complaint).
library(tidyverse)
library(xkcd)
library(lubridate)
library(tibbletime)
library(scales)
# These are my personal theme settings you can ignore them if you prefer
theme_set(
  theme_gray() +
    theme_xkcd() +
    theme(legend.position = "right",
          plot.title.position = "plot",
          axis.title.x = element_text(margin = margin(t = 10)),
          axis.title.y = element_text(margin = margin(r = 10)),
          axis.text.x = element_text(angle = 20, hjust = 1, vjust = 1),
          plot.margin = margin(10, 10, 10, 10),
          text = element_text(family = "Cloud Calligraphy"))
)
color <- c("Mode" = "#008B00", "Parameters" = "orange")
raw_data %>%
  ggplot(aes(x = download)) +
  geom_histogram(binwidth = 0.1, fill = "#00B2EE") +
  geom_vline(aes(xintercept = 20, color = "Parameters"), linetype = "dashed") +
  geom_vline(aes(xintercept = 8, color = "Parameters"), linetype = "dashed") +
  # Vertical line at the most populated bin (the mode)
  stat_bin(geom = "vline",
           aes(xintercept = after_stat(ifelse(count == max(count), x, NA)),
               color = "Mode"),
           binwidth = 0.1) +
  annotate("text",
           x = c(8.4, 20.4),
           y = c(8, 8),
           label = c("Guaranteed Minimum Speed", "Advertised Speed"),
           family = "Cloud Calligraphy",
           size = 5,
           angle = 90) +
  # Label the mode with its value
  stat_bin(geom = "label",
           aes(label = after_stat(ifelse(count == max(count), round(x, 1), NA))),
           binwidth = 0.1,
           family = "xkcd",
           color = "#008B00",
           vjust = -0.2) +
  labs(title = 'HISTOGRAM OF OBSERVED DOWNLOAD SPEEDS',
       subtitle = 'ISP: MOVISTAR 20 Mbps',
       x = 'Speed (Mbps)',
       y = 'Frequency',
       colour = '') +
  scale_x_continuous(breaks = seq(0, 22, by = 2),
                     limits = c(0, 23)) +
  scale_colour_manual(values = color) +
  coord_cartesian(clip = 'off') +
  NULL
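The "Mode" line in the histogram comes from the count == max(count) condition inside stat_bin(). If you want that number outside of the plot, something like the following base R sketch could work. The speeds are invented for illustration, modal_bin is a hypothetical helper, and the bins start at zero here, so the result may differ slightly from the way ggplot2 aligns its bins.

```r
# Invented download speeds in Mbps, for illustration only
download <- c(5.93, 5.96, 5.91, 6.42, 7.88, 12.3, 19.7)

# Hypothetical helper: midpoint of the most populated bin
modal_bin <- function(x, binwidth = 0.1) {
  # Bin edges from zero up to one bin past the largest value
  upper <- ceiling(max(x) / binwidth) * binwidth + binwidth
  breaks <- seq(0, upper, by = binwidth)
  bins <- hist(x, breaks = breaks, plot = FALSE)
  bins$mids[which.max(bins$counts)]
}

mode_speed <- modal_bin(download)
```

With real data you would pass raw_data$download instead of the invented vector.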
With figures 2 and 3, we can show how often the download speeds fall below the minimum guaranteed speed.
colors <- c("Mean Speed" = "blue",
            "Parameters" = "orange",
            "Fault" = "red",
            "Download Speed" = "#00B2EE")
plot_data <- raw_data %>%
  as_tbl_time(index = time) %>%
  collapse_by('1 hour', side = 'start', clean = TRUE)
mean_speed <- plot_data %>%
  pull(download) %>%
  mean() %>%
  round(1)
plot_data %>%
  ggplot(aes(x = time, y = download)) +
  geom_line(aes(color = "Download Speed")) +
  geom_point(data = plot_data %>% filter(download < 8),
             aes(color = "Fault")) +
  geom_point(data = plot_data %>% filter(download < 8),
             shape = 1,
             color = "red",
             size = 5) +
  geom_hline(aes(yintercept = 20, color = "Parameters"), linetype = "dashed") +
  geom_hline(aes(yintercept = 8, color = "Parameters"), linetype = "dashed") +
  geom_hline(aes(yintercept = mean_speed, color = "Mean Speed")) +
  annotate("text",
           x = as.POSIXct(c("2019-06-24 11:00:00 UTC", "2019-06-24 11:00:00 UTC")),
           y = c(7, 19),
           label = c("Guaranteed Minimum Speed", "Advertised Speed"),
           family = "Cloud Calligraphy",
           size = 5) +
  geom_label(x = as.POSIXct("2019-08-01 11:00:00 UTC"),
             y = mean_speed,
             label = mean_speed,
             family = "xkcd",
             show.legend = FALSE,
             inherit.aes = FALSE,
             color = "blue") +
  labs(title = 'OBSERVED DOWNLOAD SPEEDS',
       subtitle = 'ISP: MOVISTAR 20 Mbps',
       x = 'Date',
       y = 'Speed (Mbps)',
       color = 'Legend:') +
  scale_x_datetime(date_breaks = "5 days",
                   labels = label_date_short(),
                   expand = expansion(c(0, 0.04))) +
  scale_y_continuous(breaks = seq(0, 22, by = 2), limits = c(0, 23)) +
  scale_colour_manual(values = colors) +
  coord_cartesian(clip = 'off') +
  NULL
faults <- plot_data %>%
  mutate(event_type = ifelse(download <= 8, 'fault', 'normal')) %>%
  filter(event_type == 'fault') %>%
  mutate(tbf = as.numeric(as.period(interval(lag(time), time),
                                    unit = 'seconds')) / (3600)) %>%
  tail(-1)
faults %>%
  ggplot(aes(x = '', y = tbf)) +
  geom_boxplot(fill = '#FF303094') +
  coord_flip() +
  geom_label(y = median(faults$tbf),
             label = paste(round(median(faults$tbf), 1), 'h'),
             family = "xkcd",
             show.legend = FALSE,
             color = "blue") +
  labs(title = 'BOXPLOT OF TIME BETWEEN FAILURES',
       subtitle = 'ISP: MOVISTAR 20 Mbps',
       x = '',
       y = 'Hours') +
  NULL
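The two numbers behind these figures, how often the speed falls below the guaranteed minimum and how many hours pass between failures, can also be computed without any plotting. Here is a minimal base R sketch with simulated hourly measurements (the values are invented, not real data), using the same 8 Mbps threshold as the plots.

```r
# Simulated hourly measurements, for illustration only
time <- as.POSIXct("2019-07-01 00:00:00", tz = "UTC") + 3600 * 0:11
download <- c(12, 7.5, 11, 10, 6, 9, 13, 14, 5, 12, 11, 7)
minimum_speed <- 8 # Guaranteed minimum speed in Mbps

# Share of tests that fell below the guaranteed minimum
is_fault <- download < minimum_speed
fault_share <- mean(is_fault)

# Hours elapsed between consecutive faults
fault_times <- time[is_fault]
tbf <- as.numeric(diff(fault_times), units = "hours")
median_tbf <- median(tbf)
```

Note that this sketch counts a fault only when the speed is strictly below the minimum; use <= if you want to match the filter in the boxplot code.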
And with Figure 4 we can find out when the peak hours occur, so we know at what time of the day we are more likely to experience slow internet speeds.
hour_data <- plot_data %>%
  as_tibble() %>% # Drop the time index before turning time into a label
  mutate(time = format(time, "%H:%M"))
hour_data %>%
  ggplot(aes(x = time, y = download)) +
  geom_boxplot(fill = "#00B2EE") +
  geom_point(data = hour_data %>% filter(download < 8),
             shape = 1,
             color = "red",
             size = 5) +
  geom_hline(aes(yintercept = 20), color = "orange", linetype = "dashed") +
  geom_hline(aes(yintercept = 8), color = "orange", linetype = "dashed") +
  annotate("text",
           x = c("04:00", "04:00"),
           y = c(7, 19),
           label = c("Guaranteed Minimum Speed", "Advertised Speed"),
           family = "Cloud Calligraphy",
           size = 5) +
  labs(title = 'BOXPLOT OF OBSERVED DOWNLOAD SPEEDS BY HOUR OF THE DAY',
       subtitle = 'ISP: MOVISTAR 20 Mbps',
       x = 'Hour',
       y = 'Speed (Mbps)') +
  scale_y_continuous(breaks = seq(0, 22, by = 2),
                     limits = c(0, 22)) +
  coord_cartesian(clip = 'off') +
  NULL
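If you would rather get the slowest hours as a short list than read them off the boxplot, a rough base R sketch could look like this. The measurements are simulated with an invented evening slowdown, so the result only illustrates the approach.

```r
# Simulated measurements over three days, for illustration only
time <- as.POSIXct("2019-07-01 00:00:00", tz = "UTC") + 3600 * 0:71
hour <- as.integer(format(time, "%H", tz = "UTC"))
# Invented pattern: slower between 19:00 and 22:00, small shift per day
download <- ifelse(hour >= 19 & hour <= 22, 6, 14) +
  rep(c(-0.5, 0, 0.5), each = 24)

# Median speed per hour of the day, slowest hours first
hourly <- aggregate(download ~ hour,
                    data = data.frame(hour, download),
                    FUN = median)
hourly <- hourly[order(hourly$download), ]
peak_hours <- head(hourly$hour, 4)
```

The median is used here instead of the mean so that a single very bad test does not dominate an hour.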
Once you have data, plotting possibilities are only limited by your imagination, so I’m going to stop here. I hope you have enjoyed reading this article and that you are now motivated to start monitoring your own internet speed. Have fun! See you soon!