It's not Throughput
So lately a customer has been struggling with their ISP (a cable company) over the inconsistency of their Internet connection. The ISP has been negligent in repairing their line, which has been "fluttering" since January of 2020. This means the connection goes up and down and up and down and up and down and up and down. It's like a freakin' Hobby Horse (I have a different analogy but I will refrain from obscenities for a change as I want to appeal to a broader audience and not just the typical OG Brooklyn types - sorry pplz). The customer is located in an "extreme" suburb, meaning nowhere near any city. Maybe some large towns but really closer to a rural area than anything else. So really they are stuck using the cable company.
Given the nature of their business, telepresence has become extremely important in the time of COVID-19. With the cable line constantly failing, the cable company keeps pushing the tried and true line "well we tested it from our side and we show full speed". Right, but the customer does not, so they go and blame the customer's router (right, pfSense, sure) and that was met with a bit of skepticism (by me of course). So I gave them my Networking 101 lecture (more of a lecture than anything else), which kinda left them feeling "yea the typical scripts aren't working", so they opened the floor for discussion and asked for input. Before allowing me to speak they reiterated that a speed test was performed from the modem. HMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM a SPEED TEST you say?!? Yup! The no-fail go-to bull-malarky, poppy cock, spotted dick, well-hung horse (ya? Ya? No - ok). Sure did, they brought it up, a speed test, and from none other than our friends over at (drumroll please) "Ookla"!!!
Now a bit of background on Speed Tests on the Internet: they simply are bad (iffy at best, more later). Mostly. Ookla made a business of Bandwidth metrics. A very lucrative one actually. The problem is that the method of measurement is what they feel is correct. To start with, when you load an Ookla speed test it's really not performing a Ping, whether ICMP or UDP. It's transmitting data to your browser from their server, that data is then used to collect data from your browser and transmit it back to their server, and then it measures the latency in the transmission of that data and calls it a Ping. While this is convenient for the unsuspecting user, it's technically inaccurate and a false measurement.
Now I KNOW a shitload of naysayers are gonna come after me, especially the devotees of Ookla metrics (especially Ookla) but I did a little research (just a couple of hours because I didn't want to sound like a complete ass, just a small one) and found some really interesting info. See, Ookla "still" uses Ajax and web proxies. Back in the day we would call them cache proxies. Their purpose? To cache data transmitting to/from server/client in order to speed up response times. This was especially helpful during the HTML 2.0/3.0 days when cable companies wanted to show off their nice fancy speeds in comparison to DSL providers (actually it's because of cache proxies that companies like Akamai were spawned and grew to massive proportions). The problem was, real-time and streaming services were just starting to launch and these services were completely dependent on non-cached real-time protocols. It didn't matter to Ookla as they were building their metrics empire based on inaccurate data and strong marketing. I commend them for doing what they've done but it's really time for them to catch up with the 2nd decade of the 21st century.
So back to my ramble. While many HTML5 systems still use XHR (that's Ajax or XMLHttpRequest), newer systems have started migrating their code over to websockets for a few reasons. A few lesser-known speed test sites do use websockets and they are pretty close to being dead-on balls (ya like that huh?) accurate. I've read a bunch of benchmark reports and even performed my own sampling of XHR vs WS over and over. I've even found a site called SockShootOut (pretty cool actually, click on it) and it better demonstrates the speed comparison between the two. Of course in order to make this determination I used a combination of the source code of all speed test web pages, Wireshark, Netstat and Chrome developer tools. I like Ajax. I use it a lot. I'm currently using it on a side project but I would not use it for real-time applications because it has an inherent overhead. Additionally, when you're throwing a bunch of data into a stream and need to measure the transmission time, the last thing you want is to add additional overhead.
So far I've spelled out the problem with the technology used to perform these popular speed tests but the bigger issue I feel is that the Ajax method uses block-data transmissions. Okay, what? You go to a speed test site that uses Ajax, start your test. It begins by transmitting some information and measuring the response time. That is what it calls the Ping time, which is not at all a ping. Then the server (or web proxy) begins to transmit blocks of data. Each block being of a specific size for measurement, but not a consistent stream of data like iPerf (later). In some cases the server data is stored on a proxy (possibly with your ISP if they have an arrangement with this speed test site/company) to make it appear even faster. As far as the upload speed is concerned, the same method is used however as I've never seen the server code I cannot say for sure if the data/bytes received by the server are actually counted or not. If they are not it may just be looking at the assembled packets and counting them instead of the actual data which again would skew the results. But I cannot assume what they are doing, or can I?
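To put some numbers on why block-by-block measuring can flatter a connection, here is a minimal Python sketch. I have not seen any speed test site's server code, so this is NOT how Ookla or anyone else actually calculates things; the `Block` type, the function names and the sample figures are all made up for illustration. It just compares averaging each block's own rate against dividing total bytes by total wall-clock time, idle gaps between requests included.

```python
from dataclasses import dataclass


@dataclass
class Block:
    size_bytes: int    # payload bytes in this block
    transfer_s: float  # seconds spent actually moving the block
    gap_s: float       # idle seconds before the next request goes out


def mbps(size_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into megabits per second."""
    return size_bytes * 8 / seconds / 1_000_000


def per_block_rate(blocks) -> float:
    """Average of each block's own rate; the idle gaps are ignored."""
    return sum(mbps(b.size_bytes, b.transfer_s) for b in blocks) / len(blocks)


def wall_clock_rate(blocks) -> float:
    """Total bytes over total elapsed time, gaps included."""
    total_bytes = sum(b.size_bytes for b in blocks)
    total_time = sum(b.transfer_s + b.gap_s for b in blocks)
    return mbps(total_bytes, total_time)


# Hypothetical run: four 1 MB blocks, each taking 0.1 s to move,
# with 0.1 s of request/response dead time between them.
blocks = [Block(1_000_000, 0.1, 0.1) for _ in range(4)]
print(round(per_block_rate(blocks), 1), "Mbps if you only time the blocks")
print(round(wall_clock_rate(blocks), 1), "Mbps if you time the whole thing")
```

With these invented numbers the per-block average comes out at 80 Mbps while the wall-clock figure is 40 Mbps. Same data, same line, two very different "speeds" depending on what you decide to time.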
The sites I found using websockets maintain a steady stream of data in both directions, one way at a time. Next to no overhead, no HTTP headers on every block, no buffers, just data sent, bytes counted and the transmission is timed both ways. Incorporating a proxy with websockets for real-time data transmission is itself difficult. But one more thing I noticed, some of the sites using sockets allow you to send a file of your choice and guess what? It sends it right back to you! What does this mean? It means you can verify that the data being sent is real data. It's like saying "hello dumbass" into a telephone and you hear your own voice saying the same thing back, verifying a full round-trip from your computer to their server and then back again. This is certainly the type of speed test I would consider more accurate than an Ajax-based test.
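The "send it and get it right back" idea is easy to sketch. The Python below is a rough illustration only: it uses a plain TCP echo server on the loopback address rather than a real websocket, and `echo_once` and `round_trip` are names I made up, not anything from the sites mentioned. It sends a payload, reads back whatever the server returns, compares SHA-256 hashes so you know the bytes were real, and times the full round trip.

```python
import hashlib
import socket
import threading
import time


def echo_once(server_sock: socket.socket) -> None:
    """Accept one connection and send back every byte received."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            conn.sendall(chunk)


def round_trip(payload: bytes):
    """Echo payload through a loopback server; return (matches, seconds)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    echo_thread = threading.Thread(target=echo_once, args=(server,))
    echo_thread.start()

    received = bytearray()
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        def send_all() -> None:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # tell the server we are done

        # Send from a second thread so sending and receiving can overlap.
        sender = threading.Thread(target=send_all)
        sender.start()
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            received.extend(chunk)
        sender.join()
    elapsed = time.perf_counter() - start

    echo_thread.join()
    server.close()
    same = hashlib.sha256(received).digest() == hashlib.sha256(payload).digest()
    return same, elapsed


ok, seconds = round_trip(b"hello dumbass" * 10_000)
print("payload came back intact:", ok, "in", round(seconds, 4), "s")
```

Since everything here stays on loopback, the timing says nothing about your ISP. The point is only the shape of the test: real bytes out, the same bytes back, one clock around both.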
Now certainly an Internet-based speed test cannot really be accurate. First you sign up for this ultra high speed Internet service because their advertising was all fancy and glittery and next thing you know your speeds are toilet level crap. You contact tech support and they have you go to a web-based Internet speed test website to prove that your connection is truly at the speed it should be. While they monitor your attempts at checking your speed, are they doing something on the back end to ensure you get a clear path to the speed test site? Maybe, but I have no way to verify that for you. Are you getting the speed they say they are giving you? Well if their benchmark is a PUBLIC speed test website, then they already lost. See, the Internet and all networking systems are developed and designed to interoperate. There is a standards body called the Internet Engineering Task Force (oh yes an Internet Task Force of Engineers, with elite jackets and even helicopters) and they are the geniuses who come up with standards and methodologies for making the Internet happen. An ISP can tell you "I am giving you what I say I am giving you", but can they prove it? Sure, with a public speed test. But does that meet the IETF standard? F&^K NO! See, the IETF has specifically designated this as bull! They have a document entitled RFC 2544 which at the very beginning in the introduction states "Vendors often engage in "specsmanship" in an attempt to give their products a better position in the marketplace. This often involves 'smoke & mirrors' to confuse the potential users of the products". ARE YOU KIDDING ME? You gotta love that! Sections 14 and 15 of the RFC already disqualify the Ajax method of testing because the size of the data blocks varies and, on top of that, it is not using a single path stream but rather a request/response method.
Additionally Ajax does not offer any facility to change the frame size or the TCP window (not that websockets does, but it certainly meets Sections 14 and 15).
Now DOCSIS (the technical method that cable systems use for your Internet) is currently on version 3.1, which uses OFDM or Orthogonal Frequency Division Multiplexing, which allows a signal to actually have multiple signals embedded within it (it really gets technical and most of you couldn't care less about this but if you really want to know more, click here to Google OFDM). While this is cool and all, usually the number of downstreams (download) is greater than the number of upstreams (upload) so that your download is faster from the Internet than your upload (typical for cable systems). It uses a combination of packet encapsulation and data compression algorithms in order to get the job done. Currently the fastest cable modems available have 32 downstreams and 8 upstreams (but that does not mean your cable provider supports this technology). Consider these "channels" where the modem bonds the channels together to give you faster speeds. The previous models had 16 downstream channels and 4 upstream channels. In either case, the latest is 32x8 (but in reality, you're probably looking at 24x8) and the previous was 16x4, which is how the engineers refer to them. On each level of technology, the size of the channels is different as well. Earlier models of DOCSIS devices had a smaller bandwidth per channel than newer models. No I don't mean the bandwidth of data (sort of), I mean the RF transmission (No I won't explain this now, wait for another ramble). Also I promised myself I will not overly complicate this site with tables or any DHTML crap so maybe if I feel like it I will make a jpg and put it up, but not now. But the cable systems themselves are not all compliant with the newest DOCSIS stuff, so the firmware on your modem must be compliant with the cable company's systems. Yea baby it gets far worse.
So an RFC 2544 test could in theory happen on a cable modem, however with the inconsistencies of their own network, the typical over-subscribing of their service lines and the deteriorated state of their physical lines on the pole that only get fixed when something really goes bad, nopes. Ain't happening. A proper testing system for RFC 2544 costs a few tens of thousands of mulas and really every tech should have a portable unit and be able to test on demand with their central engineering facility, which they do not. Since the cost of maintenance and upgrades is so high, how are they offering such low prices? You guessed it! They sell what they don't have or they sell what they do have BUT!!!! BUT!!!! But it ain't done right, or - excuse me, properly. Yea I type the way I talk I know. Have some :-p
So then what testing method is available to really test my throughput? Well you could find an ultra fast FTP site that will allow you bi-directional transmissions to upload and download a file or three. Or learn how to use a tool like iPerf, which is very, very useful but a damn nightmare to use. Still iPerf is a de facto standard in throughput testing and pretty much every network engineer knows of it. Many are annoyed by it but we cannot deny its effectiveness. So while most Internet speed tests will measure your "bandwidth", that is pretty much just verifying your maximum potential speed. What you really want to know is your throughput, which is your real-world speed available, and that can only be measured with a steady stream of data (hence the RFC 2544). Both bandwidth and throughput can be affected by a bunch of variables. But to cut to the chase, I have found that M-Lab (Measurement Lab) makes a Network Diagnostic Tool (they call it the NDT) which actually uses websockets and boasts a vast community in their open platform. I've tested it extensively and found that it's pretty good, which surprised me. While I do like having granular control like iPerf provides, as I said you really need to know what you're doing with it in order to get true results. It's not just telling iPerf the server location and you go. You need to first measure your Jitter (delta latency factor) and push various TCP window sizes and various payloads in order to know the right payload size based on latency and Jitter, then you can perform your actual speed test with the correct parameters to determine your bandwidth (not throughput). M-Lab seems to have figured out a simple way to get this done but remember, it's still an Internet speed test.
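For the "measure your Jitter, then pick a window" step, here is a small Python sketch of the arithmetic, under my own simplifying assumptions. Jitter is taken as the mean absolute difference between consecutive round-trip samples (real tools use fancier smoothed estimators), and the window guess is the classic bandwidth-delay product: link rate times round-trip time. The function names and the sample numbers are hypothetical.

```python
def jitter_ms(rtts_ms) -> float:
    """Mean absolute difference between consecutive RTT samples (ms)."""
    deltas = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return sum(deltas) / len(deltas)


def bdp_bytes(link_mbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return round(bytes_per_second * rtt_ms / 1000)


# Hypothetical ping samples in milliseconds and an advertised 500 Mbps line.
samples = [20, 24, 21, 30]
print("jitter:", round(jitter_ms(samples), 2), "ms")
print("window to try:", bdp_bytes(500, 20), "bytes")
```

For a 500 Mbps line at 20 ms that works out to 1,250,000 bytes in flight, which is roughly the TCP window you would start experimenting around (iPerf takes a window size via its `-w` option). If the jitter is a big fraction of the RTT, expect the "right" window to move around on you.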
So why are Internet speed tests iffy at best? Well I spelled out how Ookla uses their inaccurate measurement system using HTTP pushes and pulls (XMLHttpRequest or XHR or Ajax or let's just say, simply not accurate) to send and receive data using a preformatted protocol which in itself can be bloated and prone to its own latency (server latency, server thread latency, non-priority HTTP Class of Service causing providers to push your packets down the priority queue) as well as the ever-so-inaccurate webproxy/cache-proxy false results (by caching the data on another server possibly on your ISP's network in order to provide faster speeds yet still not real-time to the actual server). I also pointed out that while websockets offers a real-time packet transfer using true sockets (well sort of, websockets initiates with HTTP but keeps the socket open after the HTTP headers in order to transport more raw data using a persistent TCP connection), it's still not exactly perfect. Since it is not UDP or ICMP it really cannot measure the latency in the route between your computer and the server, but it can measure the latency in the actual transmission between the two endpoints, which is far better than Ajax. In fact, WebRTC apps commonly use websockets to set up their real-time streams (the signaling), even though the media itself usually rides over UDP. Google uses WebRTC for Hangouts and Meet, Facebook for Messenger, Discord for chat, Amazon uses it internally for Chime, HouseParty uses it, GotoMeeting, WebEx and even Zoom. I am pretty sure Youtube does as well but currently I am not going to tear down the protocol in Wireshark just to say so here with any certainty. As I said, websockets start off using HTTP but after that they request a protocol change/upgrade to sockets, which is then passed to the appropriate application which handles the protocol request.
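That HTTP-then-upgrade dance is simple enough to show. The Python sketch below builds the opening request a websocket client sends and computes the `Sec-WebSocket-Accept` value a server has to answer with, as laid out in RFC 6455 (SHA-1 of the client key plus a fixed GUID, then base64). The host and path in the usage lines are placeholders I picked, and the function names are mine; this is the handshake only, not a full websocket client.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_request(host: str, path: str, client_key: str) -> str:
    """Build the plain HTTP request that asks to switch to websockets."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {client_key}",
        "Sec-WebSocket-Version: 13",
        "",
        "",  # blank line terminates the HTTP headers
    ]
    return "\r\n".join(lines)


def accept_value(client_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example in RFC 6455; host and path are placeholders.
key = "dGhlIHNhbXBsZSBub25jZQ=="
print(upgrade_request("speedtest.example", "/ws", key))
print("server should answer with:", accept_value(key))
```

Once the server replies with a 101 status and that accept value, HTTP is done and the same TCP connection carries websocket frames both ways, which is exactly why there is no fresh set of HTTP headers on every chunk of data.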
I know I still haven't answered the question. Internet speed tests are iffy because they are not taking into consideration a number of factors. First you have your home or office. Are you on wifi or wired to your router? There is some latency there plus a hop (a huh wuh?). A network is made up of 2 or more devices connected. If you directly connect your laptop to another laptop, that is a single-hop connection. But let's say your laptop is on wifi (device A). The wifi comes from your wireless router. You want to connect to another laptop connected to the same wifi (device Z). So you are going from A to Z. But you have your wireless router (device B) in between them. So you technically are not going from A directly to Z, instead you go from A to B then to Z. This is now 2 hops to get to laptop Z from laptop A. If there is latency on your wifi, you need to take that factor into consideration. If your wireless router is lagged (slowing down for whatever reason), that causes latency. Higher-end models of wireless routers (not more expensive, higher-end) may have a separate switching mechanism for their wired and wireless connectivity which allows more transparent communications between locally connected devices where the latency will then only occur when reaching out over the Internet (or "other" connected networks). Now you have a cable modem. While these should typically act as a bridge between your home/office network and the Internet, they are usually and unfortunately designed to do a bit more which can cause latency.
Then you have the actual cable lines outside. They are coax. Make no mistake, the cable company may advertise all fiber, but to deliver to your home they use copper coax cable (unless they've upgraded their network to FTTH/FTTP but usually it's just FTTC/FTTN), so you will be facing a few issues. Coax cannot provide the type of OFDM bandwidth that fiber can given the limitations of the medium but CableLabs is really pushing its limits (currently at 8x4 but in some scenarios they can push 24x12 for 1G, however they are endeavoring for DOCSIS 3.1 to support 200x60 for 10Gbps speeds). Fiber on the other hand can go beyond 10G speeds if properly implemented. Now considering that in most cases cable systems are FTTN, that means that the node can be located a mile away from you. This means there may be a number of "repeaters" in the coax link which "condition" the line/signal in order to extend the signal further than it's physically capable of reaching. These repeaters are not without their faults either, especially when the seals on the connectors, which are directly exposed to the elements of the outdoors, start to degrade, which can happen under heat/cold/rain/frost/sunlight conditions. Now when the signal finally reaches the node, it's typically converted to fiber which then may go directly to a facility that hands off the connection to the core routing fabric of the cable company OR before it even reaches that point it may go through a number of node repeaters (for simplicity of course) before reaching this facility. Now once it reaches THAT facility it's then transferred to yet ANOTHER facility. Depending on the cable company it may even go through yet ANOTHER. Remember that each point of interconnection, whether or not it handles any level of routing, adds latency.
Now when the signal actually gets routed, that will certainly add latency because the packet needs to be logically transferred to another system and that system also has its own inherent lags. Commercial-grade routing systems are designed to be efficient and low-latency so the transfer may only add a very small amount of delay. So when you connect to Google for example, chances are that your ISP is not directly connecting with Google (known as peering). They are actually transferring your request to a connection partner in order to get your packet on the Internet. This transfer to the peering partner is not without consequences. Yup, latency.
So you wanna see proof? A common tool in nearly every computer operating system with integrated networking (most modern operating systems) is called a trace route. It uses ICMP (some use UDP) to map out the latency from A to Z over a network or even the Internet. In Windows, it's "tracert". In Linux/OSX it's "traceroute". No I'm not going to spell out every Internet-connected OS (Cisco IOS, Juniper JUNOS, BSD, etc) so forget it. So let's say you open a Command Prompt or Powershell (Windows) or a Terminal window (Linux/OSX) and type in "tracert www.google.com" or "traceroute www.google.com" (in the respective order of the OSes mentioned) and you will see something fascinating (unless you're using Windows with FIOS then you will only see one hop, more on that later). Strangely enough you hop from your router to your cable modem (sometimes it shows nothing on line 2) then a series of hops (about 3 or 4 of them) within your ISP's network, then connecting to their peering partner then to THEIR peering partners and eventually to the Z network. First, why does the 2nd hop (sometimes 3rd) show the request as timed out? Because your cable modem is not a true bridge (I mentioned this earlier). Linux/OSX has a cute little trick as well that permits you to switch between ICMP, UDP and TCP trace routes. No matter which protocol is used, there will always be that timeout.
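If you want to do more than eyeball the trace, a few lines of Python can pull the hops out of it. This is a rough sketch that assumes the Linux/OSX numeric output style (what you get from "traceroute -n"); Windows "tracert" lays its columns out differently and would need its own pattern. The sample text is made up, using reserved documentation addresses, and `parse_hops` is just a name I picked.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")       # hop number, then the rest
MS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ms")      # each "12.3 ms" probe time


def parse_hops(output: str):
    """Return (hop_number, best_rtt_ms or None) for each hop line."""
    hops = []
    for line in output.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # header or blank line
        times = [float(t) for t in MS_RE.findall(match.group(2))]
        # A hop that printed only "* * *" has no times: record it as None.
        hops.append((int(match.group(1)), min(times) if times else None))
    return hops


# Made-up sample in the Linux/OSX style; addresses are documentation ranges.
SAMPLE = """traceroute to example.com (192.0.2.10), 30 hops max
 1  192.168.1.1  1.2 ms  1.0 ms  1.1 ms
 2  * * *
 3  198.51.100.1  9.8 ms  10.4 ms  9.9 ms
"""

for hop, best in parse_hops(SAMPLE):
    print(hop, "timed out" if best is None else f"{best} ms")
```

Run it against your own trace and the timed-out hop (the modem that is not a true bridge) shows up as the gap, while the jump in milliseconds between hops tells you where the latency is actually being added.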
So in the end what you think is a direct connection to ANY speed test out there on the Internet is simply a measurement of your connection to their server. Even if it's located in the same city as you are, it still has a bunch of interconnections to go through in order to go from A to Z. Now for every ISP that wants to argue, let's see a technical diagram of how you route your packets. No seriously. I've designed a number of systems. Not just SMB or Enterprise systems, but ISP, data center and carriers. I've worked with some of the biggest and even the smallest. I've learned a lot from many people and the one thing I've learned is there is a lot of smoke and mirrors. I hate to bring the beast into this discussion but even Verizon is guilty to some extent with their FIOS product. They have decided to block customers from performing their own ICMP trace routes and instead return the results themselves. Yup, they are interfering with standard network protocols to hide how many hops it will take to get from A to Z. But they cannot do this with TCP and UDP traces. So whatever secrets they are trying to hide, it's silly to think they can really block all tracing diagnostics.
So yea this has been a really long-winded rambling (hence why I call them ramblings) and to be honest it's nice to get it out since I feel like I am repeating myself all the time about this. Testing your speed on the Internet really cannot be precise. With public speed testing sites, you have to know how they work to know whether or not they can be trusted. The goal here was not to knock companies using Ajax that heavily push their metrics data (still friends Ookla?), but to enlighten those who still bang this crap around in their heads saying "Ok so the speed test shows I'm getting my 500 Megs but I can't even stream this dang Youtube video, but the cable company insists that I am getting my speed. They even sent a tech out to show me things work. They even charged me $50 for a service call! But why in the Sam Hill can I still not even get my Gmail?". No I would not recommend Geek Squad to fix your stuff. They have an internal training program and who knows what they teach, specific to their tech. I know there are a lot of good techs out there that freelance and even some companies employ talented techs but many times these techs are pushed aside for the sake of profit (as the cable companies may do in order to just make that almighty dollar). Yes money is important but shame on me for thinking that being righteous and helpful is better than making a buck.