What’s Wrong With the Internet?

How many times have you received that call, or even made the statement yourself, that “the Internet is down”? Or perhaps that “the Internet is slow”? Obviously these statements are very rarely true. As a whole, the Internet is functional and it is FAST. However, these statements feel true from the perspective of the individual making them. My frustration is that we never have visibility into the data necessary to assess the health of the Internet from a relevant, holistic perspective over time. As a result, consumers and providers have only a limited view of the problems that randomly present themselves in this manner.

The Problem

When I think about the impact Internet hiccups have on me, I realize that I could do things much differently if it delivered consistent reliability. Even if it weren’t as reliable as infrastructure like the PSTN, having some semblance of trust in knowing when and how my connections might fail or degrade would help. The resulting improvements would allow me to rely on more robust tools like video and voice over the Internet and put my cell phone away. I can’t tell you how many times I’ve spent hours chasing ghosts. These transient issues tend to get resolved only when they worsen and the root cause becomes more easily identifiable. Increasing the trust we have in our services would materially change the way in which we use them.

I know I need to stop the rant and solve the problem. Unfortunately, the problem is complex and the solution is quite involved. So what I thought I would do here is outline the solution I’d like to see and solicit feedback from others. Maybe something already exists that I’m unaware of. Alternatively, maybe there are tools that could be ‘glued’ together in a way that achieves that objective.

Framework for Internet Health Statistics

What I would like to see is a framework established for obtaining analytics about Internet health. The first step in making things better is gathering useful metrics that are actionable and can be shared with proper anonymization/obfuscation of the data. The Internet is a global system, and we need to look at it globally, so a portion of the solution requires a real commitment of resources.

Some of the thoughts I have had are as follows:

  • High Level Architecture should be client (and/or agent), server, and reporting
  • Client agent
    • Initiating Active Probes toward server(s)
    • Scheduled/Random/Background/on-demand
    • Cross-platform: portable to any typical endpoint operating system
    • Modular component that could be added to networking gear
    • Optional passive monitoring for periodic upstream reporting to the server (latency measured from TCP 3-way handshakes, reliability assessed from TCP retransmits; sketched after this list)
  • Server Component
    • Distributed geographically across ASNs
    • Backward Compatibility with other existing probes
    • Gathers and Logs information relevant to the measurements being taken
  • Telemetry (directionally independent when possible)
    • Loss/jitter (gathered with connection-oriented or predictably random small packets; a probe sketch follows this list)
    • Available bandwidth
    • Path MTU/MSS and behavior when exceeded (see the PMTU sketch after this list)
    • Optional – Other In-Path Metrics
      • Port/Protocols Blocked
      • Proxies Inserted
      • L3 Hops
      • Etc
  • Reporting
    • Establish health over time and at points in time
    • Define health by the required use case(s)
    • Establish visibility into hotspots (ASNxx <> ASNyy, geography); an aggregation sketch follows this list
    • Natural RBAC
      • User of x device can see stats of x device regardless of where he or she roams
      • ISP/Home Owner/Hotel can see health stats of downstream clients
      • Comparisons beyond a party’s normal visibility should strip any information that would discourage individuals and entities from participating in such a solution
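
To make the client agent concrete, here is a minimal active-probe sketch in Python. It assumes a cooperating UDP echo service on the measurement server; PROBE_HOST and PROBE_PORT are hypothetical placeholders, and a real agent would add scheduling, authentication, and upstream reporting.

```python
# Minimal active-probe sketch. PROBE_HOST and PROBE_PORT are hypothetical;
# this assumes a cooperating UDP echo service on the measurement server.
import socket
import statistics
import struct
import time

PROBE_HOST = "probe.example.net"  # hypothetical measurement server
PROBE_PORT = 7007                 # assumed UDP echo service
COUNT = 50                        # probes per run
TIMEOUT = 1.0                     # seconds to wait for each echo

def run_probe():
    """Send small sequenced datagrams and derive loss, jitter, and median RTT."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(TIMEOUT)
    rtts, lost = [], 0
    for seq in range(COUNT):
        sent = time.monotonic()
        sock.sendto(struct.pack("!Id", seq, sent), (PROBE_HOST, PROBE_PORT))
        try:
            data, _ = sock.recvfrom(64)
            echo_seq, _ = struct.unpack("!Id", data[:12])
            if echo_seq == seq:
                rtts.append(time.monotonic() - sent)
        except socket.timeout:
            lost += 1
        time.sleep(0.05)  # pace probes so a burst doesn't skew the loss number
    loss_pct = 100.0 * lost / COUNT
    # Jitter taken as the mean absolute difference of consecutive RTTs
    jitter = (statistics.mean(abs(a - b) for a, b in zip(rtts, rtts[1:]))
              if len(rtts) > 1 else 0.0)
    return loss_pct, jitter, (statistics.median(rtts) if rtts else None)

if __name__ == "__main__":
    loss, jitter, rtt = run_probe()
    if rtt is None:
        print("no replies received")
    else:
        print(f"loss {loss:.1f}%  jitter {jitter * 1000:.2f} ms  "
              f"median RTT {rtt * 1000:.2f} ms")
```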
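
The optional passive mode could look something like the following Scapy sketch, which infers latency from observed TCP handshakes instead of generating traffic. This is one possible approach under those assumptions, not a finished design.

```python
# Passive-monitoring sketch using Scapy (pip install scapy; sniffing needs
# root). It estimates per-server latency from the SYN -> SYN/ACK gap of
# observed TCP handshakes without generating any traffic of its own.
from scapy.all import IP, TCP, sniff

syn_times = {}  # (src, dst, sport, dport) -> time the SYN was seen

def handle(pkt):
    if not (pkt.haslayer(IP) and pkt.haslayer(TCP)):
        return
    ip, tcp = pkt[IP], pkt[TCP]
    if tcp.flags == "S":        # outbound connection attempt
        syn_times[(ip.src, ip.dst, tcp.sport, tcp.dport)] = pkt.time
    elif tcp.flags == "SA":     # server reply; pair it with its SYN
        t0 = syn_times.pop((ip.dst, ip.src, tcp.dport, tcp.sport), None)
        if t0 is not None:
            rtt_ms = float(pkt.time - t0) * 1000
            print(f"{ip.src}: handshake RTT ~{rtt_ms:.1f} ms")

# The BPF filter keeps only SYN/SYN-ACK segments so the callback stays cheap
sniff(filter="tcp[tcpflags] & tcp-syn != 0", prn=handle, store=False)
```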
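
For the Path MTU telemetry, Linux exposes the kernel’s PMTU state through socket options, so a probe can ask rather than re-implement discovery. A rough, Linux-only sketch follows; note that a single oversized send typically reveals only the first-hop MTU, and a persistent probe would refine the value as ICMP “fragmentation needed” messages arrive.

```python
# Path MTU sketch, Linux-only. The numeric option values come from
# <linux/in.h>; they are defined here because not every Python build
# exposes them as socket module constants.
import socket

IP_MTU_DISCOVER = 10  # setsockopt: control kernel PMTU discovery
IP_PMTUDISC_DO = 2    # always set the DF bit on outgoing datagrams
IP_MTU = 14           # getsockopt: current known path MTU for the socket

def path_mtu(host, port=33434):
    """Return the kernel's current path MTU estimate toward host."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    s.connect((host, port))
    try:
        s.send(b"\x00" * 9000)  # oversized datagram forces an MTU check
    except OSError:
        pass  # EMSGSIZE is expected; it is how the limit surfaces
    return s.getsockopt(socket.IPPROTO_IP, IP_MTU)

print(path_mtu("example.net"))  # e.g. 1500, or lower on tunneled paths
```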
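
On the reporting side, even a trivial aggregation by ASN pair starts to surface hotspots. The record fields and values below are illustrative, not a defined schema:

```python
# Toy reporting aggregation: group probe records by (source ASN, destination
# ASN) and summarize. Field names and values are illustrative only.
import statistics
from collections import defaultdict

measurements = [
    {"src_asn": 7018, "dst_asn": 3356, "rtt_ms": 38.2, "loss_pct": 0.0},
    {"src_asn": 7018, "dst_asn": 3356, "rtt_ms": 41.7, "loss_pct": 2.0},
    {"src_asn": 7018, "dst_asn": 2914, "rtt_ms": 95.3, "loss_pct": 8.0},
]

by_path = defaultdict(list)
for m in measurements:
    by_path[(m["src_asn"], m["dst_asn"])].append(m)

for (src, dst), recs in sorted(by_path.items()):
    rtt = statistics.median(r["rtt_ms"] for r in recs)
    loss = statistics.mean(r["loss_pct"] for r in recs)
    print(f"AS{src} <> AS{dst}: median RTT {rtt:.1f} ms, mean loss {loss:.1f}%")
```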

Conclusion

I think the Internet is a wonderful tool and is the plumbing for our world today. However, I think it could be so much more. I know I’m not the only person who sees these brownout issues and the challenges they create. So my question to PacketU readers is: how could we come together to solve these issues in a way that benefits everyone?

I’d love to hear from you, so share your thoughts by commenting below.

Disclaimer: This article includes the independent thoughts, opinions, commentary or technical detail of Paul Stewart. It may or may not reflect the position of past, present or future employers.

About Paul Stewart, CCIE 26009 (Security)

Paul is a Network and Security Engineer, Trainer and Blogger who enjoys understanding how things really work. With over 15 years of experience in the technology industry, Paul has helped many organizations build, maintain and secure their networks and systems.

4 Responses to What’s Wrong With the Internet?

  1. Clay Maney says:

    I like the idea and definitely would love to be able to look to it for answers for those transient problems, but I don’t see how it could be done. Anonymization of the source information would make a lot of it useless, and I doubt many ISPs would welcome giving real metrics to their clients, much less their competitors. (Plus, it would get more complicated with the advent of prioritized traffic and different service tiers.) That doesn’t even include the various nefarious ways that less scrupulous actors could use the information… after all, if you can easily identify a choke point, you can use that to cause significant damage.

    With that said, I’ve had some success using the various Looking Glass sites out there for information about large scale routing issues, but getting anyone to *act* on that kind of information is largely impossible.

    • I definitely don’t think it is an easy problem. I don’t think we would have to have service provider participation if we could get some general participation at the customer level and some touch points in the cloud. Obviously, tracking performance metrics back to providers and locations would be possible based on the IP address being reported, enhanced by coordinates that could be shared by the clients. I think the trick is to give parties flexibility in what they share while still knowing what has been anonymized and/or normalized (some people may not want to share their IP address but would be OK with sharing something like “TWC user in City/State”). Obviously, any system that harvests such information would know the IP address, but that could be kept private (which assumes a responsible entity owns and manages at least the first layer of the statistics-gathering service).
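
      As a rough illustration of that flexibility, a participant-side filter might replace the address with a salted hash so repeat reports from one endpoint still correlate without exposing the IP. Every name in this sketch is hypothetical:

```python
# Illustrative participant-side anonymization; all names are hypothetical.
# The salt lives only with the first-layer collector, so the published
# token correlates repeat reports without exposing the address itself.
import hashlib

SALT = b"collector-private-salt"  # held by the collector, never published

def anonymize(record, share_ip=False):
    out = dict(record)
    if not share_ip:
        token = hashlib.sha256(SALT + record["ip"].encode()).hexdigest()[:16]
        out["ip"] = token  # stable pseudonym instead of the real address
    return out

print(anonymize({"ip": "203.0.113.7", "provider": "TWC",
                 "location": "Louisville, KY", "rtt_ms": 42.1}))
```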

      Information could be gleaned actively or passively from clients, servers, service providers, etc. I think there is a balance that has to be struck between information gathering and privacy. It’d just be really nice to have some data analytics about the quality of connections for a given geography, provider, and so on. I have more questions than answers, but I think some system could be built that would allow optional and flexible participation by nearly anything connected.

      As you stated, SPs might not participate (and maybe they shouldn’t, for several reasons). But maybe I want to participate by sharing statistical information. And maybe I can see verbose stats from anywhere my devices have been, on whatever networks. Maybe I want to upstream a subset of that data to a larger community. We need that so we know when and where our networks really suck and can more accurately pinpoint the offending party. Solid and robust connectivity down to each and every device connected to the net gives us the confidence to do new and exciting things. The first step to that end is getting some metrics to work with.

      Good points. Take care.

  2. Dave Cardwell says:

    Take a look at RIPE Atlas. Rather than a client agent it uses dedicated probes, but it matches a lot of your other criteria and already has widespread deployment:
    https://atlas.ripe.net/

    There was an interesting IPJ article recently which covers the background and architecture (and gives the reasons for dedicated probes over client agents):
    http://ipj.dreamhosters.com/wp-content/uploads/2015/10/ipj18.3.pdf
