Under the hood of internet browsing

Google Homepage Circa 2001

Let’s get under the hood

These days almost everyone, and certainly anyone reading this, is somewhat familiar with a web browser: an application that lets a user “surf” the web. But what is really going on behind it all? Is it just magic? I’m sure it seems that way to many at first, but once you break it down, the magic starts to dissipate.

The internet is, at its most fundamental level, a bunch of cables connected together. Literally. A bunch of cables. Now imagine cables that span the entire globe, underground and underwater. The whole world is connected by this massive mess of cables, with millions and millions of computers hooked up to it. This is the foundation of the internet. The beautiful thing is that we have designed ways for these computers to communicate with each other over this network. They can send data back and forth, process requests… and the list goes on.

Each of these computers has its own address, just as your home does. These are called IP addresses, and computers use them to identify each other over the internet. Some of these addresses belong to other users like yourself, others to companies, workplaces, and public schools, and some to web servers. A website is usually hosted on a web server so that you, the client, can request it and have it loaded onto your screen. Web servers are computers set up specifically to host websites and process requests from clients.

From a personal computer to a web server located who knows where, it might seem daunting to think about how this information gets sent over, but fear not: it’s simpler than it seems. As soon as you type an address and press Enter, a series of requests and protocols execute in succession in order to display the home page of that website to you, the user. All of these work in tandem to create the experience we know as web browsing, so let’s dive in.

DNS

DNS is one of the backbones of the internet. DNS stands for Domain Name System, and it converts the human-readable domain name in the URL we typed into our web browser into the IP address of the server behind it. IP stands for Internet Protocol. Every computer is assigned an IP address when it connects to the internet, as a way to identify it over the network. Most personal users get a dynamic IP address (your internet service provider assigns it depending on availability, so it can change over time), while large companies like Google maintain static IP addresses for their servers, which do not change from one visit to the next. So DNS is essentially responsible for the translation between a domain name and its IP address. This is necessary because it would be nearly impossible for humans to remember IP addresses, so it’s easier to use words.

Without DNS, the internet as we know it would be practically unusable. A DNS lookup happens in the following manner:

  • Your browser checks its own cache to see if it already knows the IP address for the requested domain.
  • If it does not, the browser asks the operating system (OS) to check its cache for the IP address.

If neither of these has it:

  • The operating system is configured to query the Resolving Name Server (RNS), which is usually provided by your ISP and whose address is set in your OS network configuration.
  • The RNS checks its cache. If the RNS does not have the answer, it asks the Root Name Servers. (All resolvers must know where to locate the root servers.)
  • The Root Name Server refers the resolver to the corresponding top-level domain (TLD) name server, in this case the one for .com. The resolver caches this answer.
  • The .com TLD server provides the Authoritative Name Servers (ANS) for the domain google.com. This answer is cached as well.

The TLD server knows this thanks to the domain registrar: when a domain is purchased, the registrar reserves the name and tells the TLD registry which authoritative name servers are responsible for it.

  • The ANS provides the resolver with an IP address.
  • The resolver returns the IP address to the OS, which caches it and hands it to the browser.

This all happens in less than a second.
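To make the chain of caches and name servers concrete, here is a minimal sketch in Python. It is a toy model, not a real DNS client: the `ROOT`, `TLD`, and `ANS` tables, the `Resolver` class, and the `lookup` function are all hypothetical names made up for illustration, and `192.0.2.10` is an address reserved for documentation, not Google's real one.

```python
# Toy model of the lookup chain described above. Every record here is
# hypothetical; 192.0.2.10 comes from a range reserved for documentation.
ROOT = {"com": "tld-com"}                            # root servers know the TLD servers
TLD = {"tld-com": {"google.com": "ans-google"}}      # TLD servers know the authoritative servers
ANS = {"ans-google": {"google.com": "192.0.2.10"}}   # authoritative servers hold the records


class Resolver:
    """Stands in for the resolving name server, with its own cache."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0  # counts questions sent to root/TLD/ANS

    def resolve(self, domain):
        if domain in self.cache:
            return self.cache[domain]
        tld_label = domain.rsplit(".", 1)[-1]
        tld_server = ROOT[tld_label]           # ask a root server
        ans_server = TLD[tld_server][domain]   # ask the TLD server
        ip = ANS[ans_server][domain]           # ask the authoritative server
        self.upstream_queries += 3
        self.cache[domain] = ip
        return ip


def lookup(domain, browser_cache, os_cache, resolver):
    """Check the browser cache, then the OS cache, then fall back to the resolver."""
    if domain in browser_cache:
        return browser_cache[domain]
    if domain not in os_cache:
        os_cache[domain] = resolver.resolve(domain)
    browser_cache[domain] = os_cache[domain]
    return browser_cache[domain]


browser_cache, os_cache, resolver = {}, {}, Resolver()
first = lookup("google.com", browser_cache, os_cache, resolver)   # walks the whole chain
second = lookup("google.com", browser_cache, os_cache, resolver)  # answered from the browser cache
```

The second call never reaches the resolver, which is the whole point of caching at every level. In real programs this entire chain hides behind a single call such as Python's `socket.getaddrinfo`.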

TCP/IP

Now that we have the IP address of google.com, the next step uses TCP/IP. TCP stands for Transmission Control Protocol and IP stands for Internet Protocol. TCP/IP is a communications standard that allows applications and computing devices to exchange data over a network. It is designed to send packets across the internet and ensure end-to-end data delivery. TCP organizes data so it can be transmitted between a server and a client. Before transmitting anything, it establishes a connection between the source and the destination, and it keeps that connection alive until communication ends. It breaks large amounts of data into smaller packets, numbers them so they can be put back together in order, and resends any that get lost, which ensures data integrity throughout the process. Because of this reliability it is one of the most common protocols, and higher-level protocols such as FTP, SSH, IMAP, and HTTP are built on top of it. The division of labor is simple: IP handles addressing and routing, getting each packet from the source address to the destination address, while TCP handles reliable, ordered delivery of the data between the two ends.
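The idea of splitting data into numbered packets and checking each one on arrival can be sketched in a few lines of Python. This is only an illustration of the principle, under loud assumptions: `segment` and `reassemble` are made-up helper names, the CRC32 checksum stands in for TCP's real (different) checksum, and where real TCP would ask for a lost packet again, this sketch simply raises an error.

```python
import random
import zlib


def segment(data: bytes, size: int):
    """Split data into (sequence_number, checksum, payload) tuples."""
    return [
        (offset, zlib.crc32(data[offset:offset + size]), data[offset:offset + size])
        for offset in range(0, len(data), size)
    ]


def reassemble(segments):
    """Verify every checksum, then rebuild the data in sequence order."""
    out = bytearray()
    for seq, checksum, payload in sorted(segments):
        if seq != len(out):
            # Real TCP would request a retransmission here.
            raise ValueError(f"segment at offset {len(out)} is missing")
        if zlib.crc32(payload) != checksum:
            raise ValueError(f"segment {seq} is corrupted")
        out += payload
    return bytes(out)


message = b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n"
segments = segment(message, size=8)
random.shuffle(segments)            # packets may arrive out of order
rebuilt = reassemble(segments)      # the sequence numbers restore the order
```

Because each piece carries its own position, the receiver can rebuild the original message no matter what order the pieces arrive in.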

Security

Whenever a connection is attempted, web servers and personal computers alike tend to have security measures in place to filter unwanted traffic or malicious connections. The one we are most commonly acquainted with is the firewall. Firewalls do exactly that: they block malicious, unrecognized, or otherwise unwanted traffic from reaching our computers through the network. There are many vulnerabilities in these automated systems and protocols that can be exploited, and firewalls help block out those unwanted connections.
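At its simplest, a firewall is a list of rules checked against every incoming connection. The sketch below is a hypothetical rule set, not how any particular firewall product is configured: the `RULES` table and `is_allowed` function are invented for illustration, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical rules, checked top to bottom; the first match wins.
# A port of None means "any port".
RULES = [
    ("deny", ipaddress.ip_network("203.0.113.0/24"), None),  # block a whole address range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),        # HTTPS from anywhere
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),         # HTTP from anywhere
]


def is_allowed(source_ip: str, dest_port: int) -> bool:
    """Return True if the first matching rule allows the connection."""
    address = ipaddress.ip_address(source_ip)
    for action, network, port in RULES:
        if address in network and port in (None, dest_port):
            return action == "allow"
    return False  # nothing matched: deny by default


web_visitor = is_allowed("198.51.100.7", 443)   # ordinary HTTPS request
blocked_range = is_allowed("203.0.113.9", 443)  # source is in the denied range
unknown_port = is_allowed("198.51.100.7", 22)   # no rule opens port 22
```

Denying by default means that anything the rules do not explicitly recognize is treated as unwanted traffic.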

HTTPS/SSL

HTTPS adds a layer of security on top of HTTP. HTTP stands for Hypertext Transfer Protocol, and HTTPS simply adds “Secure” to it. HTTP is the protocol over which data is sent between your browser and the website you are connected to. With HTTPS, all communication between your browser and the website is encrypted, which is why it is used to protect confidential information such as passwords and payment details. The encryption is provided by the Transport Layer Security (TLS) protocol, the successor to the older Secure Sockets Layer (SSL) protocol, whose name is still commonly used. Basically, both computers agree during a handshake on how to encrypt the data. The server sends its certificate, which contains its public key, and keeps its private key to itself. The browser uses the certificate to check that it is really talking to the right server, and the two sides then use the handshake to derive a shared session key that only they know. From then on, each side encrypts what it sends with that session key, and the other side decrypts it with the same key.
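Python's standard `ssl` module shows what a client brings to this handshake. The sketch below is a minimal example, assuming default settings are acceptable; `fetch_tls_details` is a hypothetical helper name, and because it needs network access it is only defined here, not called.

```python
import socket
import ssl

# Default client settings: the server must present a certificate that
# chains to a trusted authority and matches the host name we asked for.
context = ssl.create_default_context()


def fetch_tls_details(host: str, port: int = 443):
    """Open a TLS connection and return the negotiated protocol version
    and cipher name. Requires network access."""
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # wrap_socket performs the handshake before returning.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version(), tls_sock.cipher()[0]


certificate_required = context.verify_mode == ssl.CERT_REQUIRED
hostname_checked = context.check_hostname
```

On a machine with internet access, calling `fetch_tls_details("www.google.com")` would report which TLS version and cipher the two sides agreed on during the handshake.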

www.google.com

Now that we have the IP address of google.com and are feeling safe, we are ready to access the website. Most large websites use what is known as a load balancer to distribute traffic across their web servers, both to prevent any one server from being overloaded and to keep the website running in case one of the servers fails. Load balancers distribute traffic using algorithms (round robin, for example, hands each new request to the next server in turn) while staying within the capacity of the servers available.
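Round robin is one of the simplest balancing algorithms, and a short Python sketch shows how it also keeps a site up when a server fails. The `RoundRobinBalancer` class and the server names are hypothetical; real load balancers offer several algorithms and detect failures on their own with health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each request to the next healthy server in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the rotation is enough to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
first_four = [balancer.next_server() for _ in range(4)]
balancer.mark_down("web-2")   # simulate a failed server
after_failure = [balancer.next_server() for _ in range(2)]
```

Once `web-2` is marked down, requests simply skip it, so visitors never notice that one of the servers has failed.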

Now that we’ve made our HTTP request to google.com, the web server (here meaning the software that answers HTTP requests, not the physical machine it runs on) is responsible for serving us a nice webpage. This includes the HTML, CSS, and all that nice static content. Behind it there can also be an application server, which is used to serve dynamic content: it runs the site’s code and generates responses based on what the user does. The application server and the web server work in conjunction.

The web server now sends back whatever it (and the application server) have cooked up. Our web browser receives the response and displays it. Now we can interact with it.
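The split between static and dynamic content can be sketched with Python's built-in WSGI tools. This is a toy, assuming a single hard-coded page and a made-up `/greet` route; `STATIC_FILES`, `app`, and `request` are hypothetical names, and a real site would run separate, far more capable web and application servers.

```python
from html import escape
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical static content, keyed by path.
STATIC_FILES = {"/": b"<html><body><h1>Welcome</h1></body></html>"}


def app(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path in STATIC_FILES:
        # Web server role: return a ready-made file unchanged.
        status, body = "200 OK", STATIC_FILES[path]
    elif path == "/greet":
        # Application server role: build the response from the user's input.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        name = query.get("name", ["stranger"])[0]
        status, body = "200 OK", f"<p>Hello, {escape(name)}!</p>".encode()
    else:
        status, body = "404 Not Found", b"<p>Not found</p>"
    start_response(status, [("Content-Type", "text/html; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def request(path, query=""):
    """Call the app directly, the way a server would, without any network."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"], environ["QUERY_STRING"] = path, query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


home = request("/")
greeting = request("/greet", "name=Ada")
```

To try it in a browser, the same `app` can be served locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.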

The internet.

google.com Schema
