Project 3: Raw Sockets
Build your own TCP/IP implementation
Description
The goal of this assignment is to familiarize you with the low-level operations of the Internet protocol stack. Thus far, you have had experience using sockets and manipulating application level protocols. However, working above the operating system’s networking stack hides all the underlying complexity of creating and managing IP and TCP headers.
Your task is to write a program called rawhttpget that takes a URL on the command line and downloads the associated file. You may use any HTTP code that you wrote for Project 2 to aid in the process. However, your program must use a SOCK_RAW/IPPROTO_RAW socket, which means that you are responsible for building the IP and TCP headers in each packet. In essence, you will be rebuilding the operating system’s TCP/IP stack within your application.
High-level Requirements
Your goal is to write a program called rawhttpget that takes one command line parameter (a URL), downloads the associated web page or file, and saves it to the current directory. The command line syntax for this program is:
$ ./rawhttpget [URL]
An example invocation of the program might look like this:
$ ./rawhttpget http://david.choffnes.com/classes/cs5700f22/2MB.log
This would create a file named 2MB.log in the current directory containing the downloaded file content. If the URL ends in a slash ('/') or does not include any path, then you may use the default filename index.html. For example, the program would generate a file called index.html if you ran the following command:
$ ./rawhttpget http://www.khoury.northeastern.edu
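The filename rule above can be sketched in a few lines of Python. This is only an illustration: the helper name output_filename is hypothetical, and it assumes the standard library's urllib.parse is acceptable for splitting the URL.

```python
from urllib.parse import urlparse


def output_filename(url: str) -> str:
    """Return the local filename for a URL, defaulting to index.html."""
    path = urlparse(url).path
    # An empty path, or a path ending in '/', names no file.
    if not path or path.endswith("/"):
        return "index.html"
    # Otherwise use the last path component.
    return path.rsplit("/", 1)[-1]
```

With the two example URLs from this section, the helper returns 2MB.log and index.html respectively.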
The file created by your program should be exactly the same as the original file on the server. You can test whether your file and the original are identical by using wget, curl, or a web browser to download the file, and then comparing the file generated by your program and the original using md5sum or diff. Do not include extra info in your generated file, like HTTP headers or line breaks.
Since the point of this assignment is not to focus on HTTP, there are many things your program does not need to handle. Your program does not need to support HTTPS. Your program does not need to follow redirects, or handle HTTP status codes other than 200. In the case of a non-200 status code, print an error to the console and close the program. Your program does not need to follow links or otherwise parse downloaded HTML.
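One possible way to meet the "status 200 only, no headers in the output file" requirements is sketched below. The function name split_http_response is hypothetical, and the sketch assumes the whole response has already been reassembled into one bytes object and that the body is not chunk-encoded.

```python
def split_http_response(raw: bytes) -> bytes:
    """Return the response body, or raise if the status is not 200."""
    # Headers end at the first blank line (CRLF CRLF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete HTTP response: no header terminator")
    status_line = head.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # A status line looks like: HTTP/1.1 200 OK
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"malformed status line: {status_line!r}")
    if int(parts[1]) != 200:
        raise RuntimeError(f"server returned status {parts[1]}")
    return body
```

The caller would catch the exception, print the error to the console, and exit; only the returned body is written to disk.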
Low-level Requirements
The primary challenge of this assignment is that you must use raw sockets. A raw socket is a special type of socket that bypasses some (or all) of the operating system’s network stack. For example, in C a socket of type SOCK_RAW/IPPROTO_RAW bypasses the operating system’s IP and TCP/UDP layers. In your program, you will need to create two raw sockets: one for receiving packets and one for sending packets.
- The receive socket must be of type SOCK_RAW/IPPROTO_TCP.
- The send socket must be of type SOCK_RAW/IPPROTO_RAW.
The reason you need two sockets has to do with some quirks of the Linux kernel. The kernel will not deliver any packets to sockets of type SOCK_RAW/IPPROTO_RAW (see $ man 7 raw for more details), thus your code will need to bind to the IPPROTO_TCP interface to receive packets. However, since you are required to implement TCP and IP, you must send on a SOCK_RAW/IPPROTO_RAW socket.
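In Python, the two sockets described above could be created roughly as follows. The function name open_raw_sockets is hypothetical; calling it requires root privileges on Linux and raises PermissionError otherwise.

```python
import socket


def open_raw_sockets():
    """Create and return (send_sock, recv_sock). Requires root on Linux."""
    # Send socket: the program supplies the full IP header itself.
    send_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_RAW)
    try:
        # Receive socket: the kernel delivers incoming TCP packets here.
        recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
    except OSError:
        send_sock.close()
        raise
    return send_sock, recv_sock
```

A packet would then be sent with send_sock.sendto(packet, (remote_ip, 0)) and received with recv_sock.recvfrom(65535).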
There are many tutorials online for doing raw socket programming. I recommend Silver Moon’s tutorial as a place to get started. That tutorial is in C; Python also has native support for raw socket programming. However, not all languages support raw socket programming. Since many of you program in Java, I will allow the use of the RockSaw Library, which enables raw socket programming in Java.
When you start to write your program, you will immediately notice that the SOCK_RAW/IPPROTO_TCP receive socket is promiscuous: it receives all TCP packets that are being sent to your machine, regardless of their source address, their destination port number, etc. One of your tasks will be filtering the incoming packets to isolate the ones that belong to your program. All other packets can be ignored by your program.
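The filtering step can be sketched as a predicate over the raw bytes read from the receive socket. This is a minimal sketch, assuming each read returns one packet that starts with its IPv4 header (which is how AF_INET raw sockets behave on Linux); the function name and parameters are illustrative.

```python
import socket
import struct


def belongs_to_connection(packet: bytes, remote_ip: str, remote_port: int,
                          local_ip: str, local_port: int) -> bool:
    """Return True if the packet is a TCP segment for our connection."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False  # too short, or not IPv4
    ihl = (packet[0] & 0x0F) * 4  # IP header length in bytes
    if ihl < 20 or len(packet) < ihl + 20:
        return False  # malformed header, or no room for a TCP header
    if packet[9] != socket.IPPROTO_TCP:
        return False  # protocol field is not TCP
    src_ip = socket.inet_ntoa(packet[12:16])
    dst_ip = socket.inet_ntoa(packet[16:20])
    # The first four bytes of the TCP header are the two port numbers.
    src_port, dst_port = struct.unpack("!HH", packet[ihl:ihl + 4])
    return (src_ip, src_port, dst_ip, dst_port) == (
        remote_ip, remote_port, local_ip, local_port)
```

Packets for which the predicate returns False are simply skipped in the receive loop.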
Your program must implement all features of IP packets. This includes validating the checksums of incoming packets, and setting the correct version, header length and total length, protocol identifier, and checksum in each outgoing packet. Obviously, you will also need to correctly set the source and destination IP in each outgoing packet. You may use existing OS APIs to query for the IP of the remote HTTP server (i.e., handle DNS requests) as well as the IP of the source machine. Be careful that you select the correct IP address of the local machine. Do not bind to localhost (127.0.0.1)! Furthermore, your code must be defensive, i.e., you must check the validity of IP headers from the remote server. Is the remote IP correct? Is the checksum correct? Does the protocol identifier match the contents of the encapsulated header?
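As a rough illustration of the IP requirements above, the following sketch builds a 20-byte IPv4 header with no options and validates a received one. The function names are hypothetical, and the Don't Fragment flag and TTL of 64 are assumptions chosen for the example, not requirements of the assignment.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ip_header(src_ip: str, dst_ip: str, payload_len: int,
                    ident: int = 0, ttl: int = 64) -> bytes:
    """Build an IPv4 header (no options) for a TCP payload."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,        # version 4, header length 5 words
        0,                   # DSCP/ECN
        20 + payload_len,    # total length: header plus payload
        ident,               # identification
        0x4000,              # flags: Don't Fragment (assumption)
        ttl,
        socket.IPPROTO_TCP,  # protocol identifier
        0,                   # checksum placeholder
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
    )
    checksum = internet_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]


def ip_header_is_valid(header: bytes) -> bool:
    """Check version and checksum; summing a valid header yields 0."""
    return (len(header) >= 20 and header[0] >> 4 == 4
            and internet_checksum(header) == 0)
```

For an incoming packet, pass only the header bytes (the first IHL x 4 bytes) to ip_header_is_valid, and separately check that the source address is the remote server and that the protocol field is TCP.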
Your code must implement a subset of TCP’s functionality. Your program must verify the checksums of incoming TCP packets, and generate correct checksums for outgoing packets. Your code must select a valid local port to send traffic on, perform the three-way handshake, and correctly handle connection teardown. Your code must correctly handle sequence and acknowledgement numbers. Your code may manage the advertised window as you see fit. Your code must include basic timeout functionality: if a packet is not ACKed within 1 minute, assume the packet is lost and retransmit it. Your code must be able to receive out-of-order incoming packets and put them back into the correct order before delivering them to the higher-level, HTTP handling code. Your code should identify and discard duplicate packets. Finally, your code must implement a basic congestion window: your code should start with cwnd=1, and increment the cwnd after each successful ACK, up to a fixed maximum of 1000 (i.e., cwnd must be <=1000 at all times). If your program observes a packet drop or a timeout, reset the cwnd to 1.
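The TCP checksum differs from the IP checksum in that it also covers a pseudo-header made of the source address, destination address, protocol number, and TCP length. The sketch below builds a segment with a 20-byte header (no options) and verifies one; the function names and the default window of 65535 are illustrative assumptions.

```python
import socket
import struct

FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10  # TCP flag bits


def internet_checksum(data: bytes) -> int:
    """One's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, tcp_len: int) -> bytes:
    """Pseudo-header that is checksummed together with the segment."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, socket.IPPROTO_TCP, tcp_len)


def build_tcp_segment(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                      seq: int, ack: int, flags: int,
                      window: int = 65535, payload: bytes = b"") -> bytes:
    """Build a TCP segment with a 20-byte header and a valid checksum."""
    header = struct.pack("!HHIIBBHHH", src_port, dst_port, seq, ack,
                         5 << 4,  # data offset: 5 words, no options
                         flags, window,
                         0,       # checksum placeholder
                         0)       # urgent pointer
    segment = header + payload
    csum = internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return segment[:16] + struct.pack("!H", csum) + segment[18:]


def tcp_checksum_ok(src_ip: str, dst_ip: str, segment: bytes) -> bool:
    """Verify a received segment; src_ip is the sender of the segment."""
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0
```

For example, the first packet of the three-way handshake would be build_tcp_segment(local_ip, remote_ip, local_port, 80, initial_seq, 0, SYN), prefixed with an IP header.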
As with IP, your code must be defensive: check to ensure that all incoming packets have valid checksums and in-order sequence numbers. If your program does not receive any data from the remote server for three minutes, your program can assume that the connection has failed. In this case, your program can simply print an error message and close.
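The congestion-window and reordering rules described above can be sketched as two small classes. This is a simplified illustration under stated assumptions: the class names are hypothetical, cwnd is counted in packets, and the reorder buffer assumes that retransmitted segments start at the same sequence numbers as the originals (no partial overlaps).

```python
class CongestionWindow:
    """cwnd starts at 1, grows by 1 per ACK up to 1000, resets on loss."""
    MAX_CWND = 1000

    def __init__(self) -> None:
        self.cwnd = 1

    def on_ack(self) -> None:
        self.cwnd = min(self.cwnd + 1, self.MAX_CWND)

    def on_loss(self) -> None:  # packet drop or timeout
        self.cwnd = 1


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a precedes b, allowing 32-bit wraparound."""
    return ((a - b) & 0xFFFFFFFF) > 0x7FFFFFFF


class ReorderBuffer:
    """Holds out-of-order segments and releases bytes in sequence order."""

    def __init__(self, next_seq: int) -> None:
        self.next_seq = next_seq  # next sequence number expected
        self.pending = {}         # seq -> payload not yet delivered

    def receive(self, seq: int, payload: bytes) -> bytes:
        """Store a segment; return any bytes that are now in order."""
        if not payload or seq_before(seq, self.next_seq):
            return b""  # empty segment, or duplicate of delivered data
        self.pending.setdefault(seq, payload)  # ignore duplicate copies
        out = bytearray()
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq = (self.next_seq + len(chunk)) & 0xFFFFFFFF
        return bytes(out)
```

After each call to receive, next_seq is the value to place in the acknowledgement number of the next outgoing ACK, and the returned bytes are what gets handed to the HTTP handling code.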
Testing Your Program
You may test your program against Fakebook or against this assignment page. We have also created some large files full of random bytes that you can use to stress-test your implementation: 2 MB file, 10 MB file, 50 MB file. Note that these files may only be available via HTTP from within Northeastern networks.
Developing Your Program
Access to raw sockets requires root privileges on the operating system. Recall that raw sockets are promiscuous, i.e., they can observe all packets that arrive at a machine. It would be a security vulnerability if any program could open raw sockets, because that would enable you to spy on the network traffic of all other users using a shared machine (e.g., one of the Khoury machines).
Since we cannot give you root access to the Khoury machines, you will need to develop your program on your own Linux machine or in a VM. We will be grading your code on a stock Ubuntu Linux 20.04 machine, so keep that in mind when developing your code and setting up your VM. Even if you have Ubuntu/Debian installed, natively or in a VM, we strongly recommend that you use a dedicated VM so that you do not disturb your working environment. Do not develop your program on Windows or OSX: the APIs for raw sockets on those systems are incompatible with Linux, and thus your code will not work when we grade it.
Once you have your VM set up, you will need to install development tools. Exactly what you need will depend on what language you want to program in. There are ample instructions online explaining how to install gcc, Java, and Python-dev onto Ubuntu.
For Windows or Intel-based Macs:
- Register a free Personal User License of VMware Fusion Player 13. Go to VMware Fusion Player; select the “License & Download” tab; click “create an account” (if you do not have one); follow the instructions to create a VMware account; log in to your account; go to VMware Fusion Player again to get your Personal User License.
- On the same page, download and install VMware Fusion Player 13 by clicking “Manually Download” under the “License & Download” tab.
- Then, download the Ubuntu 20.04.5 64-bit PC (AMD64) desktop image from http://cdimage.ubuntu.com/focal/daily-live/pending/.
For Apple Silicon (M1/M2-based) Macs:
- Register a free Personal User License of VMware Fusion Player 13. Go to VMware Fusion Player; select the “License & Download” tab; click “create an account” (if you do not have one); follow the instructions to create a VMware account; log in to your account; go to VMware Fusion Player again to get your Personal User License.
- On the same page, download and install VMware Fusion Player 13 by clicking “Manually Download” under the “License & Download” tab.
- Then, download the Ubuntu 20.04.5 64-bit ARM (ARMv8/AArch64) desktop image from http://cdimage.ubuntu.com/focal/daily-live/pending/.
Modifying IP Tables
Regardless of whether you are developing on your own copy of Linux or in a VM, you will need to make one change to iptables in order to complete this assignment. You must set a rule in iptables that drops outgoing TCP RST packets, using the following command:
% iptables -A OUTPUT -p tcp --tcp-flags RST RST -j DROP
To understand why you need this rule, think about how the kernel behaves when it receives unsolicited TCP packets. If your computer receives a TCP packet, and there are no open ports waiting to receive that packet, the kernel generates a TCP RST packet to let the sender know that the packet is invalid. However, in your case, your program is using a raw socket, and thus the kernel has no idea what TCP port you are using. So, the kernel will erroneously respond to packets destined for your program with TCP RSTs. You do not want the kernel to kill your remote connections, and thus you need to instruct the kernel to drop outgoing TCP RST packets. You will need to recreate this rule each time you reboot your machine/VM.
Debugging
Debugging raw socket code can be very challenging. You will need to get comfortable with Wireshark or tcpdump in order to debug your code. Both programs are packet sniffers and can parse all of the relevant fields from TCP/IP headers. Wireshark has a GUI; tcpdump does not. Using either, you should be able to tell if you are formatting outgoing packets correctly, and if you are correctly parsing incoming packets.
Language
You can write your code in whatever language you choose, as long as your code compiles and runs on a stock copy of Ubuntu 20.04 on the command line.
Be aware that many languages do not support development using raw sockets. I am making an explicit exception for Java, allowing the use of the RockSaw library. If you wish to program in a language (other than Java) that requires third party library support for raw socket programming, ask me for permission before you start development.
As usual, do not use libraries that are not installed by default on Ubuntu 20.04 (with the exception of RockSaw). Similarly, your code must compile and run on the command line. You may use IDEs (e.g., Eclipse) during development, but do not turn in your IDE project without a Makefile. Make sure your code has no dependencies on your IDE.
Submitting Your Project
To turn in your project, you must submit the following things:
- The thoroughly documented source code for your program.
- A Makefile that compiles your code.
- A plain-text (no Word or PDF) README.md file. In this file, you should briefly describe your high-level approach, any challenges you faced, and an overview of how you tested your code.
- If your code is in Java, you must include a copy of the RockSaw library.
Your README.md, Makefile, source code, etc. should all be placed in the root of a compressed archive (e.g., a .zip or .tar.gz) and then uploaded to Gradescope. Do not use the archive option on a Mac; Gradescope does not parse these archives. Instead, use the zip program on the command line. Alternatively, you can check all these items into GitHub, download a zip file from GitHub, and submit that to Gradescope.
Only one group member needs to submit your project. Remember to add your teammate when submitting.
After a few minutes, the autograder should complete and show you the results. You may submit as many times as you wish; only the last submission will be graded, and the time of the last submission will determine whether your assignment is late.
Grading
This project is worth 15 points. You will receive full credit if:
- Your code compiles, runs, and correctly downloads files over HTTP.
- You have not used any illegal libraries.
- You use the correct types of raw socket.
All student code will be scanned by plagiarism detection software to ensure that students are not copying code from the Internet or each other.