The first step in processing inbound FTP files with the FileSystemWatcher class is, of course, to have at least one Windows-based FTP server on which we can install a service that uses a FileSystemWatcher.
This is a multi-step process, which I’ll be describing in a series of blog posts:
- We need to create at least one Windows server. In this demo, I am going to create two Azure virtual machines (each running Windows Server 2012 R2 Datacenter) in an availability set, to qualify for Azure’s 99.95 percent uptime SLA. I’m going to build these with an Azure Resource Manager (ARM) template.
- IIS needs to be installed, along with FTP services. I am going to do this via RDP on each VM. I could, in theory, install and configure IIS with FTP support through a Desired State Configuration PowerShell script; but because I need to wade into the machines anyway in order to configure the FTP service, I will also manually set up FTP services so you can see how that’s done.
- I need to configure the FTP server. This includes opening the PASV FTP ports for each machine; applying IP address restrictions to inbound FTP connections; and opening the relevant FTP ports in the machines’ Windows firewall.
- I need to configure FTP user isolation. This ensures I can have multiple FTP accounts on a machine, none of which can see or touch another account’s files; I can also run one or more FileSystemWatcher services to process each user’s files independently.
- I need to restart the FTP service, then test FTP connectivity.
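As a first connectivity check for that last step, a plain TCP probe can confirm that the command port and the PASV ports are reachable at all. The following is a minimal sketch, not a full FTP test: it only shows that something accepts connections on the port, not that a login or transfer works. The function name and the host in the commented-out usage are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Hypothetical usage; replace the host with your own server's address.
# for port in (21, 2000, 3000):
#     print(port, port_is_open("ftp.example.com", port))
```

A real FTP client (or Python’s `ftplib`) is still needed afterwards to confirm that logins and transfers succeed.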
Once all that is done, I can begin working on my FileSystemWatcher-based service.
This post will focus on the design we’ll use for the FTP virtual machines.
I am doing it this way because a) I know how to do it this way, b) pretty much every Windows sysadmin can manage the FTP service in IIS, and therefore c) this approach is far easier to troubleshoot than a Logic App.
That said: It is my general opinion that when presented with the opportunity to choose platform as a service (PaaS) over infrastructure as a service (IaaS), you should almost always choose PaaS. It tends to be cheaper, more stable and easier to scale, deploy, update, amend … PaaS is just plain better than IaaS in pretty much every measurable way.
The virtual machine configuration
I am going to configure these two virtual machines so they are in a single availability set, sharing one public IP address, behind an Azure Load Balancer.
Because the two VMs are identical and in the same availability set, the deployment meets the requirements for Azure’s 99.95 percent service level agreement.
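To put that percentage in perspective, here is a small back-of-the-envelope calculation. It assumes a 30-day month purely for illustration; how availability is actually measured is defined by Microsoft’s SLA terms, not by this sketch.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a period while still meeting the SLA."""
    total_minutes = days * MINUTES_PER_DAY
    return total_minutes * (100.0 - sla_percent) / 100.0


# Assumed 30-day month: 43,200 minutes in total.
print(round(allowed_downtime_minutes(99.95), 1))  # prints 21.6
```

In other words, 99.95 percent leaves room for roughly 21.6 minutes of downtime in a 30-day month.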
These two machines will load balance their HTTP (port 80), HTTPS (port 443), active FTP (port 20) and FTP command (port 21) ports. That means the Azure Load Balancer that is part of the virtual network I will create for these machines will automatically determine a specific machine to handle each request on those ports.
It may be that one virtual machine handles nearly every request; or the majority of requests; or about half the requests. It’s really up to what the Load Balancer believes is best. But if one of those VMs needs to be recycled, either due to a fault or an update, the Load Balancer will ensure the other virtual machine handles all requests until the failed VM comes back online.
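The failover behavior can be modeled in a few lines. This is only a toy sketch: the real Load Balancer distributes traffic by its own rules and health probes, whereas this model simply picks at random among the machines currently marked healthy. The names `vm0` and `vm1` and the `health` dictionary are illustrative assumptions.

```python
import random


def pick_backend(backends, healthy, rng=random):
    """Return one backend name chosen from those currently marked healthy."""
    candidates = [name for name in backends if healthy.get(name, False)]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    return rng.choice(candidates)


backends = ["vm0", "vm1"]
health = {"vm0": True, "vm1": True}
print(pick_backend(backends, health))  # either vm0 or vm1

health["vm1"] = False                  # vm1 is being recycled
print(pick_backend(backends, health))  # always vm0 while vm1 is down
```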
Finally, each of these machines will have its own dedicated range of PASV ports. That ensures that once connected, a client sends a file to only one of the VMs.
Passive FTP and Azure availability sets
We can get away with having more than one virtual machine listening on Port 21 (the FTP command port) thanks to PASV mode and the Azure Load Balancer.
This is a bit of an oversimplification, but two machines can handle FTP requests simultaneously because, in PASV mode, each data port belongs to exactly one machine.
Let’s suppose we have two Azure virtual machines in an availability set. Both are in the same virtual network and behind a single Azure Load Balancer.
VM 0 is using ports 2000-2009 for its FTP data ports; and VM 1 is using ports 3000-3009 as FTP data ports.
When the client connects to the public IP address for our network on Port 21, the Load Balancer will forward that request to either VM 0 or VM 1. Let’s suppose, for this example, the request is forwarded to VM 1.
VM 1 responds to the request by telling the client to make its data connection on, let’s say, Port 3007.
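On the wire, the server announces that port in its reply to the client’s PASV command: a `227` response carrying six comma-separated numbers, four for the IP address and two for the port (port = p1 × 256 + p2). A minimal parsing sketch follows; the function name is my own, and the address is a documentation-only stand-in.

```python
import re

# Matches the "(h1,h2,h3,h4,p1,p2)" part of a 227 reply.
_PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply: str):
    """Extract (host, port) from a '227 Entering Passive Mode (...)' reply."""
    match = _PASV_RE.search(reply)
    if not reply.startswith("227") or match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(g) for g in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


# 11 * 256 + 191 = 3007, the port used in the example above.
print(parse_pasv_reply("227 Entering Passive Mode (203,0,113,10,11,191)."))
# prints ('203.0.113.10', 3007)
```

Python’s standard `ftplib` module does this parsing itself; the sketch is only meant to show what the numbers mean.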
The client begins its data stream on Port 3007. When the virtual network sees this, it knows that all requests for Port 3007 should be sent directly to VM 1, and it bypasses the Load Balancer entirely.
Thus, the correct virtual machine handles the request from start to finish, because even though it was randomly selected by the Load Balancer to handle the request, VM 1 is the only machine configured to listen on Port 3007.
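The port-to-machine mapping in this example can be sketched as a simple lookup. In Azure, one way to express such a mapping is with inbound NAT rules on the Load Balancer; the dictionary below is just a stand-in for that idea, using the port ranges from the example and hypothetical names.

```python
# PASV data port ranges from the example above.
PASV_RANGES = {
    "vm0": range(2000, 2010),  # ports 2000-2009
    "vm1": range(3000, 3010),  # ports 3000-3009
}


def owner_of_data_port(port: int) -> str:
    """Return the VM whose PASV range contains the given port."""
    for vm, ports in PASV_RANGES.items():
        if port in ports:
            return vm
    raise LookupError(f"port {port} is not in any PASV range")


def ranges_are_disjoint(ranges) -> bool:
    """Check that no port is claimed by more than one VM."""
    seen = set()
    for ports in ranges.values():
        if seen.intersection(ports):
            return False
        seen.update(ports)
    return True


print(owner_of_data_port(3007))          # prints vm1
print(ranges_are_disjoint(PASV_RANGES))  # prints True
```

The scheme only works if the ranges never overlap, which is what the second function checks.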
Clients using both active and passive FTP modes usually establish communication with an FTP server on Port 21. This is a command-only port; it’s not used for data transfer.
When you create an active connection to an FTP server, your client tells the FTP server which port on the client it should connect to in order to actually stream data. (By convention, the server makes that connection from its own Port 20; the client-side port can, in theory, be any port the client is able to listen on.)
But this can be a problem for the FTP server. It may not be able to make outbound connections to the specified port, and the client’s network (a firewall or NAT device, for example) may not allow the server to reach the port the client specified.
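For reference, the client makes that announcement in active mode with the FTP `PORT` command, which encodes an IPv4 address and a port as six comma-separated numbers. Here is a minimal sketch of building one; the function name, address and port are made up for illustration.

```python
def build_port_command(ip: str, port: int) -> str:
    """Build the FTP 'PORT h1,h2,h3,h4,p1,p2' command for an IPv4 address."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # The port is split into a high byte and a low byte.
    p1, p2 = divmod(port, 256)
    parts = [str(int(o)) for o in octets] + [str(p1), str(p2)]
    return "PORT " + ",".join(parts)


# 195 * 256 + 81 = 50001
print(build_port_command("198.51.100.7", 50001))
# prints PORT 198,51,100,7,195,81
```

The address in that command is the client’s own, which is exactly why firewalls and NAT on the client side tend to get in the way.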
In PASV mode, the server tells the client what port to use for the data stream.
This ensures the server can communicate on that port. True, it doesn’t ensure the client can communicate on that port; but as a rule, it is easier and more secure for a client to adjust its firewall rules than for the server to adjust its own.
That’s it for now. In the next post, I’ll go over the ARM template I’ll use to create this scheme.