It’s interesting how a trick you picked up along the way becomes your go-to way of solving problems.
One such trick for me is using the FileSystemWatcher class to watch inbound FTP folders for new or updated files and process those files automatically.
Bill Wilder taught me how to use the FileSystemWatcher back in 2013, and I have probably used it a dozen times since to automate data-processing jobs. It was an especially valuable technique at my previous job with MaineToday Media.
Most of MaineToday’s data partners are stuck in Y2K-era tech, using FTP to send XML and CSV files back and forth. While this keeps the data itself relatively straightforward to process, it makes automated file handling difficult, especially for incoming data files.
But thanks to the FileSystemWatcher, installed as part of a Windows service, I could detect inbound files as they arrived and process them automatically. It’s the same technique I used to process Maine’s election results.
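The core of the technique looks roughly like this. This is a minimal sketch, not the actual service code; the folder path and file filter are hypothetical:

```csharp
using System;
using System.IO;

// A minimal sketch of the technique, not the actual service code.
// The folder path and file filter here are hypothetical.
class InboundWatcher
{
    static void Main()
    {
        var watcher = new FileSystemWatcher(@"C:\inetpub\ftproot\inbound", "*.txt");
        watcher.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;

        // Created fires for brand-new files; Changed fires when an existing
        // file is overwritten, as happens with a recurring FTP push.
        watcher.Created += (sender, e) => ProcessFile(e.FullPath);
        watcher.Changed += (sender, e) => ProcessFile(e.FullPath);

        watcher.EnableRaisingEvents = true;

        // A console host for illustration; in the real setup this lives
        // inside a Windows service, with OnStart/OnStop managing the watcher.
        Console.ReadLine();
    }

    static void ProcessFile(string path)
    {
        // An FTP upload can still be in progress when the event fires, so
        // production code usually retries until the file opens exclusively.
        Console.WriteLine("Processing " + path);
    }
}
```

One practical note: a single upload can raise several Changed events in quick succession, so real handlers typically debounce or de-duplicate before processing.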
For several years, MaineToday Media partnered with the Associated Press to gather election results. In exchange for providing people and equipment to collect raw results, MaineToday received formatted AP results data files via an FTP push.
All election results for the entire state were contained in a single semicolon-delimited text file, which was pushed to the FTP server every three minutes.
The file contained 5,329 records covering 195 races. Each race had one statewide results record, plus up to 589 additional records, one for each municipality in Maine that reports results.
In other words: the presidential race had 590 total records: one statewide results record, plus one record for every town in Maine. The same was true of the six referenda. Races for US House contained as many as 400 records. Some legislative races were only two records (a statewide result and one municipality); others ran to 40-plus (statewide results plus dozens of small municipalities).
Here’s what a record looked like (this is the statewide presidential result):
My task was:
- Detect the arrival of these files,
- Parse them into a format that can be queried and traversed,
- Query for the multiple results sets I needed,
- Convert those results sets to JSON, then
- Send the relevant JSON files to Storage so they can be served to clients.
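The five steps above can be sketched end to end as follows. Every name, the column layout, and the grouping key are assumptions for illustration, since the actual AP record format isn't reproduced here:

```csharp
using System;
using System.IO;
using System.Linq;

// A rough outline of the pipeline described above. The field layout,
// grouping key, and helper names are all hypothetical.
class ResultsPipeline
{
    // 1. Invoked when the FileSystemWatcher reports an inbound file.
    static void OnFileArrived(string path)
    {
        // 2. Parse the semicolon-delimited file into rows of fields.
        var rows = File.ReadLines(path)
                       .Select(line => line.Split(';'))
                       .ToList();

        // 3. Query for each result set needed; here, records are grouped
        //    by a hypothetical race-identifier field in column 0.
        foreach (var race in rows.GroupBy(fields => fields[0]))
        {
            // 4. Convert the result set to JSON (serializer stubbed out).
            string json = ToJson(race.ToList());

            // 5. Send the JSON file to Storage for clients to fetch
            //    (a local write stands in for the upload here).
            File.WriteAllText(race.Key + ".json", json);
        }
    }

    static string ToJson(object results) => "[]"; // placeholder for real serialization
}
```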
In this sub-series of articles, I’ll describe the methodologies I used to automatically process these incoming files.
The very first thing I made, when my employer first subscribed to Azure, was a job server virtual machine.
This job server became a natural place to host the inbound FTP files sent to us by our partners, including the Associated Press.
I’ve used a Standard A2 instance (2 cores, 3.5 GB memory, standard Storage disk image) for this server for several years now. It hosts about a half-dozen inbound FTP jobs per year, as well as another half-dozen scheduled tasks that reach out for data every day.
Other than an incident in November 2014 when the VM was down for about a day due to a bad patch, this machine has been up almost continuously and requires zero daily intervention.
Best of all, configuring this virtual machine is exactly like using an on-premises machine. I simply installed Internet Information Services (IIS) with the FTP service enabled, and I was up and running.
The first step in preparing your Azure virtual machine to automatically process FTP files is to correctly set it up as an FTP server.
In addition to installing IIS and enabling FTP services, you’ll need to
- configure your VM’s endpoints to accept FTP connections, especially PASV connections;
- apply IP restrictions to your FTP server, to keep just anybody from connecting to it;
- open the VM’s firewall ports for FTP access;
- configure user isolation, to prevent FTP users from going rogue; and
- add a Traffic Manager profile, so you can fail over to an alternate VM if there’s trouble with the primary VM.
I’m going to go over these steps in upcoming blog posts.