Serving large data files at high volume, as on a media web site, poses a problem: the files must be managed on the backend and also delivered quickly to frontend users.
A common filesystem for managing large numbers of files is mogilefs, yet mogilefs has no built-in security mechanisms, nor does it serve high request volumes well on its own. This is not a fault of mogilefs, as some sort of caching is obviously meant to sit in front of it.
Lighttpd is a common web server used to serve large numbers of static files. It’s very good at its job.
If mogilefs is the backend in our scenario and lighttpd is the frontend, the major concern is getting lighttpd to communicate with mogilefs only when needed, and getting the two to communicate efficiently.
Lighttpd supports one mechanism for FastCGI apps to tell it what local file to serve: the X-Sendfile response header. This is pretty straightforward to use. A request comes in, the FastCGI app parses the URL and pulls the relevant file out of the mogilefs cluster into a local filesystem cache, and then it passes the X-Sendfile header to lighttpd, at which point lighttpd serves the file from the local filesystem cache. This all seems well and good, but when you try to do more advanced serving this solution has drawbacks.
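The flow just described can be sketched as a small WSGI app of the sort one might run under flup. This is only an illustration under stated assumptions: fetch_from_mogilefs, make_app and CACHE_DIR are hypothetical names, the URL path is treated directly as the file key, and the actual mogilefs client call is left as a placeholder.

```python
import os
import tempfile

# Hypothetical cache location; any directory lighttpd can read would do.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "mogile-cache")


def fetch_from_mogilefs(key, dest_path):
    """Hypothetical placeholder: copy the file stored under `key` to dest_path."""
    raise NotImplementedError("wire this up to a real mogilefs client")


def make_app(fetch=fetch_from_mogilefs, cache_dir=CACHE_DIR):
    """Build a WSGI app that answers with X-Sendfile instead of file data."""

    def app(environ, start_response):
        # Assumption: the URL path is the file key, e.g. /video1.flv
        key = environ.get("PATH_INFO", "").strip("/")
        # Reject empty keys and anything that could escape the cache directory.
        if not key or "/" in key or key.startswith("."):
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found\n"]

        local_path = os.path.join(cache_dir, key)
        if not os.path.exists(local_path):
            # Cache miss: pull the file out of the cluster once.
            os.makedirs(cache_dir, exist_ok=True)
            fetch(key, local_path)

        # The body stays empty; the web server replaces it with the file.
        start_response("200 OK", [("X-Sendfile", local_path)])
        return [b""]

    return app
```

Only the first request for a key touches mogilefs; later requests find the file in the local cache and return the header straight away.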
The first drawback is how to serve a range of data to a user. This is required for video pseudostreaming: a user tries to seek to a spot in a video in their browser that has not yet been buffered, and thus a new request must be initiated beginning at that new spot in the file. Lighttpd supports pseudostreaming via the mod_flv_streaming module, but that module has no ability to cooperate with mod_fastcgi. It is intended to work only with static files served directly off of disk, with no mod_fastcgi middleman.
The second drawback is how to serve a range of data to a user with an FLV file header prepended to the actual file data. You can’t just send the header back as the normal response body, because lighttpd will overwrite that data with the contents of the file specified by X-Sendfile.
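To make the two drawbacks concrete, here is a rough sketch of the bytes a pseudostreaming response has to contain: a short FLV header, then the video file from the requested offset onward. The function name pseudostream and the exact header bytes are assumptions (a minimal 13-byte video-only header), so check them against whatever your player expects. Note that this streams the data through the application itself, which is exactly what X-Sendfile is meant to avoid.

```python
import os

# Assumed minimal header: "FLV", version 1, video-only flag, header length 9,
# followed by a 4-byte field that precedes the first tag.
FLV_HEADER = b"FLV\x01\x01\x00\x00\x00\x09\x00\x00\x00\x09"


def pseudostream(path, start=0, chunk_size=64 * 1024):
    """Yield the bytes for a seek request beginning at byte offset `start`.

    A start of 0 (or an out-of-range start) yields the file unchanged.
    Any other start yields the FLV header and then the file from `start`.
    """
    size = os.path.getsize(path)
    if start < 0 or start >= size:
        start = 0

    if start > 0:
        # The file's own header is skipped by the seek, so supply one.
        yield FLV_HEADER

    with open(path, "rb") as f:
        f.seek(start)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

With plain X-Sendfile there is no way to express either half of this: neither the starting offset nor the extra header in front of the file.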
Thus, there was no good way to connect mogilefs, a local filesystem cache, and lighttpd. The requirements for a solution were to allow a FastCGI app to fetch a file out of a mogilefs cluster if it did not exist in the local cache, and tell lighttpd to serve the file, optionally from a specific point in the file with all required file headers in place.
I was able to get a nearly ideal solution in place, but it involved mod_magnet, which is slow (it is processed in the lighttpd core). Also, the lua memcache module can’t be used within a mod_magnet script because of naming conflicts between the lua socket module and lighttpd’s internal lua entities, so there is no way to securely generate a temporary name for files in the local system cache. To work around this, I had to generate file names based on the time and the incoming file key from the request, which is certainly not ideal and also does not allow the cache to be expired or cleaned gracefully.
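For illustration only, a naming scheme of that kind might look like the sketch below. It is written in Python rather than the Lua that mod_magnet actually runs, and the function name cache_name and the choice of SHA-1 are assumptions, not the code that was deployed.

```python
import hashlib
import time


def cache_name(key, now=None):
    """Build a cache file name from the current time and the file key."""
    if now is None:
        now = time.time()
    # Hash the key so the name is filesystem-safe whatever the key contains.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "%d-%s" % (int(now), digest)
```

A name like this is predictable to anyone who knows the key and the approximate time, and nothing records which names are still in use, which is why the cache is hard to expire cleanly.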
The obvious solution to me was to patch lighttpd. Initially, I thought that I’d only have to serve a portion of a file via X-Sendfile to allow for pseudostreaming. So, I submitted a patch to allow for an X-Sendfile-Range header that simply specified a range of bytes to serve from the file. This patch was applied for the 1.4.23 release, but then removed because…
I had forgotten, though, that this solution had a major drawback (and it was less than ideal for stbuehler, the current main developer of lighttpd). One would actually need to have lighttpd serve two files consecutively, in order. The first file contains the FLV header sent when pseudostreaming, and the second file is the actual video file (whose range to send would also need to be specified to lighttpd).
With some help from the lighttpd developers, we came to the conclusion that an X-Sendfile-Extended header should be implemented, allowing a FastCGI app to specify the response header one or more times in order to have lighttpd serve the respective files off of the local filesystem. I have recently submitted a patch for this feature as well. The only things remaining on it are for the feature to support the standard HTTP range syntax, and for lighttpd to interpret comma-delimited header values (e.g. X-Sendfile-Extended: 15-5000 /tmp/myfile, 777-2239 /tmp/myotherfile) to conform with the HTTP specification.
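As a sketch of how the comma-delimited form in that example could be read, the following parser splits a header value into (first, last, path) entries. It assumes only the "first-last path" shape shown in the example; the function name parse_sendfile_extended is hypothetical, and this is not the parsing code from the patch.

```python
import re

# One entry looks like "15-5000 /tmp/myfile" (shape taken from the example).
_ENTRY = re.compile(r"^(\d+)-(\d+)\s+(\S.*)$")


def parse_sendfile_extended(value):
    """Split a comma-delimited header value into (first, last, path) tuples."""
    entries = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray or trailing commas
        match = _ENTRY.match(part)
        if match is None:
            raise ValueError("malformed entry: %r" % part)
        first, last = int(match.group(1)), int(match.group(2))
        if last < first:
            raise ValueError("range ends before it starts: %r" % part)
        entries.append((first, last, match.group(3)))
    return entries
```

For the pseudostreaming case, the first entry would point at a small file holding the FLV header and the second at the cached video with the desired range. A path containing a comma would break this simple split, so a real implementation would need quoting or escaping rules.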
Although a quite fast and capable system is already in place using a simple Python flup script and lighttpd’s mod_magnet, implementing this mechanism once lighttpd stable is patched should drastically improve performance.