Archive for April 8th, 2008
How to check if a file is compressed in C#
Problem:
.NET 2.0 introduced the GZipStream class to allow programmatic compression and decompression of files. However, that class doesn’t provide an easy way to determine whether a file is actually compressed, and decompressing the entire file just to find out is wasteful.
Solution:
Many file formats reserve a sequence of bytes at the beginning of the file for information that helps identify the type of the file. A list of these “magic numbers” is available online.
Files in the GZIP format begin with “1F 8B 08” (the first 3 bytes of the file, written in hexadecimal: 1F 8B identifies the format and 08 indicates the DEFLATE compression method), while PK-ZIP archives normally begin with “50 4B 03 04” (again, the first 4 bytes written in hexadecimal).
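The dash-separated signature strings used in the examples in this post follow the output format of BitConverter.ToString. As a minimal sketch of that formatting (the class name SignatureDemo and the helper ToHex are hypothetical names chosen only for illustration):

```csharp
using System;

public static class SignatureDemo
{
    // Hypothetical helper: formats bytes as uppercase hex pairs
    // separated by dashes, which is what BitConverter.ToString returns.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        byte[] gzipHeader = { 0x1F, 0x8B, 0x08 };
        byte[] zipHeader = { 0x50, 0x4B, 0x03, 0x04 };

        Console.WriteLine(ToHex(gzipHeader)); // prints 1F-8B-08
        Console.WriteLine(ToHex(zipHeader));  // prints 50-4B-03-04
    }
}
```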
Code examples:
Before I show you the code, here are some examples of how it can be used.
If “testFile.gz” is a valid Gzip compressed file, the isGzip variable will be set to “true”:
bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");
If “testFile.zip” is a valid PK-Zip compressed file, the isPKZip variable will be set to “true”:
bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");
Caveat:
Strictly speaking, any binary file might happen to start with the same “magic number” as a PK-Zip file, and in that case FileChecker.CheckSignature will still return true (i.e. a false positive). Going the other way, a PK-Zip archive that contains at least one entry and has nothing prepended to it will start with that magic number (an empty archive starts with “50 4B 05 06” instead, and a self-extracting archive starts with its executable stub). So the check can rule a file out, but it cannot confirm that the file is valid. A more accurate way to phrase the examples would be:
If the isGzip variable is set to “false”, “testFile.gz” is not a valid Gzip compressed file:
bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");
If the isPKZip variable is set to “false”, “testFile.zip” is not a valid (non-empty) PK-Zip compressed file:
bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");
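When a false positive matters, one option is to follow the signature check with a cheap decompression probe: ask GZipStream for a single byte, which parses the header and inflates only the first chunk rather than the entire file. The sketch below is an illustration, not part of the original solution. The names GzipProbe and LooksLikeGzip are hypothetical, and it assumes GZipStream reports malformed input by throwing InvalidDataException.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GzipProbe
{
    // Hypothetical helper: returns true if the gzip header parses and the
    // start of the payload can be decompressed without an error.
    public static bool LooksLikeGzip(Stream input)
    {
        try
        {
            // leaveOpen = true so the caller keeps ownership of the stream.
            using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress, true))
            {
                gz.ReadByte(); // reads at most one decompressed byte
                return true;
            }
        }
        catch (InvalidDataException)
        {
            // Assumption: malformed gzip data surfaces as InvalidDataException.
            return false;
        }
    }

    public static void Main()
    {
        // Build a small, genuine gzip stream in memory.
        MemoryStream compressed = new MemoryStream();
        using (GZipStream gz = new GZipStream(compressed, CompressionMode.Compress, true))
        {
            byte[] payload = { 1, 2, 3, 4 };
            gz.Write(payload, 0, payload.Length);
        }
        compressed.Position = 0;
        Console.WriteLine(LooksLikeGzip(compressed));

        // Correct magic number followed by bytes that are not a gzip header.
        byte[] fake = { 0x1F, 0x8B, 0x08, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
        Console.WriteLine(LooksLikeGzip(new MemoryStream(fake)));
    }
}
```

The probe costs more than comparing three bytes, so it makes sense to run it only on files that have already passed the signature check.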
Code:
Anyways, here’s the code:
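using System;
using System.IO;

public static class FileChecker
{
    // Returns true if the first signatureSize bytes of the file, formatted as
    // dash-separated hex (e.g. "1F-8B-08"), match expectedSignature.
    public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
    {
        if (String.IsNullOrEmpty(filepath))
            throw new ArgumentException("Must specify a filepath");
        if (String.IsNullOrEmpty(expectedSignature))
            throw new ArgumentException("Must specify a value for the expected file signature");
        if (signatureSize <= 0)
            throw new ArgumentOutOfRangeException("signatureSize", "Signature size must be positive");

        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < signatureSize)
                return false;

            byte[] signature = new byte[signatureSize];
            int bytesRequired = signatureSize;
            int index = 0;
            while (bytesRequired > 0)
            {
                // Read may return fewer bytes than requested, so keep reading.
                int bytesRead = fs.Read(signature, index, bytesRequired);
                if (bytesRead == 0)
                    return false; // End of file reached early; without this check the loop would never exit.
                bytesRequired -= bytesRead;
                index += bytesRead;
            }

            // BitConverter.ToString produces uppercase hex, so ignore case when comparing.
            string actualSignature = BitConverter.ToString(signature);
            return String.Equals(actualSignature, expectedSignature, StringComparison.OrdinalIgnoreCase);
        }
    }
}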
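The string comparison keeps the call sites readable, but the same check can also be done on the raw bytes, which removes the need to pass the signature size separately. A minimal sketch of that variant follows; ByteSignature and StartsWith are hypothetical names, not part of the code above.

```csharp
using System;
using System.IO;

public static class ByteSignature
{
    // Hypothetical variant: compares the file's leading bytes with the
    // expected signature bytes directly, with no hex string involved.
    public static bool StartsWith(string filepath, byte[] expected)
    {
        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            byte[] actual = new byte[expected.Length];
            int total = 0;
            while (total < actual.Length)
            {
                int read = fs.Read(actual, total, actual.Length - total);
                if (read == 0)
                    return false; // file is shorter than the signature
                total += read;
            }

            for (int i = 0; i < expected.Length; i++)
            {
                if (actual[i] != expected[i])
                    return false;
            }
            return true;
        }
    }

    public static void Main()
    {
        // Write a temporary file that starts with the PK-Zip signature.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0x50, 0x4B, 0x03, 0x04, 0x14, 0x00 });
            Console.WriteLine(StartsWith(path, new byte[] { 0x50, 0x4B, 0x03, 0x04 })); // prints True
            Console.WriteLine(StartsWith(path, new byte[] { 0x1F, 0x8B, 0x08 }));       // prints False
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```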
For a more complete list of file types and their associated magic numbers, please see http://www.garykessler.net/library/file_sigs.html