How to check if a file is compressed in C#

Posted on April 8, 2008


Problem:

.NET 2.0 introduced the GZipStream class to allow programmatic compression and decompression of files. However, that class doesn’t provide an easy way to determine whether a file is actually compressed, and decompressing the entire file just to find out is wasteful.

Solution:

Many file formats reserve a sequence of bytes at the beginning of the file that identifies the type of the file. A list of these “magic numbers” is available online.

Files compressed in the GZIP format begin with 1F 8B 08 (the first 3 bytes of the file, written in hexadecimal: 1F 8B is the GZIP magic number and 08 indicates the deflate compression method), while files in the PK-ZIP format normally begin with “50 4B 03 04” (again, the first 4 bytes written as hexadecimal).
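The signatures in this post are written as dash-separated hexadecimal strings, which is the format BitConverter.ToString produces for a byte array. A minimal sketch of that formatting follows; SignatureDemo and Format are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper, not part of FileChecker: shows how leading bytes
// map to the dash-separated hex strings used for signatures.
public static class SignatureDemo
{
    public static string Format(byte[] bytes)
    {
        // BitConverter.ToString yields uppercase hex pairs joined by '-'.
        return BitConverter.ToString(bytes);
    }
}
```

With this helper, Format(new byte[] { 0x1F, 0x8B, 0x08 }) gives "1F-8B-08" and Format(new byte[] { 0x50, 0x4B, 0x03, 0x04 }) gives "50-4B-03-04".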

Code examples:

Before I show you the code, here are some examples of how it can be used.

If “testFile.gz” is a valid Gzip compressed file, isGzip variable will be set to “true”:

bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");

If “testFile.zip” is a valid PK-Zip compressed file, isPKZip variable will be set to “true”:

bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");

Caveat:

An arbitrary binary file might happen to start with the same “magic number” as normal PK-Zip files do, and in that case FileChecker.CheckSignature will still return true (i.e. a false positive). On the other hand, ordinary non-empty PK-Zip files will have that magic number at the beginning (an empty archive starts with 50 4B 05 06 instead). So a matching signature is necessary but not sufficient, and a more accurate way to phrase the examples would be:

If the isGzip variable is set to “false”, “testFile.gz” is not a valid Gzip compressed file:

bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");

If the isPKZip variable is set to “false”, “testFile.zip” is not a normal PK-Zip compressed file:

bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");
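To make the false-positive caveat concrete, here is a small sketch that writes a file which is not a ZIP archive but begins with the PK-Zip signature, then compares its first four bytes against that signature. FalsePositiveDemo and StartsWithZipSignature are hypothetical names, and the trailing bytes are arbitrary junk chosen for illustration.

```csharp
using System;
using System.IO;

// Hypothetical demo class: shows that a signature match alone does not
// prove the file is a real archive.
public static class FalsePositiveDemo
{
    public static bool StartsWithZipSignature()
    {
        string path = Path.GetTempFileName();
        try
        {
            // PK-Zip signature followed by arbitrary bytes; not a valid archive.
            byte[] fake = { 0x50, 0x4B, 0x03, 0x04, 0x00, 0x01, 0x02 };
            File.WriteAllBytes(path, fake);

            // Take the first four bytes and format them like the signature strings above.
            byte[] all = File.ReadAllBytes(path);
            byte[] head = new byte[4];
            Array.Copy(all, head, 4);
            return BitConverter.ToString(head) == "50-4B-03-04";
        }
        finally
        {
            // Clean up the temporary file regardless of the outcome.
            File.Delete(path);
        }
    }
}
```

The method returns true even though the file cannot be opened as an archive, which is why a "true" result should be read as "possibly compressed" rather than "definitely compressed".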

Code:

Here’s the code:

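using System;
using System.IO;

public static class FileChecker
{
    // Returns true if the first signatureSize bytes of the file, formatted as
    // dash-separated hex (e.g. "1F-8B-08"), match expectedSignature.
    public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
    {
        if (String.IsNullOrEmpty(filepath))
            throw new ArgumentException("Must specify a filepath");
        if (String.IsNullOrEmpty(expectedSignature))
            throw new ArgumentException("Must specify a value for the expected file signature");
        if (signatureSize <= 0)
            throw new ArgumentOutOfRangeException("signatureSize", "Signature size must be positive");

        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // A file shorter than the signature cannot match.
            if (fs.Length < signatureSize)
                return false;

            byte[] signature = new byte[signatureSize];
            int bytesRequired = signatureSize;
            int index = 0;

            // Read may return fewer bytes than requested, so loop until done.
            while (bytesRequired > 0)
            {
                int bytesRead = fs.Read(signature, index, bytesRequired);
                if (bytesRead == 0)
                    return false; // End of file reached early (e.g. the file was truncated).
                bytesRequired -= bytesRead;
                index += bytesRead;
            }

            // BitConverter.ToString produces uppercase hex pairs separated by dashes.
            string actualSignature = BitConverter.ToString(signature);
            return String.Equals(actualSignature, expectedSignature, StringComparison.OrdinalIgnoreCase);
        }
    }
}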
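As a possible variation, the signature can be passed as a byte array so that the size argument and the string formatting are not needed, and any Stream can be checked rather than only a file on disk. This is a sketch under those assumptions, not a replacement for the code above; SignatureReader and StartsWith are hypothetical names.

```csharp
using System;
using System.IO;

// Hypothetical alternative to FileChecker: compares raw bytes from any Stream.
public static class SignatureReader
{
    public static bool StartsWith(Stream stream, byte[] expected)
    {
        if (stream == null)
            throw new ArgumentNullException("stream");
        if (expected == null || expected.Length == 0)
            throw new ArgumentException("Must specify a non-empty signature", "expected");

        byte[] actual = new byte[expected.Length];
        int offset = 0;

        // Read may return fewer bytes than requested, so loop until the buffer is full.
        while (offset < actual.Length)
        {
            int read = stream.Read(actual, offset, actual.Length - offset);
            if (read == 0)
                return false; // Stream ended before the full signature was read.
            offset += read;
        }

        // Compare byte by byte against the expected signature.
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual[i] != expected[i])
                return false;
        }
        return true;
    }
}
```

It could be called on a FileStream opened with File.OpenRead("testFile.gz") inside a using block, passing new byte[] { 0x1F, 0x8B, 0x08 } as the expected signature. The same false-positive caveat applies.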

For a complete list of files and the associated magic numbers, please see http://www.garykessler.net/library/file_sigs.html
