Some Creativity

Weblog of Siddharth Uppal


How to check if a file is compressed in C#


Problem:

.NET 2.0 introduced the GZipStream class to allow programmatic compression and decompression of files. However, that class doesn’t provide an easy way to determine whether a file is actually compressed. Decompressing the entire file just to find that out is wasteful.

Solution:

Many file formats reserve a sequence of bytes at the beginning of the file for information that identifies the type of the file. A list of these “magic numbers” is available online.

Files compressed with gzip begin with 1F 8B 08 (the first 3 bytes of the file, written in hexadecimal): 1F 8B identifies the gzip format and 08 indicates the deflate compression method. PK-Zip archives normally begin with 50 4B 03 04 (again, the first 4 bytes written in hexadecimal), which is the ASCII text “PK” followed by the bytes 03 04.
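To make the idea concrete, here is a minimal sketch that holds the two signatures above as byte arrays and checks whether a buffer begins with one of them. The names MagicNumbers and StartsWith are made up for this illustration and are not part of the code in this post.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class MagicNumbers
{
    // Signature bytes quoted in the text above.
    public static readonly byte[] Gzip = { 0x1F, 0x8B, 0x08 };
    public static readonly byte[] PKZip = { 0x50, 0x4B, 0x03, 0x04 };

    // Returns true when 'header' begins with every byte of 'signature'.
    public static bool StartsWith(byte[] header, byte[] signature)
    {
        if (header == null) throw new ArgumentNullException("header");
        if (signature == null) throw new ArgumentNullException("signature");

        // A buffer shorter than the signature cannot match.
        if (header.Length < signature.Length)
            return false;

        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i])
                return false;
        }
        return true;
    }
}
```

For example, MagicNumbers.StartsWith(new byte[] { 0x1F, 0x8B, 0x08, 0x00 }, MagicNumbers.Gzip) evaluates to true, while the same buffer checked against MagicNumbers.PKZip evaluates to false.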

Code examples:

Before I show you the code, here’re some examples of how it can be used.

If “testFile.gz” is a valid Gzip compressed file, isGzip variable will be set to “true”:

bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");

If “testFile.zip” is a valid PK-Zip compressed file, isPKZip variable will be set to “true”:

bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");
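The expected signatures in these calls are written in the dash-separated format that BitConverter.ToString produces, because that is how the code further down turns the bytes it reads into text. If you start from raw bytes, a small sketch like the following builds the string for you. SignatureFormat and ToSignatureString are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SignatureFormat
{
    // Formats raw signature bytes as upper-case hex pairs separated by dashes.
    public static string ToSignatureString(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        return BitConverter.ToString(bytes);
    }
}
```

Calling SignatureFormat.ToSignatureString(new byte[] { 0x50, 0x4B, 0x03, 0x04 }) returns "50-4B-03-04".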

Caveat:

A file that is not a PK-Zip archive might happen to start with the same “magic number”, and in that case FileChecker.CheckSignature will still return true (i.e. a false positive). The reverse is far more reliable: ordinary PK-Zip files do have that magic number at the beginning (an empty archive is an exception, since it begins with 50 4B 05 06 instead). So a more accurate way to phrase the examples would be:

If “testFile.gz” is not a valid Gzip compressed file, isGzip variable will be set to “false”:

bool isGzip = FileChecker.CheckSignature("testFile.gz", 3, "1F-8B-08");

If “testFile.zip” is not a valid PK-Zip compressed file, isPKZip variable will be set to “false”:

bool isPKZip = FileChecker.CheckSignature("testFile.zip", 4, "50-4B-03-04");
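The same caveat applies to the Gzip check. If false positives matter, one possible follow-up (a sketch, not something this post's code does) is to let GZipStream try to decompress a single byte once the signature check has passed. This assumes GZipStream reports malformed data by throwing InvalidDataException. GzipProbe and CanStartDecompressing are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper for illustration only.
public static class GzipProbe
{
    // Returns false if GZipStream rejects the data while reading the first byte.
    // Intended to run after a signature check; an empty file is not rejected here.
    public static bool CanStartDecompressing(string filepath)
    {
        try
        {
            using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            using (GZipStream gz = new GZipStream(fs, CompressionMode.Decompress))
            {
                byte[] buffer = new byte[1];
                gz.Read(buffer, 0, 1); // Decompresses only as much input as needed for one byte.
                return true;
            }
        }
        catch (InvalidDataException)
        {
            // Assumption: malformed gzip data surfaces as InvalidDataException.
            return false;
        }
    }
}
```

This still doesn't prove the whole file is intact, but it only touches the start of the file, so it stays much cheaper than decompressing everything.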

Code:

Anyway, here’s the code:


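using System;
using System.IO;

public static class FileChecker
{
    // Returns true if the first signatureSize bytes of the file, formatted as
    // dash-separated hex (e.g. "1F-8B-08"), match expectedSignature.
    public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
    {
        if (String.IsNullOrEmpty(filepath))
            throw new ArgumentException("Must specify a filepath");
        if (signatureSize <= 0)
            throw new ArgumentOutOfRangeException("signatureSize", "Signature size must be positive");
        if (String.IsNullOrEmpty(expectedSignature))
            throw new ArgumentException("Must specify a value for the expected file signature");

        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // A file shorter than the signature cannot match.
            if (fs.Length < signatureSize)
                return false;

            byte[] signature = new byte[signatureSize];
            int bytesRequired = signatureSize;
            int index = 0;

            // Read may return fewer bytes than requested, so loop until the buffer is full.
            while (bytesRequired > 0)
            {
                int bytesRead = fs.Read(signature, index, bytesRequired);
                if (bytesRead == 0)
                    return false; // File shrank after the length check; don't loop forever.
                bytesRequired -= bytesRead;
                index += bytesRead;
            }

            // BitConverter.ToString yields upper-case hex, so compare case-insensitively.
            string actualSignature = BitConverter.ToString(signature);
            return String.Equals(actualSignature, expectedSignature, StringComparison.OrdinalIgnoreCase);
        }
    }
}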

For a more complete list of file types and their associated magic numbers, please see http://www.garykessler.net/library/file_sigs.html


Written by Sid

April 8th, 2008 at 4:47 pm

Posted in .NET, General
