Compress a string with zip

Posted by Marcus Hammarberg on December 15, 2008
Stats

[UPDATE START]

We found a nasty performance bug in the code below. The DeCompress-method copies a string for each turn in the loop. That is a classic problem that creates a new copy of the string for each row. That became a major problem for a 3.8 Mb string...

I have now updated the code to use the System.Text.StringBuilder object instead. That took down the speed to about a tenth. Sorry that I didn't catch that...

[UPDATED STOP]

We had a quite special need the other day; we wanted to compress a part of our request, namely a XML-string that was sent to us.

Most of the examples I found on the net showed how to compress the content of a file. But here is the code that compress a string. The code uses ICSharpCode.SharpZipLib. Here you go:

/// <summary>
/// Compress strings with ICSharpCode.SharpZipLib
/// </summary>
public class StringZipper
{
public static string Compress(string uncompressedString)
{
byte[] bytData = System.Text.Encoding.UTF8.GetBytes(uncompressedString);
MemoryStream ms = new MemoryStream();
Stream s = new DeflaterOutputStream(ms);
s.Write(bytData, 0, bytData.Length);
s.Close();
byte[] compressedData = (byte[])ms.ToArray();
return System.Convert.ToBase64String(compressedData, 0, compressedData.Length);
}

public static string DeCompress(string compressedString)
{
StringBuilder uncompressedString = new StringBuilder();
int totalLength = 0;
byte[] bytInput = System.Convert.FromBase64String(compressedString); ;
byte[] writeData = new byte[4096];
Stream s2 = new InflaterInputStream(new MemoryStream(bytInput));
while (true)
{
int size = s2.Read(writeData, 0, writeData.Length);
if (size > 0)
{
totalLength += size;
uncompressedString.Append(System.Text.Encoding.UTF8.GetString(writeData, 0, size));
}
else
{
break;
}
}
s2.Close();
return uncompressedString.ToString();
}
}

And here is how to call it:

// Compress the string
string orgString = "Hello compression with åäö.";
//orgString = File.ReadAllText(args[0]);

Stopwatch sw = new Stopwatch();
sw.Start();
string compressedString = StringZipper.Compress(orgString);
sw.Stop();
string compressTime = sw.ElapsedMilliseconds.ToString();

// Decompress the string
sw.Reset(); sw.Start();
string decompressedString = StringZipper.DeCompress(compressedString);
sw.Stop();
string decompressTime = sw.ElapsedMilliseconds.ToString();

if (orgString == decompressedString)
Console.WriteLine("SAME");
else
Console.WriteLine("DIFFER");

// Show some nice statistics
decimal ratio = (decimal)compressedString.Length / (decimal)orgString.Length;
Console.WriteLine(string.Format("Orignal string was: {0} chars", orgString.Length));
Console.WriteLine(string.Format("Compressed string was: {0} chars", compressedString.Length));
Console.WriteLine(string.Format("DeCompressed string was: {0} chars ", decompressedString.Length));
Console.WriteLine(string.Format("Compression ratio {0}", ratio));
Console.WriteLine(string.Format("The compression took {0} ms, DeCompression took {1} ms", compressTime, decompressTime));

I should point out two things;

  • For short strings the compressed string might actually be longer than the original.
  • It's not very good to store the result of the compression as a string (as done by System.Convert.ToBase64String), but it's quite nice to have for serialization.


Published by Marcus Hammarberg on Last updated