.NET CF DataSet performance using XML, text and CSV
I want fast storage for my secrets that is easy to synchronize with a PC. The obvious choice would be a DataSet serialized to an XML file. That is fast on a PC, but my SMS Manager slows down on the Pocket PC as its database grows. The password manager will be used for reading in 99% of cases, so I set up a simple test suite on the Pocket PC to compare the read performance of the different file formats I was considering:
- Text array: reads a line at a time and does a .Split() with tab as the separator, building a two-dimensional array (rows, fields per row)
- CSV array: parses a CSV file and builds a two-dimensional array (rows, fields per row)
- XML DataSet: uses the ReadXml() method of the DataSet object
- CSV DataSet: parses a CSV file and builds a DataSet in memory
Example test routine (XML DataSet):
openFileDialog.Filter = "XML Test|*.xml";
if (DialogResult.OK == openFileDialog.ShowDialog())
{
    string fileName = openFileDialog.FileName;

    // Time the load with Environment.TickCount (millisecond resolution)
    int startTick = Environment.TickCount;
    System.Data.DataSet ds = new System.Data.DataSet();
    ds.ReadXml(fileName);
    int ticks = Environment.TickCount - startTick;

    System.Windows.Forms.MessageBox.Show("Time taken: " + ticks + " ms");
}
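For comparison, the CSV DataSet test is roughly the sketch below. The "Secrets" table name and the ParseCsvLine helper are placeholders, not my real code (a simplified version of the parser is shown further down), and the TickCount timing is the same as above so I left it out. The point is that the rows land in a DataTable instead of an array:

// Rough sketch of the CSV DataSet loader. ParseCsvLine stands in for the
// CSV field parser; columns are created on the fly with default names here,
// the real code would name them.
System.Data.DataSet csvDs = new System.Data.DataSet();
System.Data.DataTable table = csvDs.Tables.Add("Secrets");

StreamReader sr = File.OpenText(fileName);
string line;
while ((line = sr.ReadLine()) != null)
{
    string[] fields = ParseCsvLine(line);

    // Make sure the table has enough columns (the first row decides)
    while (table.Columns.Count < fields.Length)
        table.Columns.Add();

    table.Rows.Add(fields);
}
sr.Close();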
I knew that DataSets serialized to XML are slow on the .NET Compact Framework, but I had no idea they were this slow:

[Graph: load times for the four formats]

The tests were run with 1,000 records on an H3870. I repeated them with 100, 1,000 and 10,000 records with similar results.
I find it strange that my CSV version is almost three times faster than the text version, which does a simple Split(). This is the text reader core:
// Tab is the field separator; rows collects one string[] per line
char[] fieldSeparators = new char[] { '\t' };
ArrayList rows = new ArrayList();

StreamReader sr = File.OpenText(fileName);
string input;
while ((input = sr.ReadLine()) != null)
{
    rows.Add(input.Split(fieldSeparators));
}
sr.Close();
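For reference, a hand-rolled field splitter along these lines is what the CSV tests use. This is a simplified sketch (quoted fields handled, embedded line breaks and escaped quotes not), not my exact parser:

// Simplified CSV line splitter: handles quoted fields, ignores
// embedded line breaks and escaped quotes.
static string[] ParseCsvLine(string line)
{
    ArrayList fields = new ArrayList();
    StringBuilder field = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c == '"')
        {
            inQuotes = !inQuotes;           // toggle quoted state
        }
        else if (c == ',' && !inQuotes)
        {
            fields.Add(field.ToString());   // field boundary
            field.Length = 0;
        }
        else
        {
            field.Append(c);
        }
    }
    fields.Add(field.ToString());           // last field on the line

    return (string[])fields.ToArray(typeof(string));
}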
The text version is slightly faster than the CSV version the first time it is run (not shown in my graphs). I guess this is because the String class is pre-jitted, so the Split() call pays no JIT cost on the first run.
I have decided to use the CSV DataSet for several reasons:
- It gives me all the features of DataSets that I would otherwise have to implement myself for the array versions: sort, filter and search (see the example after this list)
- It has less start-up overhead on the first call (728 ms vs 2,218 ms for XML)
- It has acceptable performance up to 10,000 records (6,303 ms vs 36,513 ms for XML)
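The sort, filter and search features come essentially for free once the data is in a DataTable. A quick illustration (the Site and User column names and values are made up, not my real schema):

System.Data.DataTable table = csvDs.Tables[0];

// Sort and filter through a DataView
System.Data.DataView view = new System.Data.DataView(table);
view.Sort = "Site ASC";
view.RowFilter = "Site LIKE 'bank%'";

// Search with Select() on the underlying table
System.Data.DataRow[] hits = table.Select("User = 'bob'");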
I will play with encryption support next.