Sunday, November 28, 2004


.NET CF DataSet performance using XML, text and CSV

I want fast storage for my secrets that is easy to synchronize with a PC. The obvious choice would be a DataSet serialized to an XML file. That is fast on a PC, but my SMS Manager slows down on the Pocket PC as the database grows. The password manager will be used for reading 99% of the time, so I set up a simple test suite on the PPC to compare the performance of the different file formats I was considering:



  • Text array: reads a line at a time and does a .Split() using tab as the separator. Creates a two-dimensional array (rows/fields in row)
  • CSV array: parses the CSV file and creates a two-dimensional array (rows/fields in row)
  • XML DataSet: uses the ReadXml() method of the DataSet object
  • CSV DataSet: parses the CSV file and builds a DataSet in memory


Example test routine (XML DataSet):


openFileDialog.Filter = "XML Test|*.xml";
if (DialogResult.OK == openFileDialog.ShowDialog())
{
    string fileName = openFileDialog.FileName;
    int startTick = Environment.TickCount;

    // Load the whole XML file into a DataSet and measure the elapsed time.
    System.Data.DataSet ds = new System.Data.DataSet();
    ds.ReadXml(fileName);

    int ticks = Environment.TickCount - startTick;
    System.Windows.Forms.MessageBox.Show("Time taken: " + ticks + " ms");
}

I know that DataSets serialized to XML are slow on the .NET Compact Framework, but I had no idea they were this slow:


The tests were run with 1,000 records on an H3870. I repeated the tests with 100, 1,000 and 10,000 records with similar results.



I find it strange that my CSV version is almost three times faster than the text version, which only does a simple Split(). This is the text reader core:


// rows is an ArrayList; each entry holds the fields of one line as a string[].
StreamReader sr = File.OpenText(fileName);
String input;
while ((input = sr.ReadLine()) != null)
{
    rows.Add(input.Split(fieldSeparators));
}
sr.Close();
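
And this is roughly the kind of work the CSV version has to do for every line. The parser below is a simplified sketch for illustration, not my exact code, but it shows the extra effort of handling quoted fields and escaped quotes that a plain Split() skips:

// Simplified sketch of per-line CSV parsing (illustration only, not my exact code).
// Quoted fields with embedded separators are what a plain Split() cannot handle.
static string[] ParseCsvLine(string line)
{
    ArrayList fields = new ArrayList();           // System.Collections
    StringBuilder field = new StringBuilder();    // System.Text
    bool inQuotes = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c == '"')
        {
            if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
            {
                field.Append('"');   // doubled quote inside a quoted field
                i++;
            }
            else
            {
                inQuotes = !inQuotes;
            }
        }
        else if (c == ',' && !inQuotes)
        {
            fields.Add(field.ToString());
            field.Length = 0;        // start the next field
        }
        else
        {
            field.Append(c);
        }
    }
    fields.Add(field.ToString());
    return (string[])fields.ToArray(typeof(string));
}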

The text version is slightly faster than the CSV version the first time it is run (not shown in my graphs). I guess this is because the String class is pre-jitted.

I have decided to use the CSV DataSet for several reasons (a rough sketch of the loader follows the list):



  • It gives me all the features of DataSets that I would otherwise have to implement myself for the array versions: sort, filter, search
  • It has less start-up overhead for the first call (728 ms vs. 2,218 ms for XML)
  • It has acceptable performance up to 10,000 records (6,303 ms vs. 36,513 ms for XML)
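
For reference, this is roughly the shape of the CSV DataSet loader. It is a simplified sketch rather than my exact code: the column names are made up for illustration, and a plain Split() stands in for the CSV field parsing sketched above.

// Simplified sketch: build a DataSet from a CSV file.
// Column names are illustrative; real parsing must handle quoted fields.
System.Data.DataSet ds = new System.Data.DataSet("Secrets");
System.Data.DataTable table = ds.Tables.Add("Entries");
table.Columns.Add("Name", typeof(string));
table.Columns.Add("User", typeof(string));
table.Columns.Add("Password", typeof(string));

StreamReader sr = File.OpenText(fileName);
string line;
while ((line = sr.ReadLine()) != null)
{
    // Split() is only a stand-in for the real CSV field parser.
    string[] fields = line.Split(',');
    table.Rows.Add(fields);
}
sr.Close();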


I will play with encryption support next.
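
If the crypto classes turn out to be available (the full framework has System.Security.Cryptography; what Compact Framework 1.0 actually exposes is something I still have to check), the idea would be to wrap the file stream in a CryptoStream before the CSV text is written. A purely illustrative sketch, where fileName, csvText, key and iv are assumed variables:

// Illustrative only: encrypt the CSV text with TripleDES before it hits disk.
// Key/IV handling is a placeholder; CF 1.0 availability still to be verified.
TripleDESCryptoServiceProvider tdes = new TripleDESCryptoServiceProvider();
FileStream fs = new FileStream(fileName, FileMode.Create);
CryptoStream cs = new CryptoStream(fs, tdes.CreateEncryptor(key, iv), CryptoStreamMode.Write);
StreamWriter sw = new StreamWriter(cs);
sw.Write(csvText);
sw.Close();   // closing the writer flushes and closes the crypto and file streams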

3 comments:

  1. You should try binary serialization, it will take up less storage space on the device and is a lot faster, unless you have some reason to make the dataset file human readable. But you say you want to encrypt it, so I guess not.

  2. Hi,



    I realize this thread was posted over a year ago. But I do have a question regarding your CSV dataset test.



    You chose to go with the CSV dataset. You mention your times for 10,000 records for CSV vs XML. What do these times indicate? How long it took to load the 10,000 records into the dataset?



    I'm very curious because I'm taking a huge performance hit when trying to load just 4,000 records via a Dataset.ReadXml. This is not acceptable and I'm searching for better ways to get this data into the app.



    I would appreciate ANY info you could give me regarding your performance testing.



    Thanks,



    Jason

  3. Thanks for the update Egil. Very much appreciated.



    I dug around my code and was able to add a few tweaks. Somehow, the schema was omitted from my XML file. I re-added it, and that alone shaved about 25 seconds off the load. Through some further testing, I have now determined that the bottleneck is saving the DataSet data into the SQL CE database.



    Currently, I'm using ds.ReadXML and it's loading the 4600+ records into memory in less than 3 seconds, so I'm not going to complain about that performance. Where the speed takes the big hit is after the XML data is loaded into memory: it then has to be saved into the SQL CE database with the rest of the data. After the XML load, it takes 2-2.5 minutes to load the 4600+ records into the database.



    My XML file has a total of 6 fields. 1 integer and 5 x 60 character (varchar) values. I'm now using CF 1.0 SP3 running on a PPC 2003 device.



    I will definitely have a look at your poSecrets code because over time this 1 XML file is going to grow and will continue to grow. So I would like to get the best performing load possible now. And the encryption idea is appealing too to keep the corporate data somewhat from prying eyes.



    Thanks again for your benchmarks. Now back to further testing. :)


