Monday, 29 June 2015

Sequential vs parallel solution memory usage

I have a slight issue with the following scenario: I'm given a list of ID values, and for each one I need to run a SELECT query (with the ID as a parameter), then combine all the result sets into one big list and return it to the caller.

Since the query might run for minutes per ID (that's another issue, but for now I take it as a given), and there can be thousands of IDs in the input, I tried to use tasks. With that approach I see a slow but steady increase in memory use.

As a test, I also wrote a simple sequential solution. Its memory usage graph looks normal but, as expected, it is very slow. Memory increases while it's running, then drops back to the normal level when it's finished.

Here's a skeleton of the code:

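using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;
// plus the namespace of whichever Oracle provider supplies OracleConnection and OracleCommand

public class RowItem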
{
    public int ID { get; set; }
    public string Name { get; set; }
    //the rest of the properties
}


public List<RowItem> GetRowItems(List<int> customerIDs)
{
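    // collects the rows returned for every customer ID
    var rowItems = new List<RowItem>();

    // variant 1 (parallel): this solution has the memory leak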
    var tasks = new List<Task<List<RowItem>>>();
    foreach (var customerID in customerIDs)
    {
        var task = Task.Factory.StartNew(() => ProcessCustomerID(customerID));
        tasks.Add(task);
    }

    while (tasks.Any())
    {
        var index = Task.WaitAny(tasks.ToArray());
        var task = tasks[index];
        rowItems.AddRange(task.Result);
        tasks.RemoveAt(index);
    }

    // variant 2 (sequential, used instead of the task code above): this works fine, but is slow
    foreach (var customerID in customerIDs)
    {
        rowItems.AddRange(ProcessCustomerID(customerID));
    }

    return rowItems;
}

private List<RowItem> ProcessCustomerID(int customerID)
{
    var rowItems = new List<RowItem>();
    using (var conn = new OracleConnection("XXX"))
    {
        conn.Open();
        var sql = "SELECT * FROM ...";
        using (var command = new OracleCommand(sql, conn))
        {
            using (var dataReader = command.ExecuteReader())
            {
                using (var dataTable = new DataTable())
                {
                    dataTable.Load(dataReader);
                    rowItems = dataTable
                               .Rows
                               .OfType<DataRow>()
                               .Select(
                                   row => new RowItem
                                   {
                                       ID = Convert.ToInt32(row["ID"]),
                                       Name = row["Name"].ToString(),
                                       //the rest of the properties
                                   })
                               .ToList();
                }
            }
        }
        conn.Close();
    }
    return rowItems;
}

What am I doing wrong when using tasks? According to this MSDN article, I don't need to dispose of them manually, and I'm doing barely anything else with them. I assume ProcessCustomerID is OK, as it's called in both variants.

Update: To log the current memory usage I used Process.GetCurrentProcess().PrivateMemorySize64, but I noticed the problem in Task Manager >> Processes.
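
For reference, the logging mentioned in the update can be sketched roughly as follows. This is a minimal illustration, not the exact code I ran: the MemoryLog class and its Snapshot method are hypothetical names, and the GC.GetTotalMemory figure is an extra value added here so the private bytes of the process can be compared with the size of the managed heap.

```csharp
using System;
using System.Diagnostics;

public static class MemoryLog
{
    private const double BytesPerMegabyte = 1024.0 * 1024.0;

    // Hypothetical helper: returns one log line with the current memory figures.
    public static string Snapshot(string label)
    {
        // GetCurrentProcess returns a new Process component, so dispose it after use.
        using (var process = Process.GetCurrentProcess())
        {
            // Private bytes of the whole process (managed and unmanaged).
            long privateBytes = process.PrivateMemorySize64;

            // Approximate size of the managed heap; 'false' means no collection is forced first.
            long managedBytes = GC.GetTotalMemory(false);

            return string.Format(
                "{0}: private={1:F1} MB, managed={2:F1} MB",
                label,
                privateBytes / BytesPerMegabyte,
                managedBytes / BytesPerMegabyte);
        }
    }
}
```

Calling Console.WriteLine(MemoryLog.Snapshot("after batch")) at fixed points in GetRowItems gives a series of readings that can be compared between the task-based and the sequential variant.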
