NHibernate - Getting up and running

Querying Collections Efficiently with NHibernate

Index

Introduction to NHibernate
Simple Queries on a single table with NHibernate
Configuring Child Collections with NHibernate
Overriding Equals, GetHashCode and Database Design with NHibernate
Querying Collections Efficiently with NHibernate
Persisting objects back to the database with NHibernate

Introduction

As there was with configuring collections, there is a lot with querying them to. This chapter will explain the best practices for querying child collections, detailing the infamous SELECT N+1 anti-pattern and how to avoid it. Poorly constructed queries can (and do) cripple applications.

Poorly constructed queries are often overlooked at build / release as there is no real load on the application. Its only once the application is being used in anger that problems start to arise.

The SELECT N+1 anti-pattern

This is a common problem with ORM's, which can be basically described as the ORM issuing a lot more queries to the database than is immediately obvious. The code works, its just extremely inefficient. The unit test below demonstrates the problem.

        [Fact]
        public void LazilyLoadingEmployeeOrders() {
            IList employees = null;


            Assert.DoesNotThrow(() => {
                using (var tx = Session.BeginTransaction()) {
                    employees = Session.QueryOver()
                        .WhereRestrictionOn(x => x.EmployeeId)
                            .IsIn(new int[] { 1, 2, 3, 4 })
                        .List();

                    tx.Commit();
                }

                decimal totalFrieght = 0;
                foreach (var employee in employees)
                    foreach (var order in employee.Orders)
                        totalFrieght += order.Freight;
            });
        }

The test passes, but as the employee orders are lazily loaded as an additional statement is issued for each employee outside the transaction (when the ... in employee.Orders) statement is executed inside the nested loop.

Demonstrating the SELECT N + 1 anti pattern in the Profiler

The SELECT N refers to the number of rows selected in the parent of the collection. For each row, an additional statement is going to be issued. The +1 refers the the original query. With a small amount of data, this will have no noticeable effect, as the volume increases... things start to go downhill rapidly.

Eagerly loading child collections

This anti-pattern can be avoided by eagerly loading the child collections, as always, NHibernate offers several ways of doing this.

Editing the mapping file

The easiest way is to edit the mapping file adding lazy="false" to the set.

    <set name="Orders" inverse="true" cascade="all-delete-orphan" lazy="false">
      <key column ="EmployeeId" />
      <one-to-many class ="Order" />
    </set>

This will force the collection to be eagerly loaded. Running one of the first unit tests, shows that even a Session.Get<T> will eagerly load the orders.

        [Fact]
        public void CanGetEmployeeById() {
            Employee e = null; 

            using (var tx = Session.BeginTransaction()) {
                e = Session.Get(4);
                tx.Commit();
            }

            Assert.NotNull(e);
            Assert.Equal("Peacock", e.LastName);
            Assert.Equal("Margaret", e.FirstName);
        }
Eagerly loading orders with lazy to set false

This seems like an easy fix, and the temptation is to eagerly load everything, but this too will cause performance issues. The majority of the time you may not care about the orders, for such things as head count reports, logging in, etc... Having this unnecessary overhead hard coded into the mapping files is not recommended. Eager loading should be used only when you need the data, by queries authored specifically for the task at hand.

Fetching collections

If there is no requirement to filter the child collections a fetch join will eagerly load the collection data as shown in the test below. I have also included the older Criteria API (commented out) which achieves the same. It's good to be across this syntax for supporting legacy systems. The major advantage of the QueryOver API is the lambda expressions remove the need for strings, allowing for compile time error checking.

        [Fact]
        public void EagerLoadingEmployeeOrders() {
            IList<Employee> employees = null;

            using (var tx = Session.BeginTransaction()) {
                employees = Session.QueryOver<Employee>()
                    .WhereRestrictionOn(x => x.EmployeeId)
                        .IsIn(new int[] { 1, 2, 3, 4 })
                    .Fetch(x => x.Orders).Eager 
                    .List();

                tx.Commit();
            }

            //using (var tx = Session.BeginTransaction()) {
            //    employees = Session.CreateCriteria<Employee>()
            //        .SetFetchMode("Orders", FetchMode.Join)
            //        .Add(Expression.In("EmployeeId", new int[] { 1, 2, 3, 4 }))
            //        .List<Employee>();
            //    tx.Commit();
            //}


            decimal totalFrieght = 0;
            foreach (var employee in employees)
                foreach (var order in employee.Orders)
                    totalFrieght += order.Freight;

        }

The resulting SQL is a left outer join, selecting all fields from both tables.

Demonstrating the fetch join in the Profiler

Inner and Outer Joins

If you want to filter the results of a child collection, you'll have to use a join to supply criteria to. There's a gotcha though. You have to use a left join for the filter to be applied, as the test below demonstrates.

        [Fact]
        public void ApplyingCriteriaOnChildCollections() {
            Employee employee;

            using (var tx = Session.BeginTransaction()) {

                employee = Session.QueryOver<Employee>()
                    .Where(x => x.EmployeeId == 2)
                    .JoinQueryOver<mdl.Order>(x => x.Orders, JoinType.InnerJoin)
                    .Where(o => o.ShippedDate == null)
                    .SingleOrDefault();
                    
                //employee = Session.CreateCriteria<Employee>()
                //    .Add(Expression.Eq("EmployeeId", 2))
                //    .CreateCriteria("Orders", JoinType.InnerJoin)                  
                //    .Add(Expression.IsNull("ShippedDate"))
                //    .UniqueResult<Employee>();

                Assert.Equal(96, employee.Orders.Count);
                Session.Evict(employee);


                employee = Session.QueryOver<Employee>()
                    .Where(x => x.EmployeeId == 2)
                    .JoinQueryOver<mdl.Order>(x => x.Orders, JoinType.LeftOuterJoin)
                    .Where(o => o.ShippedDate == null)
                    .SingleOrDefault();

                //employee = Session.CreateCriteria<Employee>()
                //    .Add(Expression.Eq("EmployeeId", 2))
                //    .CreateCriteria("Orders", JoinType.LeftOuterJoin)
                //    .Add(Expression.IsNull("ShippedDate"))
                //    .UniqueResult<Employee>();
            }

            Assert.Equal(3, employee.Orders.Count);
        }

Even though, the criteria is correctly applied, NHibernate issues a second statement to completely populate the child collection. This can be confusing and frustrating. The 3rd Statement populates the object graph as expected, with 3 orders.

Demonstrating filtering on inner joins in the Profiler

Summary

The key points to take away from the above are

  • Always profile your queries looking for SELECT N+1 occurrences
  • Leave lazy loading on by default, do not disable this in the mapping files
  • Use fetch when there is no requirement for filtering child collections
  • Use left join to filter child collections

Source Code

git clone https://github.com/mindfulsoftware/ORM.Nhibernate.git
(version: 8b5af698bcc519a26310ccef780501e750bc5eab)