Thursday, February 24, 2011

SSAS::::MDX Optimization Techniques: Segregating DISTINCT COUNT

Initial Approach via MDX

Let’s initialize the MDX Sample Application, the platform from which we perform many practice exercises within the articles of our series. (We choose it because any organization that has installed MSAS has access to the Sample Application). We will create our initial query by taking the following steps:

1. Start the MDX Sample Application.

We are initially greeted by the Connect dialog, shown in Figure 1.

Figure 1: The Connect Dialog for the MDX Sample Application

The figure above depicts the name of my server, MOTHER1, and properly indicates that we will be connecting via the MSOLAP provider (the default).

2. Click OK.

The MDX Sample Application window appears.

3. Click File ---> New.

A blank Query pane appears.

4. Ensure that FoodMart 2000 is selected as the database name in the DB box of the toolbar.

5. Select the Sales cube in the Cube drop-down list box.

The MDX Sample Application window should resemble that depicted in Figure 2, complete with the information from the Sales cube displaying in the Metadata tree (left section of the Metadata pane).

Figure 2: The MDX Sample Application Window (Compressed View)

We will begin creating our query with a focus on returning results to meet the expressed business need of the information consumers. We will construct two calculated members / measures, one to contain the distinct count of the Customers, and one to calculate the average sale for each Product Category, per individual Customer. We will then SELECT the two calculated measures, presenting them to the immediate right of the Unit Sales measure for each respective Product Category.

We will retrieve a dataset with the measure / calculated measures forming the column axis, and the Product Category forming the row axis.

1. Create the following new query:

-- SSP11-1 Initial Attempt at Distinct Customer Dataset
   [Measures].[Distinct Customers]
      'COUNT(CrossJoin({[Unit Sales]},

   Descendants ([Customers].CurrentMember,

  [Customers].[Name])), ExcludeEmpty)'

   [Measures].[Avg Sales per Customer]

   '[Measures].[Unit Sales]/[Measures].[Distinct Customers]'


   {[Measures].[Unit Sales], [Measures].[Distinct Customers], [Measures].

   [Avg Sales per Customer]} on COLUMNS,

   {[Product Category].Members} ON ROWS





The above represents an attempt to meet the information consumers’ objectives with what appears to be the straightforward use of the DISTINCTCOUNT() function within a calculated member, to contain the count of the distinct Customers; we then create a second calculated member based upon the first, which we divide into the Unit Sales measure to derive the Average Sales per (individual) Customer, as requested by the intended audience. We SELECT all three into the desired matrix to render the desired presentation

The calculated member Distinct Customers embodies the heavy lifting in the query. We used the following definition (within the AS clause string for calculated member Distinct Customers):

'COUNT(CrossJoin({[Unit Sales]},

   Descendants ([Customers].CurrentMember,

   [Customers].[Name])), ExcludeEmpty)'

to count the non-null Sales / Customer member tuples that it found, thereby deriving the number of customers. Because we wish to avoid counting all customer names (the lowest level of the Customer hierarchy), regardless of our level position in the hierarchy, we inserted the Descendants() function shown; this forces a limitation upon the count to solely the customers under the current member of the Customers dimension.

2. Execute the query using the Run Query button.

The results dataset appears as partially shown in Figure 3.

Figure 3: The Results Dataset (Partial View)

The first thing that we notice, after clicking Run Query, is that this query takes a little longer to run than many of the “sample” queries we have created in past articles. As a matter of fact, this is exactly the observation that I am hoping even those new to MDX in general will make. They query provides the data to meet the requirements of the information consumers, but performance could become a problem.

The overhead generated in the query is due to the requirement for MSAS to perform a runtime assessment of each customer member, and there are many members. While this overhead may not be unduly troublesome from the perspective of our sample data, performance will be degraded far more within the context of the sizes of member populations that exist in many production environments: the performance degradation we have witnessed in our tiny sample cube is extrapolated to those larger populations, to an extent that becomes very real to analysts and other information consumers that rely upon the system to provide data in a reasonable response time.

3. Save the query as SSP11-1 in a convenient location.

4. Close the Sample Application.

Our next step will be to examine an option for mitigating the performance hit suffered within the "straightforward" application of DISTINCTCOUNT() within our query

No comments:

Post a Comment