Sunday, March 25, 2012

(Harder?) cube query & design question

Hi,

I have an MDX query (and worst case a cube design) problem that I haven't been able to solve, any ideas on how to go about this? Here's a simplified description, starting with the


Dimension & Attributes

* We make phone [Call]s.

* In each [Call], and for a number of [Product]s we ask a number of [Question]s.

* Each [Question] results in an [AnswerText]. These also reside in a user hierarchy [Answer Dimension].[Q and A].

* All of the above are attributes in the [Answer Dimension].


Facts

* Measures.[Answer Count], which at the granular level is always one, i.e. each fact records that we recieved a single [AnswerText] to a single [Question] about a single [Product] in a single [Call].

* Measures.[Call Count], which is the number of [Call]s made.


NB: We have thousands of different questions and answers, so surfacing each individual question and answer as a measure is not an option.


Queries & issues

* Counting the answers to a single particular question is easy:

SELECT
Measures.[Answer Count] ON 0
FROM cube
WHERE ([Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue])


* What I can't figure out is how to get Measures.[Answer Count] for multiple simultaneous questions, i.e.:

"For how many products and calls are &[What color is it?].&[Blue] AND &[What shape is it?].&[Round]"

I've tried (unsuccessfully) the following:

1) Various ways that equate to doing an intersection between the first and the second question. This fails since it returns the empty set - a single fact Measure.[Answer Count] only correspond to a single question.

2) Creating sets at the [Answer Dimension] leaf level where [Answer Count] = 1, and counting the number of tuples in the set. Although this looked promising it still failed me, the dimensionality either didn't allow combining the two questions or didn't slice the facts at all when say using a Filter() to combine the sets, even when using two different attribute to specify the answers.

3) Aggregating [Answer Count] to the set {[Call] * [Product]}, and Filter() where both questions have [Answer Count] >= 1. Again promising, but can't figure out the syntax to use.


Big Questions

* Is 2) or 3) above doable at all? What is the rough syntax needed?


* Is there a better (working!-) way to query this cube?


* Is there a better way to design the cube for answering these types of combined questions (remembering we have thousands of distinct questions and answers, and new ones get added over time, and a total of millions of facts)?

Any and all suggestions Much Appreciated!

Kristian

Very complex problem. My question is simply if you have tried data mining and a decision tree model on this problem? Is DM not an option?

Regards

Thomas Ivarsson

|||

Assuming that the question: "For how many products and calls are &[What color is it?].&[Blue] AND &[What shape is it?].&[Round]" refers to counting product/call combinations, cascading NonEmpty() might compute what you're looking for, like:

Count(NonEmpty(NonEmpty({[Call].[Call].[Call] * [Product].[Product].[Product]},

{[Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue]}),

{[Answer Dimension].[Q and A].[Question].&[What shape is it?].&[Round]}))

|||Yes! This gives the right total on my mini test cube. One (hopefully simple) follow-up question:

The calculation now happens at the right

[Call].[Call].[Call] * [Product].[Product].[Product]

level. How do I write the query so that I get the Count in a colum, and the list of Products on the rows? I.e.

Occurrences
Car 2
Bowl 1

For instance, this won't work:

WITH
SET MySet AS
[... Deepak's code from above ...]
MEMBER Measures.Occurrences AS
Count(MySet)
SELECT
[Product].[Product].[Product] ON 0
FROM cube

since it will give all Products the same total. Ideas?

Many thanks!

Kristian

|||

Hi Kristian,

To your query above, maybe you can add "Existing", to select only relevant tuples for each cell context:

MEMBER Measures.Occurrences AS
Count(Existing MySet)

|||Many thanks Deepak, EXISTING works for small sets, except:

When using large sets, the SSAS service crashes when running the above query. Any ideas on how to make the above query not crash the service, either through increasing limits on the SSAS instance or optimizing the query itself, any ideas?

Kristian|||

Kristian, unfortunately I haven't done much tweaking of memory settings for AS 2005 - maybe someone else can comment on this?

One alternative, which may save memory but be much slower, is to avoid creating the cross-joined named set. So, the Occurrences calculated measure could be directly defined as:

Count(NonEmpty(NonEmpty({(Existing [Call].[Call].[Call]) * (Existing [Product].[Product].[Product])},

{[Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue]}),

{[Answer Dimension].[Q and A].[Question].&[What shape is it?].&[Round]}))

sql

No comments:

Post a Comment