SQL,tables in a DB: design

Showing posts with label design. Show all posts

Sunday, March 25, 2012

(Harder?) cube query & design question

Hi,

I have an MDX query (and worst case a cube design) problem that I haven't been able to solve, any ideas on how to go about this? Here's a simplified description, starting with the

Dimension & Attributes

* We make phone [Call]s.

* In each [Call], and for a number of [Product]s we ask a number of [Question]s.

* Each [Question] results in an [AnswerText]. These also reside in a user hierarchy [Answer Dimension].[Q and A].

* All of the above are attributes in the [Answer Dimension].

Facts

* Measures.[Answer Count], which at the granular level is always one, i.e. each fact records that we recieved a single [AnswerText] to a single [Question] about a single [Product] in a single [Call].

* Measures.[Call Count], which is the number of [Call]s made.

NB: We have thousands of different questions and answers, so surfacing each individual question and answer as a measure is not an option.

Queries & issues

* Counting the answers to a single particular question is easy:

SELECT
Measures.[Answer Count] ON 0
FROM cube
WHERE ([Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue])

* What I can't figure out is how to get Measures.[Answer Count] for multiple simultaneous questions, i.e.:

"For how many products and calls are &[What color is it?].&[Blue] AND &[What shape is it?].&[Round]"

I've tried (unsuccessfully) the following:

1) Various ways that equate to doing an intersection between the first and the second question. This fails since it returns the empty set - a single fact Measure.[Answer Count] only correspond to a single question.

2) Creating sets at the [Answer Dimension] leaf level where [Answer Count] = 1, and counting the number of tuples in the set. Although this looked promising it still failed me, the dimensionality either didn't allow combining the two questions or didn't slice the facts at all when say using a Filter() to combine the sets, even when using two different attribute to specify the answers.

3) Aggregating [Answer Count] to the set {[Call] * [Product]}, and Filter() where both questions have [Answer Count] >= 1. Again promising, but can't figure out the syntax to use.

Big Questions

* Is 2) or 3) above doable at all? What is the rough syntax needed?

* Is there a better (working!-) way to query this cube?

* Is there a better way to design the cube for answering these types of combined questions (remembering we have thousands of distinct questions and answers, and new ones get added over time, and a total of millions of facts)?

Any and all suggestions Much Appreciated!

Kristian

Very complex problem. My question is simply if you have tried data mining and a decision tree model on this problem? Is DM not an option?

Regards

Thomas Ivarsson

|||

Assuming that the question: "For how many products and calls are &[What color is it?].&[Blue] AND &[What shape is it?].&[Round]" refers to counting product/call combinations, cascading NonEmpty() might compute what you're looking for, like:

Count(NonEmpty(NonEmpty({[Call].[Call].[Call] * [Product].[Product].[Product]},

{[Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue]}),

{[Answer Dimension].[Q and A].[Question].&[What shape is it?].&[Round]}))

|||Yes! This gives the right total on my mini test cube. One (hopefully simple) follow-up question:

The calculation now happens at the right

[Call].[Call].[Call] * [Product].[Product].[Product]

level. How do I write the query so that I get the Count in a colum, and the list of Products on the rows? I.e.

Occurrences
Car 2
Bowl 1

For instance, this won't work:

WITH
SET MySet AS
[... Deepak's code from above ...]
MEMBER Measures.Occurrences AS
Count(MySet)
SELECT
[Product].[Product].[Product] ON 0
FROM cube

since it will give all Products the same total. Ideas?

Many thanks!

Kristian

|||

Hi Kristian,

To your query above, maybe you can add "Existing", to select only relevant tuples for each cell context:

MEMBER Measures.Occurrences AS
Count(Existing MySet)

|||Many thanks Deepak, EXISTING works for small sets, except:

When using large sets, the SSAS service crashes when running the above query. Any ideas on how to make the above query not crash the service, either through increasing limits on the SSAS instance or optimizing the query itself, any ideas?

Kristian|||

Kristian, unfortunately I haven't done much tweaking of memory settings for AS 2005 - maybe someone else can comment on this?

One alternative, which may save memory but be much slower, is to avoid creating the cross-joined named set. So, the Occurrences calculated measure could be directly defined as:

Count(NonEmpty(NonEmpty({(Existing [Call].[Call].[Call]) * (Existing [Product].[Product].[Product])},

{[Answer Dimension].[Q and A].[Question].&[What color is it?].&[Blue]}),

{[Answer Dimension].[Q and A].[Question].&[What shape is it?].&[Round]}))

sql

Thursday, March 22, 2012

(Design) Production db used as the data warehouse?

We're designing our first bi suite and we're considering not using a data warehouse at all but connecting directly to the production database via the DS. I have a nagging feeling that inherently this is not a good idea but would welcome pros and cons.

Pros
- real time data updates as fast as we can Process
- no need for ETL
- we can write named queries for use in the DSV to satisfy our data requirements

Cons
- performance hit to non BI users when BI users report if using ROLAP partitions
- potential table locking during Processing, again affecting non BI users

My gut feel is that instead we should be keeping a synchronized copy of Production as a DW to report off, but I will need stronger Cons that those above to convince my boss.

Be gentle please - my first post and I completed my fist SSAS course only yesterday. Thanks.

Are your production db home made or purchased solution from sombody?

Is the data quality 100% so you don't need data clearing and validation?

Are there any updates/upgrades of production db structure?

Are you sure, that you production db is only source of your data warehouse and there will be no another sources in next years?

How large is the volume of your production db?

How large it will be in 3-5 years?

Do you plan to archive some old transaction data from your production db?

|||

Thanks Vlad, some good points here:

Are your production db home made or purchased solution from sombody?

- The production database is also ours.

Is the data quality 100% so you don't need data clearing and validation?

- Yes, and any changes that need to be made through the interfaces would be made in the source (production) database (which would then flow through).

Are there any updates/upgrades of production db structure?

Cheers I hadn't considered this - if we change the DDL of the production database during version upgrades then this will potentially have an affect on the DSV's.

Are you sure, that you production db is only source of your data warehouse and there will be no another sources in next years?

At this stage yes - although if it's very successful I could see clients wanting to report on other external sources like their upstream ERP's. I'm not sure how having the production database as the data warehouse could relate to this though...

How large is the volume of your production db?

Our largest production databases are about 5GB.

How large it will be in 3-5 years?

Hmm, hard to say but would guess 10GB?

Do you plan to archive some old transaction data from your production db?

Very good point - for customers who have been using the software for some time and have a large amount of history they may want to use the DW as an archive and remove data from production - the approach I outlined about does not allow this easily (we would have to write a number of date queries restricting the data.

Considering the lukewarm response I got to this thread perhaps what is being proposed is not such a big deal/poor option - if I consider the cons above it sounds like we may be able to implement it in such a way. I must admit I still have a bad get feel about it but can't really pin down exactly why...

Has anyone seen cubes based directly off their production database (i.e. not a copy of it) in a real world environment? Would love some more feedback...

|||

Hi,

I have seen many SSAS solutions, and a some of them direct connected to production db. but no one of direct connected was ERP db.

I depends on your businnes. Is you production db a ERP like db, or what else? If ERP like, then you must have DWH, if you what to sleep relaxed :-) another way you get enough headache

|||It's an OLTP database but not high volume, only dozens of transactions an hour. But we're thinking of only processing nightly, and setting an expectation with the users that this will be the case.

Sunday, February 19, 2012

<Never mind, self-ansered> DMF - is it by design that database can't subscribe to Def

If by design, I can't really understand why we would not allow that...

[EDIT]

Answered this myself... BOL says:

Declarative Management Framework policy group
A user-defined group to help manage policies. Users can classify policies into different policy groups. A policy belongs to one and only one policy group. Database owners can subscribe a database to a set of policy groups. Only policies from its subscribed groups can govern a database. All databases implicitly subscribe to the default policy group.

There's a specific reason for this. The Policy Administrator may decide that all database on the system must adhere to a group of policies. To enable this these policies are placed int he default group. Furthermore, there may a group of policies that only apply to certain databases but the Policy Administrator doesn't "know" which database. Therefore, the Policy Adminsitrator can leave it up to the database owner to select which other groups (other than default) apply to their particular database. The database owner does this by subscribing to these other groups.

Cheers,
Dan

|||

Thanks.

FYI:

I will file a connect bug to request that any database shows subscribttion to Default group (Right-click a database -> Policies -> Group Subscriptions).

Of course, it should be impossible to unsubscribe.

This will visually represent that all databases are implicitly subscribe to Default policy and will remove any potential confusion (like I encountered initially.)

|||

Hi Alexey, We originally showed it and thought it was confusing. Looks like we were wrong...

So yes, please file the bug.

Cheers,

Dan

|||

Hm... If this is also found confusing, 2 middle-ground options:

Do show the default policy and change "Group Subscriptions:" wording of the dialog to "Group Subscriptions, including implicit subscription to Default policy:" Do not show the default policy, and add at the bottom (just above Ok/Cancel):
Note:all databases also implicitly subscribe to Default policy.|||

Hi Alexey,

Let's take this offline.

Thanks,

Dan

Sunday, March 25, 2012

(Harder?) cube query & design question

Thursday, March 22, 2012

(Design) Production db used as the data warehouse?

Sunday, February 19, 2012

<Never mind, self-ansered> DMF - is it by design that database can't subscribe to Def

SQL,tables in a DB

Blog Archive

About Me