Wednesday, October 29, 2014

PASS Summit 2014 goals - crowd source solving a problem

I've not really written my goals down like this before, though they've usually been rattling around in my head.
I tend to attend the conference and seek out folks who I know can add to my knowledge store or help resolve issues I am currently having.

I'm going to describe some of them here in this post, and hope that you, dear reader, can assist me with tracking them down.

I am not a BI guy, but I would like to become more of one.

One of our current projects is getting a cloud-based version of our application to work well. It is the same basic application we have now, only instead of getting data from an on-premises SQL Server, it's going to reach into Azure Blob Storage and retrieve a JSON document. This rich document contains all the data we normally have, without the structural constraints of an RDBMS, though it will mimic that structure. So there will be a document describing a student, and within it, names and the like. Another document nested within this JSON doc will hold scores, history, and other descriptive data detailing what this student has done. Think 20 tables with 10-15 fields per table, and many rows of data (or documents, in this case). These JSON docs will keep growing with use as more data is added.
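To make that document shape concrete, here is a minimal Python sketch of splitting one student document into per-table rows. The post doesn't give the real schema, so the field names (`student_id`, `region`, `scores`, and so on) and the `flatten_student` helper are hypothetical. It assumes each document has a top-level `student` object plus child collections stored as arrays, each of which maps to one "table."

```python
import json

# Hypothetical student document. The real schema (roughly 20 "tables" of
# 10-15 fields each) isn't described, so these field names are illustrative.
SAMPLE_DOC = """
{
  "student": {"student_id": "s-001", "first_name": "Ada", "last_name": "Lee", "region": "North"},
  "scores": [
    {"assessment": "math-1", "score": 88, "taken_on": "2014-09-15"},
    {"assessment": "read-1", "score": 92, "taken_on": "2014-09-22"}
  ]
}
"""


def flatten_student(doc):
    """Split one nested student document into rows keyed by target table.

    Assumes the parent record lives under "student" and every other
    top-level array is a child collection. Child rows get the parent's
    student_id so they can be joined back together after loading.
    """
    student = doc["student"]
    sid = student["student_id"]
    tables = {"student": [dict(student)]}
    for key, value in doc.items():
        if key != "student" and isinstance(value, list):
            tables[key] = [{"student_id": sid, **row} for row in value]
    return tables


if __name__ == "__main__":
    rows = flatten_student(json.loads(SAMPLE_DOC))
    for table, table_rows in rows.items():
        print(table, len(table_rows))
```

Carrying the parent key into every child row is what lets the nested data round-trip into a relational shape later.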

How do I get this data out of these JSON documents and into a system where others (internal users, external users, applications, services) can get to it? Think reporting. Each JSON doc is a single student. Maybe I need to summarize all the students in an entire region and gain insight into something they have done. I assume these individual rich JSON docs will need to be extracted into some other structure, then transformed and loaded into a storage system, be it an RDBMS, a Hadoop cluster, or some other magical solution. Maybe a data warehouse. Maybe SQL Server. Maybe a MongoDB store.
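As one possible shape for that extract-transform-load step, here is a hedged Python sketch that loads a few student documents into a relational store and summarizes them by region. SQLite (from Python's standard library) stands in for SQL Server or a warehouse, the in-memory `DOCS` list stands in for reading blobs from Azure Blob Storage, and the table layout plus the `load` and `average_score_by_region` helpers are illustrative assumptions, not a recommendation of a particular product.

```python
import json
import sqlite3

# Hypothetical documents, one per student, standing in for blobs read from storage.
DOCS = [
    '{"student": {"student_id": "s-001", "region": "North"}, '
    '"scores": [{"assessment": "math-1", "score": 88}, {"assessment": "read-1", "score": 92}]}',
    '{"student": {"student_id": "s-002", "region": "North"}, '
    '"scores": [{"assessment": "math-1", "score": 70}]}',
    '{"student": {"student_id": "s-003", "region": "South"}, '
    '"scores": [{"assessment": "math-1", "score": 95}]}',
]


def load(conn, raw_docs):
    """Extract each JSON doc, transform it into rows, and load the rows."""
    conn.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, region TEXT)")
    conn.execute(
        "CREATE TABLE score (student_id TEXT REFERENCES student(student_id), "
        "assessment TEXT, score INTEGER)"
    )
    for raw in raw_docs:
        doc = json.loads(raw)                      # extract
        s = doc["student"]
        conn.execute(                              # load parent row
            "INSERT INTO student VALUES (?, ?)", (s["student_id"], s["region"])
        )
        conn.executemany(                          # transform + load child rows
            "INSERT INTO score VALUES (?, ?, ?)",
            [(s["student_id"], r["assessment"], r["score"]) for r in doc.get("scores", [])],
        )
    conn.commit()


def average_score_by_region(conn):
    """Summarize across all students: average score per region."""
    return conn.execute(
        "SELECT st.region, AVG(sc.score) FROM score sc "
        "JOIN student st ON st.student_id = sc.student_id "
        "GROUP BY st.region ORDER BY st.region"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, DOCS)
    print(average_score_by_region(conn))
```

Once the data is relational, the region-level question becomes a single `GROUP BY`. The trade-off is that the load has to be rerun, or made incremental, as the JSON docs keep growing.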

What do you think it should be? How best to create this process of extracting this data and presenting it to others in a reportable fashion?

So if you are still reading this and have an opinion on the issue above, come grab me and let's talk about it. I'd love to hear your take on this, your experience, and possible directions.

With this in mind, there are a slew of sessions that are BI related that I am going to try hard to attend this year. I have flagged some of them and hope to get some insight there, as well as with other groups like SQLCAT and the other forums and opportunities that the PASS Summit offers. I'll even sit down with vendors and spell this out in the hopes that they have a magic bullet or at least a suggestion of direction.

I suggest that you too bring issues, real issues, to the Summit and attempt to get them solved. At the very least, by talking through the issue with others, you will discover things along the way. Maybe you will solve it yourself, or maybe you'll get put in touch with that solitary individual in the world who has already solved it and is willing to train you on the process. Or something in between. Either way, it's better than sitting at your desk making it up yourself.

Get out and get your lurn on at the summit this next week and enjoy yourself.


Rick Krueger said...

In case I forget, remember to ask me how we did it on a project where we used polyglot persistence.

Brent Ozar said...

Here are the design questions I'd consider about the user first:

The user who's asking questions about the data - do they want to see the same questions answered over and over, or do they want to come up with new questions on demand? (Think scheduled weekly report delivery via email, versus asking a question once and then never asking it again.)

Do they have expertise about the data structures already? (For example, can they identify which structures hold the data they're seeking, or do they need tutorials about how the data is joined together and which data lives in which fields?)

How much time does the end user have to pose the question? (Is it an analyst who can take several hours to test the formation of their question, or is it an executive who has five minutes tops to get attendance by class by region?)

How many data consumers are involved, and what are their skill levels with tools? (Are they analysts who can learn a new tool in order to get a complex question answered, or are they managers who can spend just a day or two in training, or are they end users who need to see the data presented in a tool they're already comfortable with?)

Once someone builds a question and an answer from the data, how will it be reused and collaborated with? Will it be run once and then never again, or will it need to be repeatedly shared as static data, or will it need to be refreshed from source every X hours? Will people need to change the question and the answer, and have their changes reflected immediately for others?

That'll help you get started with picking the right tool - like storing it in a permanent relational database and building a drag-and-drop GUI atop it, or just pulling the data straight from source and analyzing it with a different tool every time.
