Wednesday, June 5, 2019

A Guide Into Business Intelligence Studies Information Technology Essay

Data Warehousing
Integration of data from multiple sources into large warehouses, in support of online analytical processing (OLAP) and business decision making.

DW vs. Operational Databases
Data warehouse: subject oriented; integrated; nonvolatile; time variant; ad hoc retrieval.
Operational databases: application oriented; limited integration; continuously updated; current data set only; predictable retrieval.

Data Warehouse
A subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process.

Data Mart
A single-subject data store, oriented to a department or a line of business.

Top-Down Approach
Advantages: a truly corporate effort with an enterprise view of data; inherently architected, not a union of disparate data marts; single, central storage of data about the content; centralized rules and control; may see quick results if implemented with iterations.
Disadvantages: takes longer to build, even with an iterative method; higher exposure to risk of failure; needs a high level of cross-functional skills; high outlay without proof of concept.

Bottom-Up Approach
Advantages: faster and easier implementation of manageable pieces; favorable return on investment and proof of concept; less risk of failure; inherently incremental, so the important data marts can be scheduled first; allows the project team to learn and grow.
Disadvantages: each data mart has its own narrow view of data; redundant data permeates every data mart; inconsistent and irreconcilable data are perpetuated; unwieldy interfaces proliferate.

Data Staging Component
Three major functions must be performed to get the data ready (ETL): extract the data, transform the data, and then load the data into the data warehouse storage.

Data Warehouse Characteristics
Subject-oriented data: data is stored by subject.
Integrated data: all the relevant data must be pulled together from the various systems, both from internal operational systems and from outside sources.
Time-variant data: the stored data contains past values as well as current values; for example, the user needs data not only about the current purchase but also about past purchases.
Nonvolatile data: data from the operational systems is moved into the data warehouse at specific intervals.

Data Granularity
Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity. Keeping the lowest level of detail means a large volume of data in the data warehouse.

Four Steps in Dimensional Modeling
Identify the process being modeled; determine the grain at which occurrences will be stored; choose the dimensions; identify the numeric measures for the facts.

Components of a Star Schema
Fact tables contain factual or quantitative data.
Dimension tables contain descriptions of the subjects of the business.
There is a 1:N relationship between each dimension table and the fact table.
Dimension tables are denormalized to maximize performance.

Slowly Changing Dimensions
Are the customer and product dimensions independent of the time dimension? Names, family status, and a product's district or region all change over time. How should these changes be handled so that the history is not affected (for example, in insurance)?
Three options for slowly changing dimensions:
Type 1: overwrite (erase) the old values. Used when no accurate tracking of history is needed; easy to implement.
Type 2: create a new record at the time of change. The history is partitioned into the old and new descriptions.
Type 3: add new "current" fields. Used when there is a legitimate need to track both old and new states; the original and current values are kept, but intermediate values are lost.

Junk Dimensions
Options for handling miscellaneous flags:
Leave the flags in the fact table: may produce sparse data; offers no real browse entry capability; can significantly increase the size of the fact table.
Remove the attributes from the design: potentially critical information will be lost; remove them only if they provide no relevance.
Make each flag into its own dimension: may greatly increase the number of dimensions, increasing the size of the fact table; can clutter and confuse the design.
Combine all relevant flags and similar attributes into a single dimension: the number of possible combinations remains finite, and the information is retained.

The Monster Dimension
Handling a monster dimension is a compromise: it avoids creating copies of dimension records in a very large dimension, and it is done to manage space and changes efficiently.

Three Types of Multidimensional Data
Input data: data from external sources (represented by the blue cylinder) is copied into the red cube, which represents input multidimensional data.
Pre-calculated results: stored results derived from the input data.
On-the-fly results: calculated as required at run time, but not stored in a database.

Aggregation
The system uses physically stored aggregates to enhance the performance of common queries. Like indexes, these aggregates are chosen silently by the database if they are physically present.
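The transparent use of stored aggregates described above, along with the distinction between pre-calculated and on-the-fly results, can be illustrated with a small in-memory sketch. Everything in it is hypothetical: the fact rows, the column names (month, region, product), and the `AggregateNavigator` class are illustrative only and do not correspond to any real database API. The caller asks only for totals grouped by some columns; the sketch uses a stored aggregate if one is grouped by a superset of those columns (rolling it up further), and otherwise computes the result on the fly from the base facts.

```python
from collections import defaultdict

# Hypothetical base fact rows: (month, region, product, sales_amount).
FACT_COLUMNS = ("month", "region", "product")
FACTS = [
    ("2019-01", "East", "Widget", 100.0),
    ("2019-01", "West", "Widget", 50.0),
    ("2019-02", "East", "Gadget", 75.0),
    ("2019-02", "East", "Widget", 25.0),
]


def summarize(rows, columns, group_by):
    """Sum the last field of each row, grouped by the named columns."""
    positions = [columns.index(name) for name in group_by]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[p] for p in positions)] += row[-1]
    return dict(totals)


class AggregateNavigator:
    """Hypothetical helper: answers SUM(sales) queries, silently using a stored aggregate when one fits."""

    def __init__(self, facts, columns):
        self.facts = facts
        self.columns = columns
        self.aggregates = {}  # grouping columns -> {group key: total}

    def build_aggregate(self, group_by):
        """Pre-calculate and physically store a summary (normally done at load time)."""
        group_by = tuple(group_by)
        self.aggregates[group_by] = summarize(self.facts, self.columns, group_by)

    def total_sales(self, group_by):
        """Return (totals, source); the caller never names an aggregate."""
        group_by = tuple(group_by)
        # Any stored aggregate grouped by a superset of the requested columns
        # can be rolled up further; pick the one with the fewest rows.
        usable = [g for g in self.aggregates if set(group_by) <= set(g)]
        if usable:
            best = min(usable, key=lambda g: len(self.aggregates[g]))
            rows = [key + (total,) for key, total in self.aggregates[best].items()]
            return summarize(rows, best, group_by), f"aggregate {best}"
        # No suitable aggregate: compute the result on the fly from the base facts.
        return summarize(self.facts, self.columns, group_by), "base facts"


if __name__ == "__main__":
    nav = AggregateNavigator(FACTS, FACT_COLUMNS)
    print(nav.total_sales(["month"]))  # computed on the fly from the base facts
    nav.build_aggregate(["month", "region"])
    print(nav.total_sales(["month"]))  # same totals, now rolled up from the stored aggregate
```

The second element of the returned tuple exists only to make the demonstration visible; the calling code is identical before and after the aggregate is built, which is the point the notes make about applications not coding aggregate names.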
Because the database chooses aggregates on its own, users and application developers do not need to know which aggregates are available at any point in time, and applications are not required to code the name of an aggregate explicitly. As you move to higher levels of aggregation, the sparsity percentage goes down, eventually reaching 100% occupancy.

Data Extraction
There are two major types of data extraction from the source operational systems: "as is" (static) data and data of revisions.
"As is" or static data is a capture of the data at a given point in time; it is used for the initial load.
Data of revisions is also known as incremental data capture.

Data Quality Issues
Dummy values in fields; missing data; unofficial use of fields; cryptic values; contradicting values; reused primary keys; unreconciled values; incorrect values; multipurpose fields.

Steps in Data Cleansing
Parsing; correcting; standardizing; matching; consolidating.

DATA TRANSFORMATION
All the extracted data must be made usable in the data warehouse. The data in many old legacy systems is unlikely to be of good enough quality for the data warehouse as it stands. Transformation of source data encompasses a wide variety of manipulations that change the extracted source data into usable information to be stored in the data warehouse. Data warehouse practitioners have classified data transformations in several ways.
Basic tasks: selection; splitting/joining; conversion; summarization; enrichment.

Loading
Initial load: load mode.
Incremental loads: constructive merge mode; for Type 1 slowly changing dimensions, destructive merge mode.
Full refresh: load and append modes are applicable.

OLAP Defined
Online Analytical Processing (OLAP) is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user. Users need the ability to perform multidimensional analysis with complex calculations.

The Basic Virtues of OLAP
Enables analysts, executives, and managers to gain useful insights from the presentation of data; can reorganize metrics along several dimensions and allow data to be viewed from different perspectives; supports multidimensional analysis; is able to drill down or roll up within each dimension.

BUSINESS METADATA
Business metadata is like a roadmap or an easy-to-use information directory showing the contents of the data warehouse and how to get them. It answers questions such as: How can I sign on to and connect with the data warehouse? Which parts of the data warehouse can I access? Can I see all the attributes of a specific table? What are the definitions of the attributes I need in my query? Are there any predefined queries and reports that already give the results I need?

TECHNICAL METADATA
Technical metadata is meant for the IT staff responsible for the development and administration of the data warehouse. It is like a support guide for IT professionals to build, maintain, and administer the data warehouse.

Physical Design Objectives
Improve performance: in OLTP, 1 to 2 seconds at most; in a DW, seconds to minutes. Ensure scalability. Manage storage. Provide ease of administration. Design for flexibility.

Physical Design Steps
Develop standards; create an aggregates plan; determine data partitioning; establish clustering options; prepare an indexing strategy; assign storage structures.

Partitioning
Partitioning breaks data into several physical units that can be handled separately. In a data warehouse, the question is not whether to partition but how. Granularity and partitioning are key to an effective warehouse implementation. Partitions are spread across multiple disks to boost performance.

Why Partition?
Flexibility in managing data: smaller physical units allow easy restructuring, free indexing, sequential scans if needed, easy reorganization, easy recovery, and easy monitoring. Improved performance.

Criteria for Partitioning
Vertically: groups of selected columns together; more typical in dimension tables.
Horizontally: e.g., recent events and past history; typical in fact tables.

Parallelization
The argument goes: if your main problem is that your queries run too slowly, use more than one machine at a time to make them run faster (parallel processing). Oracle uses this strategy in its warehousing products.

Indexing
An index is a structure separate from the table data it refers to; it stores the location of rows in the database based on the column values specified when the index is created. Indexes are used in a data warehouse to improve warehouse throughput. Related topics: indexing and loading; indexing for large tables.

B-Tree Characteristics
Balanced, bushy, multi-way tree; block-oriented; dynamic.

Bitmap Index
Bitmap indices are a special type of index designed for efficient querying on multiple keys. Records in a relation are assumed to be numbered sequentially from, say, 0. Given a number n, it must be easy to retrieve record n; this is particularly easy if records are of fixed size. Bitmap indices are applicable on attributes that take on a relatively small number of distinct values, e.g., gender, country, or state, or income level (income broken up into a small number of levels such as 0-9999, 10000-19999, 20000-50000, 50000-infinity). A bitmap is simply an array of bits. In its simplest form, a bitmap index on an attribute has a bitmap for each value of the attribute, and each bitmap has as many bits as there are records. In the bitmap for value v, the bit for a record is 1 if the record has the value v for the attribute, and 0 otherwise.

Clustering
Clustering places and manages related units of data in the same physical block of storage. This arrangement causes related units of data to be retrieved together in a single operation. In a clustering index, the order of the rows is close to the index order.
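The effect of clustering on a sequential index scan can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not a model of any particular DBMS: rows live in fixed-size blocks (the row at physical position `pos` sits in block `pos // block_size`), the index is just the row positions sorted by key, and only the most recently read block is kept in memory, so returning to a block costs another read. The function name `block_reads` is hypothetical.

```python
from collections import Counter


def block_reads(row_keys, block_size):
    """Count how often each storage block is read during a sequential index scan.

    row_keys[i] is the index key of the row stored at physical position i.
    Simplifying assumption: only the most recently read block is buffered, so
    returning to a block after visiting another one costs another read.
    """
    # A sequential index scan visits row positions in key order.
    index_order = sorted(range(len(row_keys)), key=lambda pos: row_keys[pos])
    reads = Counter()
    current_block = None
    for pos in index_order:
        block = pos // block_size
        if block != current_block:
            reads[block] += 1
            current_block = block
    return reads


if __name__ == "__main__":
    # Clustered: rows stored in key order, so each of the 3 blocks is read once.
    print(block_reads([10, 20, 30, 40, 50, 60], block_size=2))
    # Unclustered: the same keys scattered across blocks, so each block is read twice.
    print(block_reads([10, 40, 20, 50, 30, 60], block_size=2))
```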
A row order that is "close" to the index order means that the physical records containing rows will not have to be accessed more than once if the index is accessed sequentially.

DW Deployment
Major deployment activities: complete user acceptance; perform initial loads; get user desktops ready; complete initial user training; institute initial user support; deploy in stages.

DW Growth and Maintenance
Monitoring the DW: collection of statistics; usage of statistics for growth planning and for fine-tuning.
User training: data content; applications; tools.

Dimensional Modeling Exercise
Exercise: Create a star schema diagram that will enable FIT-WORLD GYM INC. to analyze its revenue. For every instance of revenue taken, the fact table will include the attribute(s) useful for analyzing revenue. The star schema will include all dimensions that can be useful for analyzing revenue. The only data sources available are shown below.
SOURCE 1: FIT-WORLD GYM operational database ER diagram and the tables based on it (with data).
SOLUTION
