Back to description
SQL Server Integration Services (SSIS) is the anchor in a trilogy of products that make up the Microsoft SQL Server Business... more
SQL Server Integration Services (SSIS) is the anchor in a trilogy of products that make up the Microsoft SQL Server Business Intelligence (BI) platform. SSIS along with Analysis Services and Reporting Services round out a platform that clearly puts Microsoft on the map in the enterprise Business Intelligence arena. In its simplest form, SSIS is an enterprise-level extract, transform, and load (ETL) development tool. However, SSIS is not just a fancy wrapper around an import wizard. In a drag-and-drop development environment, ETL developers can snap together intricate workflows and out-of-the-box data-cleansing flows that rival custom coding and expensive third-party tools. For your edge cases, the model is easily extensible and custom components can be developed in .NET languages to simply snap into the framework. However, custom coding most likely will not even be necessary. With the latest version of SSIS, novice developers can use the embedded Visual Studio Tools for Applications (VSTA) development environment to custom code workflow tasks and data pipeline transformations in VB or C# .NET languages.
When we put together the first edition of this book, we were blown away by the new architecture and capabilities of SSIS. SSIS was a big change from the DTS product that it replaced and there has been much to learn. Since the first edition of SSIS, we have collectively racked up many years of experience converting older DTS packages and mindsets over to using SSIS, and trust us when we say that no one who’s made the change is asking to go back. We’ve learned some things, too. If you run into an issue getting up and running, converting older packages, or creating new ones, we’ve probably run into that issue too and have a solution for you here in this book. This book is a new edition and a whole new book. Nothing was sacred in this rewrite because we really dug in to put the last few years of experience working with this product back into these pages. We think the result is worth it and this edition will make your experience with SSIS a more productive one. This chapter starts from the beginning and provides an overview of SSIS, describes where it fits within the BI product platform, and ETL development in general.
... less
As with any Microsoft product, SQL Server ships with a myriad of wizards and tools to make your life easier and reduce your... more
As with any Microsoft product, SQL Server ships with a myriad of wizards and tools to make your life easier and reduce your time to market. In this chapter you learn about some of the tools that are available to you and how to create your first basic package. These wizards make transporting data and deploying your packages much easier and can save you hours of work in the long run. We start the discussion with the Import and Export Wizard, which allows you to create a package for importing or exporting data quickly. As a matter of fact, you may run this tool in your day-to-day work without even knowing that SSIS is the back-end for the wizard. The latter part of this chapter explores other tools that are available to you, such as Business Intelligence Development Studio.
SSIS tasks are the foundation for the control flow in SSIS. When you are on the Control Flow design surface in BIDS, the... more
SSIS tasks are the foundation for the control flow in SSIS. When you are on the Control Flow design surface in BIDS, the toolbar is populated with a set of Tasks components that can be snapped together to represent a logical or control workflow for your package. What you might not know is that tasks may also be used to define control flows in response to an event raised somewhere in the package. In either case, using Task components to map out the logical sequence of actions for a package is probably the most similar aspect that SSIS has to the legacy DTS product.
A task is a discrete unit of work that can perform typical actions required by an ETL process from moving a file and preparing data sources to sending email confirmations when everything is complete. This is most evident in the fact that the data flow is tied to the controller flow with a specific Data Flow task. More advanced tasks enable you to perform actions like executing SQL commands, sending mail, running ActiveX scripts, and accessing Web services. If you look at the toolbar, you’ll see there is a large list of tasks that you can use out-of-the-box for ETL package development and a few that are more enterprise application integration (EAI) related. Most of the tasks are covered in this chapter, however some in less detail since they are covered in other chapters. The exception will be the Looping and Sequence Containers, which are covered separately in Chapter 4. This chapter introduces you to most of the tasks you’ll be using on a frequent basis and gives you some examples of how to use them. This will all be reinforced as you read through the rest of the book, because each of these tasks will be used in at least one further example in later chapters.
In the last chapter, you read about tasks and how they interact in the Control Flow. There was one critical piece that was... more
In the last chapter, you read about tasks and how they interact in the Control Flow. There was one critical piece that was left out of that discussion. Containers are objects that help SSIS provide structure to one or more tasks. They can help you loop through a set of tasks until a criterion has been met or can help you group a set of tasks logically. Containers can also be nested, containing other containers. Containers are set in the Control Flow tab in the Package Designer. There are four types of containers in the Control Flow tab: Task Host, Sequence, For Loop, and Foreach Loop Containers.
In the last two chapters you were introduced to the Control Flow tab through tasks and containers. In this chapter, you’ll... more
In the last two chapters you were introduced to the Control Flow tab through tasks and containers. In this chapter, you’ll continue along those lines with an exploration of the Data Flow tab, which is where you’ll spend most of your time as an SSIS developer. The Data Flow Task is where the bulk of your data heavy lifting will occur in SSIS. This chapter walks you through how each transformation in the Data Flow Task can help you move and clean your data.
If you have used SSIS or DTS packages for any involved ETL process, you have inevitably encountered the need to have dynamic... more
If you have used SSIS or DTS packages for any involved ETL process, you have inevitably encountered the need to have dynamic capabilities. A dynamic package can reconfigure itself at runtime to do things like run certain steps conditionally, to create a series of auto-generated filenames for export, or to retrieve and set send-to addresses on an alert email from a data table. The paradigm shifted radically on how to do this as SQL Server evolved from DTS to the current SSIS packages and frankly, expressions was one of the features that was under-represented in the Books On-Line. As a result, the concept of expressions was a topic of many inquires as developers and architects began rolling out SSIS projects in their development shops.
This chapter is our attempt to remedy the confusion and get you up to speed on expressions. Here we will consolidate the common questions, answers, and best practices about expressions that we’ve been hearing about and explaining since the first release of SSIS. The good news is that expressions are easy to use and impressively powerful. The even better news is that Microsoft has now supplemented the Books On-Line with a hefty section on expressions. As you read this chapter you will gain an understanding not only about how expressions work, but you’ll also gain some insight to how you can use expressions now on your current SSIS project.
In the simplest ETL scenarios, you use an SSIS pipeline to extract data from a single source table and populate the corresponding... more
In the simplest ETL scenarios, you use an SSIS pipeline to extract data from a single source table and populate the corresponding destination table. In practice, though, you usually won’t see such trivial scenarios: the more common ETL scenarios will require you to access two or more data sources simultaneously and merge their results together into a single destination structure. For instance, you may have a normalized source system that uses three or more tables to represent the product catalog, whereas the destination represents the same information using a single de-normalized table (perhaps as part of a data warehouse schema). In this case you would need to join the multiple source tables together in order to present a unified structure to the destination table.
In the relational world, such requirements are easily met by employing a relational join operation. However, in the ETL world you may not be so fortunate that the tables to be joined live in the same physical database, same brand of database, same server, or in the worst cases even same physical location (all of which typically render the relational join method useless). In fact, one common scenario is where data from a legacy source system is staged in flat text files, which then need to be joined to dimensional data residing in a SQL Server data warehouse.
So the ETL system needs to be able to join data in a similar way to relational systems, but should not be constrained to having the source data live in the same physical database. SQL Server Integration Services provides several methods for performing such joins, ranging from support in native components through to custom methods implemented in TSQL or managed code.
This chapter explores the various options for performing joins, and contrasts when and which method you should use for various circumstances. After reading this chapter you should gain insight into how to optimize the various join operations in your ETL solution, and understand the design, performance, and resource tradeoffs therein.
Now that you’ve learned about all the basic tasks and transforms in SSIS, you can jump into some practical applications for... more
Now that you’ve learned about all the basic tasks and transforms in SSIS, you can jump into some practical applications for SSIS. You’ll first start with a normal transformation of data from a series of flat files into SQL Server. Next you’ll add some complexity to a process by archiving the files automatically. The last example will show you how to make a package that handles basic errors and makes the package more dynamic. As you run through the tutorials, remember to save your package and to a lesser degree your project on a regular basis often to avoid any loss of work.
With the introduction of C#, and the embedding of the new Visual Studio Tools for Applications into SSIS, you can’t think... more
With the introduction of C#, and the embedding of the new Visual Studio Tools for Applications into SSIS, you can’t think of using the Script Task and Script Component as scripting anymore; now it’s all-out programming. In the early days of DTS-based SQL Server ETL processing, the ActiveX Script Task allowed you to embed programmatic logic and became the Swiss-Army knife of package development. Typically, you’d code logic into these Script Tasks to control the execution and logic flow within a package or to perform some specialized business validation.
Scripting in SSIS has completely evolved from these simple ActiveX roots. You’ve still got a Swiss-Army knife hidden in here, but there is a separation of functionality from previous uses of ActiveX scripting into three new concepts: the Scripting Task, the Scripting Component, and Expressions. Expressions are completely new to SSIS and replace the old methodology of manipulating variables or properties within the package model. The other two scripting components provide access into a new scripting development environment using Microsoft Visual Studio Tools for Applications (VSTA). This change finally allows SSIS developers to script logic into packages using Visual Basic 2008 or Visual C# 2008 .NET code.
In this chapter, you learn all about these new scripting options and learn how to exploit them in your package development tasks to control execution flow, perform custom transformations, manage variables, and provide runtime feedback.
Among the various applications of SQL Server Integration Services (SSIS), the most common is loading a data warehouse. SSIS... more
Among the various applications of SQL Server Integration Services (SSIS), the most common is loading a data warehouse. SSIS provides the ETL features and functionality to efficiently handle many of the tasks required when dealing with transactional source data that will be extracted and loaded into a data mart, data warehouse, or even an operational data store (ODS), including the capabilities to then process data from the relational data warehouse into the SQL Server Analysis Services (SSAS) cubes.
Soup to nuts, SSIS provides the core foundation of data processing from your source, to staging, to your data mart, and onto your cubes (and beyond!). Figure 10-1 highlights a common architecture of a business intelligence (BI) solution.
The presentation layer on the right side of Figure 10-1 shows the main purpose of the BI solution, which is to provide business users (from the top to the bottom of an organization) meaningful data that they can take actionable steps from. Underlying the presentation data are the back-end structures and processes that make it possible for data to become meaningful and visible to the right people.
ETL is a large part of this back-end process because its responsibility is to move and restructure the data between the data tiers of the BI architecture. This involves many steps, as you will see, from data profiling, to data extraction, dimension table loading, fact table processing, and SSAS processing. This chapter will set you on course to architecting and designing an ETL process for data warehouse and business intelligence ETL.
In fact, SSIS contains several tasks and transformations out-of-the-box to get you well on your way to a stable and straightforward ETL process. Some of these components include the Data Profiling Task, the Slowly Changing Dimension Transformation, and the Analysis Services Execute DDL Task. The tutorials in this chapter all coordinate together to demonstrate the processing required for the Sales Quota Fact table and SSAS measure group, which includes the ETL required for the Employee dimension.
There’s an old adage that says when you’re holding a hammer, everything else looks like a nail. When you use SSIS to build... more
There’s an old adage that says when you’re holding a hammer, everything else looks like a nail. When you use SSIS to build a solution, make sure that you are using the right tool for every problem you tackle. SSIS will be excellent for some jobs, and SQL Server will shine at other tasks. When used in concert, the combination of the two can be powerful.
This chapter discusses other features in the SQL Server arsenal that can help you build robust and high-performance ETL solutions. The SQL Server relational engine has many features that were designed with data loading in mind, and as such the engine and SSIS form a perfect marriage to extract, load, and transform your data. In SQL Server 2008 some of the new relational features were built in direct consultation with the SSIS team.
This chapter assumes you are using SQL Server 2008 as the source system, though many of the same principles will apply to earlier versions of SQL Server and to other relational database systems too. You should also have the SQL Server 2008 versions of AdventureWorks and AdventureWorksDW installed; these are available from www.codeplex.com.
www.codeplex.com
The easiest way to look at how the relational engine can help you design ETL solutions is to segment the topic into the three basic stages of ETL: extraction, transformation, and loading. Because the domain of transformation is mostly within SSIS itself, there is not much to say there about the relational engine, so the scope of interest will be narrowed down to extraction and loading.
In Chapter 11 you discovered how to incrementally extract data from sources with SSIS. In this chapter, you learn about importing... more
In Chapter 11 you discovered how to incrementally extract data from sources with SSIS. In this chapter, you learn about importing and working with data from heterogeneous, or non–SQL Server, sources. In today’s enterprise environments, data may exist in many diverse systems, such as mainframes, Oracle, DB2, Office documents, XML, or flat files, to name just a few. The data may be generated within the company, or it may be delivered through the Internet from a trading partner. Whether you need to import data from a spreadsheet to initially populate a table in a new database application, pull data from other sources for your data warehouse, or rely on a Web service to grab up-to-the-minute information, accessing heterogeneous data is probably a big part of your job.
You can load data into SQL Server using SSIS from any ODBC-compliant or OLE DB-compliant source. Many ODBC drivers and OLE DB providers are supplied by Microsoft for sources like Excel, Access, DB2, FoxPro, Sybase, Oracle, and dBase. Others are available from database vendors. A variety of data source components are found in SSIS. These include Excel, Flat File, XML, ADO.NET (which is used to connect to .NET sources), OLE DB (which allows connections to many different types of data), and Raw File (which is a special source used to read data that has been previously exported to a Raw File destination). If the supplied data sources do not meet your needs, you can also create custom data sources.
This chapter walks you through accessing data from several of the most common sources. Each one is relatively easy to work with, but each is configured a bit differently:
Excel and MS Access (versions 2007 and earlier): Excel is often used as a quick way to store data because spreadsheets are easy to set up and use. Access applications are frequently upsized to SQL Server as the size of the database and number of users increase.
Oracle: Even companies running their businesses on Oracle or another of SQL Server’s competitors sometimes make use of SQL Server for its cost-effective reporting and business intelligence solutions.
XML and Web Services: XML and Web services (which is XML delivered through HTTP) are standards that allow very diverse systems to share data. The XML data source allows you to work with XML as you would almost any other source of data.
Flat Files: Beyond just standard delimited files, SSIS can parse flat files of various types and code page encoding, which allow files to be received from and exported to different operating systems and non-Windows based systems, which reduce the need to convert flat files before or after working with them in SSIS.
ODBC: Many organizations maintain older systems that use legacy ODBC providers for data access. Because of the complexities and cost of migrating systems to newer versions, ODBC is still a common source.
Other Heterogeneous Sources: The sources listed so far are the most common; however, this only touches upon the extent of data sources that SSIS can access. The last section of this chapter provides you resources and generalities when you are trying to access other sources such as Teradata, SAP, DB2, or Sybase.
Reliability and scalability are goals for all your systems, yet they may seem like a strange combination for a chapter. Often... more
Reliability and scalability are goals for all your systems, yet they may seem like a strange combination for a chapter. Often, though, there are direct links, as you will see. Errors and the unexpected conditions that precipitate them are the most obvious threats to a reliable process. There are several features of SQL Server 2008 Integration Services that allow you to handle these situations with grace and integrity, keeping the data moving and systems running. Error outputs and checkpoints are the two features you will focus on in this chapter, and they highlight to you how these can be used in the context of reliability. The implementation of these methods can also have a direct effect on package performance, and therefore scalability, and you will learn how to take into account these considerations for your package and process design. The ability to provide checkpoints does not natively extend inside the Data Flow, but there are methods you can apply to achieve this. The methods can then be transferred almost directly into the context of scalability, allowing you to partition packages and improve both reliability and scalability at the same time. All of these methods can be combined, and while there is no perfect answer, you will look at the options and acquire the necessary information to make informed choices for your own SSIS implementations.
This chapter dives under the hood of SSIS to consider the architecture of the engine and its components, and then best practices... more
This chapter dives under the hood of SSIS to consider the architecture of the engine and its components, and then best practices for design and optimization including the following concepts:
Control flow and data flow comparison
Data flow transformation types
Data flow buffer architecture and execution trees
Monitoring data flow execution
Data flow design practices
Tuning the data flow engine
Performance monitoring
The initial part of this chapter is more abstract and theoretical, but we’ll then move into the practical and tangible. In the concluding sections, you will take the knowledge you have developed here and bring it to application, considering a methodology to optimization and looking at a few real-world scenarios.
Software Development Life Cycles play an important role in any type of application development. Many SQL Server database... more
Software Development Life Cycles play an important role in any type of application development. Many SQL Server database administrators and ETL developers have little experience with Microsoft Source Control tools because the tools themselves have been less than “database project-friendly.” Microsoft has responded with a more reliable version of Visual SourceSafe and a new source control architecture called Team System.
In addition, many SQL Server DBAs have not been involved with Software Development Life Cycles (SDLCs) beyond executing scripts attached to change control documentation. Legislation around the world has changed the role of the SQL Server DBA in the enterprise because of new requirements for tracking changes. Regarding Software Development Life Cycles, DBAs now must participate in ever-earlier phases of the project development.
In addition, SQL Server DBAsespecially SSIS developerswill realize greater productivity and development cycle fault tolerance as they employ source-controlled development practices. These practices produce code that is auditable, an added benefit in the current corporate climate.
This chapter provides an overview of some of the available features in Microsoft’s new offerings. It includes a brief description of how to store a project in Visual SourceSafe and a detailed walkthrough that describes creating a Team Projectusing Visual Studio Team Systemfor SSIS. In practice, Team Projects will most likely be created by someone else in the software development enterprise.
Because the line between database administrator and software developer has blurred and blended over the years, the Team Project walkthrough is built in Visual Studio 2008. In the Team Project walkthrough, you are going to put together a project that uses the source control and collaboration functionality provided by Visual Studio Team System to demonstrate working with the tool and complying with your SDLC process.
This chapter also contains information about debugging and breakpointshighlighting features new to database administrators and ETL developers in SSIS.
Included is a discussion regarding development and testing with an emphasis on agile development methodology, which is very well suited for SSIS development because of the methodology’s ability to adapt to changesa common occurrence in ETL development.
The chapter concludes with a discussion about managing package deployment.
By now, you are probably pretty familiar with various basic aspects of SSIS. In earlier chapters, you’ve studied the new... more
By now, you are probably pretty familiar with various basic aspects of SSIS. In earlier chapters, you’ve studied the new SSIS interface, the new object model, internal design, and how to write SSIS packages.
In SQL Server 2008, you can easily run SQL Server 2000 DTS packages in the 2008 environment. In a future release of SQL Server however, DTS will be officially deprecated and not supported. In this chapter, you look at how to migrate DTS 2000 packages to SSIS and, if necessary, how to run DTS 2000 packages under SSIS.
SQL Server Integration Services provides some valuable features to enable you to control the workflow of your SSIS packages... more
SQL Server Integration Services provides some valuable features to enable you to control the workflow of your SSIS packages at a very granular level. Functionality that you might expect to be available only by scripting can often be accomplished by setting a few properties of a component. In addition, SSIS comes with powerful error-handling capabilities, the ability to log detailed information as the package runs, and debugging functionality that speeds up troubleshooting and design.
This chapter walks you through controlling the package workflow, beginning at the highest level using precedence constraints and then drilling down to event handling. You’ll see how trappable events play a role in breakpoints, and how to perform exception handling for bad data in the data flow. Finally, you learn how these features can be used for troubleshooting, debugging, and enabling you to build robust SSIS packages.
Once you start implementing a real-world integration solution, you may have requirements that the built-in functionality... more
Once you start implementing a real-world integration solution, you may have requirements that the built-in functionality in SSIS does not meet. For instance, you may have a legacy system that has a proprietary export file format, and you need to import that data into your warehouse. You have a robust SSIS infrastructure that you have put in place that allows you to efficiently develop and manage complex ETL solutions, but how do you meld that base infrastructure with the need for customization? That’s where custom component development comes into play. Out of the box, Microsoft provides a huge list of components for you in SSIS; however, you can augment those base components with your own more specialized tasks.
The benefit here is not only to businesses, but to software vendors too. You may decide to build components and sell them on the web, or maybe start a community-driven effort on a site such as www.codeplex.com. Either way, the benefit you get is that your components will be built in exactly the same way that the ones that ship with SSIS are built; there is no secret sauce (besides expertise) that Microsoft adds to their components to make them behave any differently from your own. The opportunity is that you truly can “build a better mouse trap”if you don’t like the way that one of the built-in components behaves, then you can simply build your own one instead.
Building your first component may be a little challenging, but hopefully with the help of this chapter you will be able to overcome this. In this chapter you focus on the pipelinenot because it is better than any other area of programmability within SSIS, but because it will probably be the area where you have the most benefit to gain; and it does require a slightly greater level of understanding. It also allows you to see some of the really interesting things that Microsoft has done in SSIS. All forms of extensibility are well covered in the SQL Server documentation and samples, so don’t forget to leverage those resources as well.
Now that you’ve learned how to extend the pipeline with your own custom components, the next step is to improve the user... more
Now that you’ve learned how to extend the pipeline with your own custom components, the next step is to improve the user experience and efficiency, by adding a user interface. This will be demonstrated using the ReverseString example from the previous chapter.
Pipeline components do not require the developer to provide a user interface, because the components ship with a default interface called the Advanced Editor. Although this saves time and resources, the overall user experience can be poor. It can increase the package development time and requires the user to have an intimate knowledge of the component to be able to correctly set the required columns and properties. It is also dangerous and open to data integrity problems, because the more complex the configuration required, the more acute the lack of suitable prompts and real-time validation becomes, making configuration tedious and error-prone. For complex components with multiple inputs, the Advanced Editor will not be suitable. For simple components, however, the built-in Advanced Editor, as used by several stock components, is perfectly acceptable. If you want to add that extra style and guidance for the end user, though, this chapter is for you.
You will learn how to add a user interface to a component and look in detail at each of the stages. You will then be able to apply these techniques to your own components. It is worth noting that this chapter deals exclusively with managed components.
Throughout this book, you’ve been exposed to different ways to manage the development and administration of SSIS packages... more
Throughout this book, you’ve been exposed to different ways to manage the development and administration of SSIS packages using the Visual Studio IDE and the SQL Server Management Studio. This chapter expands on those operations by providing an overview of the ways in which you perform these same management and administration functions programmatically through managed code. You learn how to perform package management operations using the managed Application and Package classes exposed in the dynamic-linked library Microsoft.SQLServer.ManagedDTS.dll by the .NET Microsoft.SqlServer.Dts.Runtime namespace.
Application
Package
Microsoft.SQLServer.ManagedDTS.dll
Microsoft.SqlServer.Dts.Runtime
The second half of this chapter details the capabilities of the WMI Data Reader Task and the WMI Event Watcher Task. These tasks provide access to system information via the Windows Management Interface model, better known as WMI. Through a query-based language called WQL, similar to SQL in structure and syntax, you can obtain information about a wide variety of system resources to assist you in your SQL Server administrative responsibilities. With WMI, you can mine system-based metrics to look for hardware and operating system trends. In SSIS, using WMI you can also work more proactively to monitor a Windows-based system for notable events that occur in the system and even trigger responsive actions.
SQL Server 2008 Integration Services accepts data from nearly any source and presents output, including ADO.NET datasets... more
SQL Server 2008 Integration Services accepts data from nearly any source and presents output, including ADO.NET datasets and SSIS data readers that are consumable by external applications. These features allow SSIS to sink and source external applications with ease. In this chapter, you take a look at three examples of external applications that utilize SSIS. This chapter is not intended to exhaust all possible combinations of external interface with SSIS, but rather to provide a sampling of some available functionality.
SSIS is flexible and configurable, so there are many ways to approach interaction with external applications. This book is rife with examples, including the following:
Sources and DestinationsImplicit objects inside SSIS that provide connectivity to data sources and destinations. New in 2008 are two new sources called the ADO.NET Source and the Performance Counters Source. The ADO.NET Source uses the .NET provider to access the data being sourced. The Performance Counters Source extracts performance measures from the operating system. There are also two new destinations added in 2008, one being the ADO.NET Destination, which loads data using the .NET provider, and the SQL Server Compact Edition Destination, which loads data into, well, the Compact Edition of SQL Server.
ScriptingArguably provides the most flexibility when interacting with external applications. Similar to Integration Services 2005, the Script Component still comes in three flavors: Source, Destination, or Transformation. The exciting enhancement in this version is the option to use C# as well as VB.NET. The .NET Framework version for developing in this component is 3.5. We know plenty of developers will cheer about this enhancement! See Chapter 9 for an example and more information.
Because interface scenarios can vary, it is difficult to define best practices. That said, generally accepted software development practices apply, including the following:
Employ a methodology: Chapter 15 provides an introduction to Software Development Life Cycles (SDLCs). A development methodology is not a prescribed recipe; it is a framework that assists you in creating the proper recipe for your software development project.
Debug: Execute your SSIS package in debug mode in either the Business Intelligence Developer Studio or the Visual Studio Integrated Development Environment (IDE). You may also find that using breakpoints in conjunction with the watch window helps greatly during your development cycle and is a highly recommended best practice. Another great tip is to group similar tasks in containers (preferably a Sequence Container) and execute just the container you choose to test. This technique will allow you to do unit-type testing without having to run the entire package to avoid the “all or nothing” approach to package development.
Test: Whenever possible, obtain a sample of actual (“live”) data and execute your package against this data. This will ensure that the business logic implemented in your packages are fine-tuned and will validate the correctness of your package logic. In the absence of access to a copy of live data, populate tables with dummy data based on the business rules embedded in the tables of the production data and execute your package against them. When populating your tables with dummy data it is very important that you fully understand the semantics of the production data and be sure to mirror those as accurately as possible. This will prevent you from developing logic that may work on test data, but raises problems during production.
In the first example, SSIS will read data from a Microsoft Office InfoPath document and perform a sorting transform on the data and write the data out to a flat file for the destination. The key concept behind this exercise is to demonstrate how Integration Services can easily source and transform data from an XML document.
So you have a set of packages and are ready to run the package in production. This chapter focuses on how to administer packages... more
So you have a set of packages and are ready to run the package in production. This chapter focuses on how to administer packages after you’ve deployed them to production. Specifically, we cover how to configure, deploy, and then administer the SSIS service. We also cover how to create a stand-alone ETL server and some of the command-line utilities you can use to make your job easier. After this chapter, you’ll be able to create a package that will not require any effort to migrate from development to production after the first deployment.
Typically a book like this has to cover so much material that there is not enough space to really dig into some of the typical... more
Typically a book like this has to cover so much material that there is not enough space to really dig into some of the typical issues that you run into when you put the book down and start putting together your first solution. You end up coming back to the book to flip through all of the one-off examples, but they just don’t seem to provide any insight or applicability to your current project or deadline. The case study is your best chance to get specific, to get into the ring, to take a business issue and run with it. Hopefully you’ll be the beneficiary of this.
You will use the SSIS environment to solve a payment processing problem with payment data of varying levels of quality that has to be validated against corporate billing records. This example is a little different from the typical data-warehouse-type ETL case study; it’s a little more programmatic. Not to say that there is not any ETL. You’ll need to import three heterogeneous data formats, but the interesting part is the use of the SSIS Data Flow Transforms that allow for the development of smart validation and matching programming logic. This will all combine into a solid learning opportunity that showcases the real capabilities of SSIS.
Purchase Before purchasing this product, please be sure you have met all software and system requirements, and that you understand any limits placed upon its use.
Return Policy Wrox Chapters on Demand are non-returnable and non-refundable.
Reader Software Wrox Chapters on Demand are offered as PDFs, and they must be viewed using the Adobe Reader. If you do not have the Reader installed, it can be downloaded for free at Adobe.com.
Test Download As Wrox Chapters on Demand purchases are non-returnable, it is advisable that you test your system and software configurations with a free sample download before you place an order.
Usage Rights for a Wrox Chapter on Demand File Any Wrox Chapter on Demand product you purchase from this site will come with certain restrictions that allow Wiley to protect the copyrights of its products. After you purchase and download this title, you:
If you have any questions about these restrictions, you may contact Customer Care at (877) 762-2974 (8 a.m. - 5 p.m. EST, Monday - Friday). If you have any issues related to Technical Support, please contact us at 800-762-2974 (United States only) or 317-572-3994 (International) 8 a.m. - 8 p.m. EST, Monday - Friday).