Data Stage

DataStage Interview Questions and Answers,Solution and Explanation - Part 14

What are the Environmental variables in Datastage?

How do you check for Job Errors in Datastage

What are Stage Variables, Derivations and Constants?

What is Pipeline Parallelism?

Which version of Datastage did you use and what are the features in it?

Why do we use a transformer?

What are different schemas you have used in your project ?

What was the intention of choosing one schema over another ?

How do you read a sequential file in parallel?

What are the different types of errors you have encountered, and how did you troubleshoot them or what workarounds did you find?

What are different types of containers ?

In a project where did you use the containers?

Tell me the whole process of updating the process control tables ?

How do you implement SCD types I, II, and III in Datastage? Draw the job diagrams.

How do you invoke a query in Datastage jobs ?

What are environmental variables? How do you declare them?

How can you invoke a BTEQ script in Datastage?

What are the different ways to control the execution of Datastage jobs

What is the difference between Merge Stage and Lookup Stage?

What is the difference between the Dynamic RDBMS Stage & Static RDBMS Stage ?

How do you maintain version of Datastage jobs in a project?

How do you give access to multiple users for a job?

How do you remove warnings in Datastage jobs ?

How do you remove duplicates ?

What is the difference between Datastage and Informatica ?

What are server jobs ?

What are containers ?

DataStage Interview Questions and Answers,Solution and Explanation - Part 13

What is the difference between IBM WebSphere Datastage 7.5 (Enterprise Edition) and Standard Ascential?

How do you remove duplicates in dataset

What is the difference between Job Control and Job Sequence

What is the max size of Data set stage?

How do you measure performance in sort stage

How to develop the SCD using LOOKUP stage?

What are the errors you experienced with data stage

What are the main differences between server jobs and parallel jobs in Datastage?

Why do you need the Modify Stage?

How to run a job using command line?

Is it possible to query a hash file?

What does # indicate in environment variables?

What is user activity in Datastage?

What is the alternative way where we can do job control?

What is the use of job control?

What is APT_CONFIG in Datastage

Explain the best approach to do a SCD type2 mapping in parallel job?

How can we improve the performance of the job while handling huge amount of data

How can we create read only jobs in Datastage .

How to implement routines in Datastage? Does anyone have any material for Datastage?

How will you determine the sequence of jobs to load into data warehouse?

How can we Test jobs in Datastage?

How can we implement Slowly Changing Dimensions in Datastage?.

Differentiate Database data and Data warehouse data?

How to run a Shell Script within the scope of a Datastage job?

What is the difference between Datastage and Informatica

Explain about job control language such as (DS_JOBS)

What is Invocation ID?

How to connect two stages which do not have any common columns between them?

In SAP/R3, How do you declare and pass parameters in parallel job .

What is the difference between Hashfile and Sequential File?

DataStage Interview Questions and Answers,Solution and Explanation - Part 12

How do we create an index in Datastage?

What is the flow of loading data into fact & dimensional tables?

What is a sequential file that has single input link?

What is hashing algorithm?

What are the Orchestrate options in the Generic stage? What are the option names and values (for example, the name of an Orchestrate operator to call)?

How do you fix the error "OCI has fetched truncated data" in Datastage

A batch is running and it is scheduled to run in 5 minutes. But after 10 days the time changes to 10 minutes. What type of error is this and how to fix it?

Which partition we have to use for Aggregate Stage in parallel jobs ?

What is the baseline for implementing partitioning or parallel execution in a Datastage job? For example, is it advised only for more than 2 million records?

What are the orchestrate operators available in Datastage for AIX environment.

Explain about the Debug stages in PX

What is the difference between the Sequential Stage and the Dataset Stage? When do you use them?

How is memory allocation done while using lookup stage

What is Phantom error in the Datastage . How to overcome this error.

Explain Parameter file usage in Datastage

Is a Type 30D hash file GENERIC or SPECIFIC?

Is Hashed file an Active or Passive Stage? When will be it useful?

How do you extract job parameters from a file?

What about System variables?

How can we create Containers?

How can we improve the performance of Datastage?

What are the Job parameters?

What is the difference between routine and transform and function?

What are all the third party tools used in Datastage?

How can we implement Lookup in Datastage Server jobs?

How can we implement Slowly Changing Dimensions in Datastage?.

How can we join one Oracle source and Sequential file?.

What is iconv and oconv functions?

What is quality stage and profile stage?

What is the use and advantage of procedure in Datastage?

What are the important considerations while using join stage instead of lookups.

How to implement a type 2 slowly changing dimension in Datastage? Explain with an example.

How to implement the type 2 Slowly Changing dimension in Datastage?

DataStage Interview Questions and Answers,Solution and Explanation - Part 11

Data Warehouse Interview Questions,Live DataStage Tool Interview Questions.

What are Static Hash files and Dynamic Hash files?

What is the difference between Datastage Server jobs and Datastage Parallel jobs?

What is ' insert for update ' in Datastage

What is the order of execution done internally in the transformer, with the stage editor having input links on the left-hand side and output links on the right?

How will you call external function or subroutine from Datastage?

What happens if the job fails at night?

What is DS Administrator used for - did you use it?

How do you do oracle 4 way inner join if there are 4 oracle input files?

How do you pass filename as the parameter for a job?

How do you populate source files?

How to install and configure Datastage EE on Sun Microsystems multi-processor hardware running the Solaris 9 operating system?

What are all the third party tools used in Datastage?

How do you eliminate duplicate rows?

What is the difference between routine and transform and function?

Do you know about INTEGRITY/QUALITY stage?

How to attach a mtr file (MapTrace) via email and the MapTrace is used to record all the execute map errors

If you're running 4-way parallel and you have 10 stages on the canvas, how many processes does Datastage create?

Explain the differences between Oracle8i/9i?

How will you pass the parameter to the job schedule if the job is running at night? What happens if one job fails in the night?

What is an environment variable?

What are the difficulties faced in using Datastage ? or what are the constraints in using Datastage ?

What are XML files and how do you read data from XML files and what stage to be used?

How do you track performance statistics and enhance it?

What are the types of views in Datastage Director?

How do you pass the parameter to the job sequence if the job is running at night?

Explain a specific scenario where we would use range partitioning ?

What is job commit in Datastage?

What is the main disadvantages of staging area

How do you configure api_dump

Does type of partitioning change for SMP and MPP systems?

What is the difference between RELEASE THE JOB and KILL THE JOB?

Can you convert a snow flake schema into star schema?

What is repository?

What is Fact loading, how to do it?

What is the alternative way where we can do job control?

Where can we use these stages: Link Partitioner, Link Collector & Inter Process (OCI)?

Where can you output data using the Peek Stage?

Do you know about METASTAGE?

In which situation, we are using RUN TIME COLUMN PROPAGATION option?

What is the difference between Datastage and Datastage TX?

Difference between Hashfile and Sequential File?.

What is modulus?

What is iconv and oconv functions?

DataStage Interview Questions and Answers,Solution and Explanation - Part 10

Overview:Data Warehousing Interview Questions,DataStage Interview Questions asked in HCL,Wipro,TCS,Infosys,Tech Mahindra,Patni,Sasken,Birlasoft

What is iconv and oconv functions?

What is project life cycle and how do you implement it?

For what purpose is the Stage Variable is mainly used?

Explain the purpose of using the key and difference between Surrogate keys and natural key

How to read the data from XL files? My problem is that my data file has some commas in the data, but the delimiter we are using is |. How to read the data? Explain with steps.

How can I schedule the cleaning of the file &PH& by dsjob?

What are Hot Fix for ODBC Stage for AS400 V5R4 in Datastage 7.1

What is Datastage engine?what is its purpose?

What is the difference between Transform and Routine in Datastage?

Why is a hash file faster than a sequential file and the ODBC stage?

How to write and execute routines for PX jobs in c++?

What is a routine?

How to distinguish the surrogate key in different dimensional tables?

How can we generate a surrogate key in server/parallel jobs?

What is NLS in Datastage? how we use NLS in Datastage ?

How did you handle reject data?

What is difference between ETL and ELT?

How to read the data from XL FILES?explain with steps?

What is the meaning of a performance tuning technique? Give an example.

Differentiate between pipeline and partition parallelism?

What is the use of Hash file?Instead of hash file why can we use sequential file itself?

What is pivot stage?why are you using?what purpose that stage will be used?

How can we create environment variables in Datastage?

What is the difference between static hash files and dynamic hash files?

How can we test the jobs?

What is the difference between reference link and straight link ?

What are the command line functions that import and export the DS jobs?

What is the size of the flat file?

What is the difference between an operational data store (ODS) and a data warehouse?

What are the various process which starts when the Datastage engine starts?

What are the changes need to be done on the database side, If I have to use dB2 stage?

Datastage engine is responsible for compilation or execution or both?

How to use rank updates strategy in Datastage

What is Ad-Hoc access? What is the difference between Managed Query and Ad-Hoc access?

What is Runtime Column Propagation and how to use it?

How do we use the Datastage Director and its run-time engine to schedule running the solution, test and debug its components, and monitor the resulting executable versions on an ad hoc or scheduled basis?

What is the difference between OCI stage and ODBC stage?

Is there any difference between Ascential Datastage and Datastage?

How do you remove duplicates without using remove duplicate stage?

If a Datastage job aborts after say 1000 records, how to continue the job from 1000th record after fixing the error?

How to remove duplicates in server job

What is the exact difference between the Join, Merge, and Lookup stages?

What are the new features of Datastage 7.1 from Datastage 6.1

How to run the job in command prompt in unix?

How to know the no.of records in a sequential file before running a server job?

DataStage Interview Questions and Answers,Solution and Explanation - Part 9

How to drop the index before loading data in target and how to rebuild it in Datastage?

How can we ETL an Excel file into a data mart?

What is the transaction size and array size in OCI stage?how these can be used?

What is job control? How it is developed? Explain with steps?

What is the purpose of exception activity in Datastage 7.5?

Other than Round Robin, what algorithm is used in the Link Collector? Also explain how it works.

How to implement slowly changing dimensions in Datastage?

What does separation option in static hash-file mean?

How to improve the performance of hash file?

How do you check for the consistency and integrity of model and repository?

How we can call the routine in Datastage job? Explain with steps?

What is job control? how can it used explain with steps?

How to find the number of rows in a sequential file?

If the size of the Hash file exceeds 2GB..What happens? Does it overwrite the current rows?

Where we use link partitioner in Datastage job?explain with example?

Can we use shared container as lookup in Datastage server jobs?

What is the meaning of instance in Datastage? Explain with examples.

What is the difference between "validated OK" and "compiled" in Datastage?

What are AuditStage, ProfileStage, and QualityStage in Datastage?

What is PROFILE STAGE , QUALITY STAGE,AUDIT STAGE in Datastage

What are the environment variables in Datastage?give some examples?

What is difference between Merge stage and Join stage?

What is the difference between drs and odbc stage ?

Can you tell me for what purpose .dsx files are used in Datastage?

How do you clean the Datastage repository.

Give one real time situation where link partitioner stage used?

What is environment variables?what is the use of this?

How do you call procedures in Datastage?

Will the Datastage consider the second constraint in the transformer once the first condition is satisfied

How do you do Usage analysis in Datastage ?

How can you implement slowly changed dimensions in Datastage? explain?

Can you join flat file and database in Datastage?how?

How can you implement Complex Jobs in Datastage

What do we do to remedy?

What is meant by "Try to have the constraints in the 'Selection' criteria of the jobs itself"?

What are constraints and derivation?

Explain the process of taking backup in Datastage?

What are the different types of lookups available in Datastage?

How does Datastage handle the user security?

What are the Steps involved in development of a job in Datastage?

What is a project? Specify its various components?

How to implement type2 slowly changing dimensions in Datastage?Explain with example?

What is the meaning of file extender in Datastage server jobs? Can we run a Datastage job from another job? Where is the file data stored, and what is the file extender in DS jobs?

What is the max capacity of Hash file in Datastage?

DataStage Interview Questions and Answers,Solution and Explanation - Part 8

What is merge and how it can be done Please explain with simple example taking 2 tables?

Is it possible to run parallel jobs in server jobs?

What are the enhancements made in Datastage 7.5 compare with 7.0

If I add a new environment variable in Windows, how can I access it in Datastage?

What is OCI?

Is it possible to move the data from oracle ware house to SAP Warehouse using with Datastage Tool.

How can we create Containers?

What is data set? and what is file set?

How much would be the size of the database in Datastage ?What is the difference between Inprocess and Interprocess ?

Briefly describe the various client components?

What are orabulk and bcp stages?

What is DS Director used for?

How can I extract data from DB2 (on IBM iSeries) to the data warehouse via Datastage as the ETL tool?

Is it possible to call one job in another job in server jobs?

How can we pass parameters to job by using file.

How can we implement Lookup in Datastage Server jobs?

What is the User Variables activity? When, how, and where is it used? Give a real example.

What is hashing algorithm and explain briefly how it works?

What happens if the output of a hash file is connected to a transformer?

What is merge and how to use merge?

What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job?

What is the NLS equivalent to NLS oracle code American_America.US7ASCII on Datastage NLS?

Why do you use SQL LOADER or OCI STAGE?

What about System variables?

What are the differences between Datastage 7.0 and 7.5 in server jobs?

How does the hash file do a lookup in server jobs? How does it compare the key values?

How to handle the rejected rows in Datastage?

How is Datastage 4.0 functionally different from the enterprise edition now? what are the exact changes?

If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?

How can I specify a filter command for processing data while defining sequential file output data?

There are three different types of user-created stages available for PX. What are they?

Which would you use? What are the disadvantage for using each type?

What is DS Manager used for - did you use it?

Does Enterprise Edition only add the parallel processing for better performance?Are any stages/transformations available in the enterprise edition only?

What are validations you perform after creating jobs in designer.

What are the different type of errors you faced during loading and how you solve them

DataStage Interview Questions and Answers,Solution and Explanation - Part 6

How can you do an incremental load in Datastage?

How do we use the NLS function in Datastage? What are the advantages of the NLS function, and where can we use it? Explain briefly.

What is APT_CONFIG in Datastage?

How do we automate dsjobs?

What is troubleshooting in server jobs? What are the different kinds of errors encountered while running any job?
 
 What is Datastage Multi-byte, Single-byte file conversions?how we use that conversions in Datastage?
 
 Explain other Performance tunings to increase the performance of slowly running jobs?
 
What are Datastage multi-byte and single-byte file conversions in Mainframe jobs? What is UTF-8, and what is it used for?

What happens if RCP is disabled?
 
 What are Routines and where/how are they written and have you written any routines before?
 
 What is version Control?
 
 What are the Repository Tables in Datastage and What are they?
 
I want to process 3 files sequentially, one by one. How can I do that? While processing, it should fetch the files automatically.

Where does a UNIX script of Datastage execute, on the client machine or on the server? If it executes on the server, where exactly does it execute?
 
 Please list out the versions of Datastage Parallel , server editions and in which year they are released.
 
 What are the Job parameters?
 
How can I connect my DB2 database on AS400 to Datastage? Do I need to use ODBC first to open the database connectivity and then use an adapter just for connecting between the two?

What is the difference between server jobs and parallel jobs?
 
 What is the difference between Datastage and Datastage TX?
 
How do we extract data from more than one heterogeneous source (for example, one sequential file, Sybase, and Oracle) in a single job?
 
 How can we improve the performance of Datastage jobs?
 
 What are the defaults nodes for Datastage parallel Edition
 
 Explain Orchestrate Vs Datastage Parallel Extender?
 
 How can we join one Oracle source and Sequential file?.
 
 What is Modulus and Splitting in Dynamic Hashed File?
 
What is a batch program and how can it be generated?

What's the difference between Datastage Developers and Datastage Designers? What are the skills required for each?
 
 Could you please help me with a set of questions on Parallel Extender?
 
 What is difference between Datastage and Informatica
 
 Suppose if there are million records did you use OCI? If not then what stage do you prefer?
 
 What are types of Hashed File?
 
 How do you eliminate duplicate rows?
 
 What is DS Designer used for - did you use it?
 
How would you call an external Java function which is not supported by Datastage?
 
 Why do we have to load the dimensional tables first, then fact tables:
 
 How to create batches in Datastage from command prompt?

DataStage Interview Questions and Answers,Solution and Explanation - Part 5

What is the importance of Surrogate Key in Data warehousing?
Ans:
A surrogate key is the primary key of a dimension table. Its main advantage is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes made in that database.
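
The idea can be sketched outside Datastage as a small generator that hands out warehouse-owned integer keys, independent of whatever natural key the source uses. This is only an illustration in Python; the class name `SurrogateKeyGenerator` and the sample customer codes are hypothetical and do not come from any Datastage API.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Assigns a stable integer surrogate key to each natural (business) key."""

    def __init__(self, start=1):
        self._next = count(start)   # monotonically increasing key source
        self._keys = {}             # natural key -> surrogate key

    def key_for(self, natural_key):
        # Reuse the existing surrogate if this natural key was seen before,
        # otherwise allocate the next integer.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


gen = SurrogateKeyGenerator()
natural_keys = ["CUST-A17", "CUST-B02", "CUST-A17"]  # hypothetical source keys
assigned = [gen.key_for(k) for k in natural_keys]
```

Because the dimension table is keyed on the generated integer, a change to the format of the source key only affects the mapping, not the fact tables that reference the dimension.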

What does a Config File in parallel extender consist of?
Ans:
Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location.

How many places you can call Routines?
Ans:
Routines can be called from four places: (i) Transform of routine: (A) Date Transformation, (B) Upstring Transformation; (ii) Transform of the Before & After Subroutines; (iii) XML transformation; (iv) Web base

How did you handle an 'Aborted' sequencer?
Ans:
In almost all cases we have to delete the data inserted by this from DB manually and fix the job and then run the job again.

Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as EBCDIC using Datastage ?
Ans:
Currently, the total is converted to ASCII, even though the individual records are stored as EBCDIC.

Compare and Contrast ODBC and Plug-In stages?
Ans:
ODBC:
a) Poor performance.
b) Can be used for a variety of databases.
c) Can handle stored procedures.
Plug-In:
a) Good performance.
b) Database-specific (only one database).

What is the functionality of Link Partitioner and Link Collector?
Ans:
Link Partitioner is used for partitioning the data. Link Collector is used for collecting the partitioned data.

Containers: Usage and Types?
Ans:
A container is a collection of stages used for the purpose of reusability. There are 2 types of containers. a) Local Container: job-specific. b) Shared Container: can be used in any job within a project.

Explain Dimension Modelling types along with their significance
Ans:
Data Modelling is broadly classified into 2 types. a) E-R Diagrams (Entity-Relationships). b) Dimensional Modelling.

Did you Parameterize the job or hard-coded the values in the jobs?
Ans:
Always parameterized the job. The values come either from Job Properties or from a 'Parameter Manager', a third-party tool. There is no way you will hard-code parameters in your jobs.

How did you connect with DB2 in your last project?
Ans:
Most of the time the data was sent to us in the form of flat files; the data was dumped and sent to us. In some cases where we needed to connect to DB2, for instance for look-ups, we used the ODBC driver.

What are the often used Stages or stages you worked with in your last project?
Ans:
Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash, Aggregator, Sort.

How many jobs have you created in your last project?
Ans:
100+ jobs every 6 months if you are in development, or about 40 jobs every 6 months if you are in testing, although the number need not be the same for everybody.

Have you ever been involved in updating DS versions, like DS 5.X? If so, tell us some of the steps you took in doing so.
Ans:
Yes. The following are some of the steps I have taken in doing so:
1) Definitely take a backup of the whole project(s) by exporting the project as a .dsx file.
2) See that you are using the same parent

DataStage Interview Questions and Answers,Solution and Explanation - Part 4

What versions of DS you worked with?
Ans:
DS 7.0.2/6.0/5.2

If you worked with DS 6.0 and later versions, what are Link-Partitioner and Link-Collector used for?
Ans:
Link Partitioner - Used for partitioning the data. Link Collector - Used for collecting the partitioned data.
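
As a rough illustration of the partition-then-collect idea, the Python sketch below splits rows across a fixed number of partitions in round-robin order and then merges them back in the same order. It is not how the stages are implemented, and round robin is only one of the algorithms they can be configured with; the function names are hypothetical.

```python
def partition_round_robin(rows, n):
    """Deal rows out to n partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        partitions[i % n].append(row)
    return partitions


def collect_round_robin(partitions):
    """Read one row from each partition in turn until all are exhausted."""
    collected = []
    longest = max((len(p) for p in partitions), default=0)
    for i in range(longest):
        for p in partitions:
            if i < len(p):
                collected.append(p[i])
    return collected


parts = partition_round_robin(list(range(7)), 3)
merged = collect_round_robin(parts)
```

In a real job, each partition would be processed by its own branch of stages between the partitioner and the collector.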

How do you rename all of the jobs to support your new File-naming conventions?
Ans:
Create an Excel spreadsheet with new and old names. Export the whole project as a dsx. Write a Perl program, which can do a simple rename of the strings looking up the Excel file.

Explain the types of Parallel Processing?
Ans:
Parallel Processing is broadly classified into 2 types. a) SMP - Symmetrical Multi Processing. b) MPP - Massive Parallel Processing.

Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a Truncate statement to the DB, or does it do some kind of Delete logic?
Ans:
There is no TRUNCATE on ODBC stages. The 'Clear the table and Insert rows' option issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options.

When should we use ODS?
Ans:
DWHs are typically read-only and batch-updated on a schedule. ODSs are maintained in closer to real time and are trickle-fed constantly.

What is the default cache size? How do you change the cache size if needed?
Ans:
Default cache size is 256 MB. We can increase it by going into Datastage Administrator, selecting the Tunables tab, and specifying the cache size there.

What are the types of Parallel Processing?
Ans:
Parallel Processing is broadly classified into 2 types. a) SMP - Symmetrical Multi Processing. b) MPP - Massive Parallel Processing.

How to handle date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm?
Ans:
We use a) the "Iconv" function for internal conversion and b) the "Oconv" function for external conversion. The function to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Filedname,"D/M
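
The same two-step pattern (parse the external text into an internal date, then format the internal date back out) can be shown in Python as an analogue. This is not Datastage BASIC and does not complete the truncated expression above; the function name `mdy_to_ydm` is hypothetical.

```python
from datetime import datetime


def mdy_to_ydm(text):
    """Convert 'mm/dd/yyyy' text to 'yyyy-dd-mm' text."""
    # Step 1 (the Iconv role): parse external text into an internal date value.
    internal = datetime.strptime(text, "%m/%d/%Y").date()
    # Step 2 (the Oconv role): format the internal value in the target layout.
    return internal.strftime("%Y-%d-%m")


converted = mdy_to_ydm("12/31/2020")
```

Going through an internal date value, rather than rearranging substrings, means an invalid date such as 02/30/2020 is rejected at the parsing step.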

Differentiate Primary Key and Partition Key?
Ans:
A primary key is a combination of unique and not null. It can be a collection of key columns, called a composite primary key. A partition key is just a part of the primary key.

Is it possible to calculate a hash total for an EBCDIC file and have the hash total stored as EBCDIC using Datastage ?
Ans:
Currently, the total is converted to ASCII, even though the individual records are stored as EBCDIC.

How do you merge two files in DS?
Ans:
Either use the Copy command as a before-job subroutine if the metadata of the two files is the same, or create a job to concatenate the two files into one if the metadata is different.
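
For the same-metadata case, the effect of the copy-style concatenation is simply one file appended to the other. A minimal Python sketch of that effect, with a hypothetical function name and assuming both files use the same column layout and each ends with a newline:

```python
import shutil


def concatenate_files(first_path, second_path, out_path):
    """Write the bytes of first_path followed by second_path to out_path."""
    with open(out_path, "wb") as out:
        for path in (first_path, second_path):
            with open(path, "rb") as src:
                # Stream in chunks so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
```

If the first file does not end with a newline, its last row and the first row of the second file would run together, so that is worth checking before the job reads the merged file.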

How did you connect to DB2 in your last project?
Ans:
Using DB2 ODBC drivers.

What are Sequencers?
Ans:
Sequencers are job control programs that execute other jobs with preset Job parameters.

How do you execute Datastage job from command line prompt?
Ans:
Using "dsjob" command as follows. dsjob -run -jobstatus projectname jobname
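
When the command is driven from a script, it can help to build the argument list in one place. The Python sketch below wraps the `dsjob -run -jobstatus projectname jobname` form quoted above; the helper names are hypothetical, and the `-param name=value` option is included on the assumption that your dsjob version accepts it, so check the dsjob help on your installation.

```python
import subprocess


def build_dsjob_command(project, job, params=None):
    """Return the argv list for: dsjob -run -jobstatus [-param n=v ...] project job."""
    cmd = ["dsjob", "-run", "-jobstatus"]
    # Assumption: job parameters are passed as repeated "-param name=value" pairs.
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    cmd += [project, job]
    return cmd


def run_job(project, job, params=None):
    """Run the job and return dsjob's exit code (requires dsjob on the PATH)."""
    return subprocess.run(build_dsjob_command(project, job, params)).returncode
```

With `-jobstatus`, dsjob waits for the job to finish, so the exit code returned by `run_job` can be used by a calling scheduler to decide what to do next.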

How do you rename all of the jobs to support your new File-naming conventions?
Ans:
Create an Excel spreadsheet with new and old names. Export the whole project as a dsx. Write a Perl program, which can do a simple rename of the strings looking up the Excel file. Then import the new dsx file, probably into a new project for testing. Recompile all jobs. Be aware that the job names must also be changed in your job control jobs or Sequencer jobs, so you have to make the necessary changes to these Sequencers.
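
The rename step can be sketched as follows. The answer above suggests Perl; this sketch uses Python instead and assumes the spreadsheet has been saved as a two-column CSV (old name, new name). The function names are hypothetical, and nothing here depends on the internal layout of the dsx file: it is a plain text substitution that only matches whole names.

```python
import csv
import re


def load_renames(csv_path):
    """Read old-name,new-name pairs from a CSV file into a dict."""
    with open(csv_path, newline="") as f:
        return {row[0]: row[1] for row in csv.reader(f) if len(row) >= 2 and row[0]}


def rename_in_export(text, renames):
    """Replace every whole-word occurrence of an old name with its new name."""
    if not renames:
        return text
    # Longest names first, and no letter/digit/underscore on either side,
    # so "LoadCust" does not match inside "LoadCustomer".
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])(?:" + "|".join(re.escape(n) for n in names) + r")(?![A-Za-z0-9_])"
    )
    return pattern.sub(lambda m: renames[m.group(0)], text)
```

Because the substitution runs over the whole export text, references to a renamed job elsewhere in the same export are rewritten in the same pass, but the result should still be imported into a test project and recompiled as described above.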
