IIBM Institute of Business Management
Examination Paper MM.100
Big Data
Section A: Objective Type & Short
Questions (30 Marks)
This section consists of multiple-choice
and short-note questions.
Answer all the questions.
Part One carries 1 mark each and Part
Two carries 5 marks each.
Part One:
Multiple choices:
1. What does commodity hardware mean in
the Hadoop world?
a. Very cheap hardware
b. Industry standard hardware
c. Discarded hardware
d. Low specifications Industry grade
hardware
2. Which of the following are NOT big
data problem(s)?
a. Parsing 5 MB XML file every 5 minutes
b. Processing IPL tweet sentiments
c. Processing online bank transactions
d. both (a) and (c)
3. What does “Velocity” in Big Data mean?
a. Speed of input data generation
b. Speed of individual machine processors
c. Speed of ONLY storing data
d. Speed of storing and processing data
4. The term Big Data first originated
from:
a. Stock Markets Domain
b. Banking and Finance Domain
c. Genomics and Astronomy Domain
d. Social Media Domain
5. Which of the following Batch
Processing instance is NOT an example of
Big Data Batch
Processing?
a. Processing 10 GB sales data every 6
hours
b. Processing flights sensor data
c. Web crawling app
d. Trending topic analysis of tweets for
the last 15 minutes
6. Which of the following are example(s)
of Real Time Big Data Processing?
a. Complex Event Processing (CEP)
platforms
b. Stock market data analysis
c. Bank fraud transactions detection
d. both (a) and (c)
7. Sliding window operations typically
fall in the category
of __________________.
a. OLTP Transactions
b. Big Data Batch Processing
c. Big Data Real Time Processing
d. Small Batch Processing
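The sliding-window operation named in question 7 can be made concrete with a short sketch. This is a hypothetical, illustrative example (not from the paper): it keeps a running count over the most recent N events, the kind of continuous computation that real-time stream processors perform.

```python
from collections import deque

# Minimal sketch, assuming a hypothetical event stream: a count over a
# sliding window of the most recent N events.
def sliding_window_counts(events, window_size):
    """For each incoming event, yield the event counts inside the
    current window of the most recent `window_size` events."""
    window = deque(maxlen=window_size)  # old events fall out automatically
    for event in events:
        window.append(event)
        counts = {}
        for e in window:
            counts[e] = counts.get(e, 0) + 1
        yield counts

stream = ["buy", "sell", "buy", "buy", "sell"]
results = list(sliding_window_counts(stream, window_size=3))
# After the 4th event the window holds ["sell", "buy", "buy"],
# so results[3] == {"sell": 1, "buy": 2}.
```

Because the window slides with every event, each answer reflects only recent data, which is why such operations fall under real-time processing rather than batch processing.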
8. What is HBase used as?
a. Tool for Random and Fast Read/Write
operations in Hadoop
b. Faster Read only query engine in
Hadoop
c. Map Reduce alternative in Hadoop
d. Fast Map Reduce layer in Hadoop
9. What is Hive used as?
a. Hadoop query engine
b. Map Reduce wrapper
c. Hadoop SQL interface
d. All of the above
10. Which of the following are NOT true
for Hadoop?
a. It’s a tool for Big Data analysis
b. It supports structured and
unstructured data analysis
c. It aims for vertical scaling out/in
scenarios
d. Both (a) and (c)
Part Two:
1. Define Unstructured Data Analytics.
Elaborate on Context-Sensitive and
Domain-Specific
Searches.
2. Define HDFS. Explain HDFS in detail.
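The core HDFS idea asked about in question 2 can be sketched in a few lines. This is an illustrative simplification, not the real HDFS implementation: HDFS splits a file into fixed-size blocks and stores each block on several DataNodes (the default replication factor is 3); real HDFS placement is rack-aware, while the round-robin rule below is made up for illustration.

```python
# Illustrative sketch only, not the real HDFS implementation.
def place_blocks(file_size_mb, block_size_mb, datanodes, replication=3):
    """Return a mapping of block index -> DataNodes holding a replica."""
    num_blocks = -(-file_size_mb // block_size_mb)  # ceiling division
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

# A 300 MB file with a 128 MB block size is split into 3 blocks
# (128 MB + 128 MB + 44 MB), each replicated on 3 of the 4 DataNodes.
plan = place_blocks(300, 128, ["dn1", "dn2", "dn3", "dn4"])
```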
3. What is the complexity theory for
MapReduce? What are reducer size and
replication rate?
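The two terms in question 3 can be made concrete with a minimal word-count sketch (hypothetical inputs, illustrative only): the replication rate is the number of key-value pairs the map phase emits per input, and the reducer size is the largest number of values any single key (reducer) receives.

```python
# Minimal word-count MapReduce sketch with made-up documents.
def map_phase(documents):
    pairs = []
    for doc in documents:
        for word in doc.split():
            pairs.append((word, 1))  # one pair per word occurrence
    return pairs

def group_by_key(pairs):
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return groups

docs = ["big data big ideas", "big plans"]
pairs = map_phase(docs)                              # 6 pairs from 2 inputs
groups = group_by_key(pairs)
counts = {key: sum(vals) for key, vals in groups.items()}
replication_rate = len(pairs) / len(docs)            # 6 / 2 = 3.0
reducer_size = max(len(vals) for vals in groups.values())  # "big" gets 3
```

A lower replication rate means less communication between the map and reduce phases, while a smaller reducer size allows more parallelism; complexity theory for MapReduce studies the trade-off between the two.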
4. Describe at least five Big Data
analytics applications in detail.
END OF SECTION A
Section B: Caselets (40 marks)
This section consists of Caselets.
Answer all the questions.
Each caselet carries 20 marks.
Detailed information should form part of
your answer (word limit 150 to
200 words).
Caselet 1
Cloudera
One major global financial services
conglomerate uses Cloudera and Datameer
to help identify rogue
trading activity. Teams within the firm’s
asset management group are performing ad
hoc analysis on daily
feeds of price, position, and order
information. Having ad hoc access to
all of the detailed data allows
the group to detect anomalies across
certain asset classes and identify
suspicious behavior. Users
previously relied solely on desktop
spreadsheet tools. Now, with Datameer and
Cloudera, users have a
powerful platform that allows them to
sift through more data more quickly and
avert potential losses
before they begin.
A leading retail bank is using Cloudera
and Datameer to validate data accuracy
and quality as required by
the Dodd-Frank Act and other regulations.
Integrating loan and branch data as well
as wealth
management data, the bank’s data quality
initiative is responsible for ensuring
that every record is
accurate. The process includes subjecting
the data to over 50 data sanity and
quality checks. The results of
those checks are trended over time to
ensure that the tolerances for data
corruption and data domains
aren’t changing adversely and that the
risk profiles being reported to investors
and regulatory agencies are
prudent and in compliance with regulatory
requirements. The results are reported
through a data quality
dashboard to the Chief Risk Officer and
Chief Financial Officer, who are
ultimately responsible for
ensuring the accuracy of regulatory
compliance reporting as well as earnings
forecasts to investors.
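The bank's 50-plus sanity and quality checks, trended over time, can be sketched as follows. This is a hedged sketch with made-up records and rules, not the bank's actual pipeline: it runs simple checks over loan records and computes a pass rate that a dashboard could trend over time.

```python
# Hypothetical data-quality checks over made-up loan records.
def run_quality_checks(records, checks):
    """Return the fraction of records that pass every check."""
    passed = sum(1 for rec in records if all(check(rec) for check in checks))
    return passed / len(records)

checks = [
    lambda rec: rec.get("amount", 0) > 0,   # loan amount must be positive
    lambda rec: bool(rec.get("branch")),    # branch code must be present
]
records = [
    {"amount": 2500, "branch": "NY-01"},
    {"amount": -40, "branch": "NY-02"},     # fails the amount check
    {"amount": 900, "branch": ""},          # fails the branch check
    {"amount": 120, "branch": "SF-03"},
]
pass_rate = run_quality_checks(records, checks)  # 2 of 4 pass -> 0.5
```

Tracking this pass rate per feed and per day is what allows tolerances for data corruption to be monitored before they drift adversely.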
Questions:
1. What kind of data did these companies
use? What was the size of the data? What
tools and technologies did they use to
process the data?
2. What problem were they facing, and how
did the insight they gained from the data
help them resolve the issue?
Caselet 2
Adopting a new technology is never a
trivial task. Introducing a brand new
tool into a data scientist’s
toolset is no different. The resistance
to change is especially high in companies
that employ tens or
hundreds of statisticians.
Understandably, analysts have learned to
love their tool and live with any
shortcomings. The effort required to
learn a more efficient tool often seems
too great even if such a
transition would lead to long-term time
savings. This is where Pivotal Data Labs
(PDL) comes into the
picture, using a highly skilled team of
data scientists and engineers to prove
results to our customers, such as:
Shorter time to insight and to market
Better utilization of all captured data
(both structured and unstructured)
Improved model quality and better
decision-making
Minimized data movement and need to
create multiple copies
What follows describes an example
journey to technology adoption, executed
through a series of data science
engagements solving real problems for our
customer, a major healthcare provider.
This customer has a
large division of research, and as a
trailblazer in preventive healthcare,
employs many accomplished
clinicians and biostatisticians who are
limited by the analytics tools that they
use. The journey they took
shows how analytics can be done faster
and better through a series of 5 projects
(Figure 1). Each project
answered different questions, proving the
need and utility of new tools in
advancing their data science
practices, improving their business, and
ultimately leading to the decision to
adopt new technology.
Questions:
1. What kind of patterns did they
identify from the data, and what kind of
patterns were they looking for in the
data?
2. How did they select the
tool/technology to suit their needs?
END OF SECTION B
Section C: Applied Theory (30 marks)
This section consists of Long
Questions.
Answer all the questions.
Each question carries 15 marks.
Detailed information should form part of
your answer (word limit 200 to
250 words).
1. Explain HBase, its data model, and
its implementations. Describe the
Cassandra data model with an example.
Explain in detail Hive data
manipulation, queries, data definition,
and data types.
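The HBase data model asked about in question 1 can be sketched with plain Python dicts. This is an illustrative analogy, not the real HBase client API: HBase stores a sparse, versioned map of row key -> column family -> column qualifier -> {timestamp: value}.

```python
# Illustrative sketch of HBase's logical data model using nested dicts.
def put(table, row, family, qualifier, value, timestamp):
    cell = (table.setdefault(row, {})
                 .setdefault(family, {})
                 .setdefault(qualifier, {}))
    cell[timestamp] = value  # multiple timestamped versions per cell

def get_latest(table, row, family, qualifier):
    versions = table[row][family][qualifier]
    return versions[max(versions)]  # the newest timestamp wins

table = {}
put(table, "user#42", "info", "name", "Ada", timestamp=1)
put(table, "user#42", "info", "name", "Ada L.", timestamp=2)
put(table, "user#42", "metrics", "logins", 7, timestamp=2)
latest = get_latest(table, "user#42", "info", "name")  # "Ada L."
```

The nesting mirrors why HBase suits random, fast reads and writes: a lookup walks directly from row key to column family to the newest cell version, and absent cells simply take no space.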
2. Explain crowdsourcing analytics, and
inter- and trans-firewall analytics.
END OF SECTION C