Big Data environments are characterized by a multiplicity of technologies, distributed data repositories, and parallel computation systems with different deployment models. Given all of that complexity, organizations want to maintain data privacy and ensure that the data is not exposed to unauthorized parties.
Organizations also need to provide a unified security mechanism that allows Single Sign-On, ensuring that any service connected to the data cluster goes through the authentication process before it is permitted to access the data.
Like other distributed systems, Big Data clusters share common security weaknesses: they must ensure that parties are who they claim to be, and they must verify client applications before those applications join the cluster and access the data that resides on federated systems.
This article describes the series of steps required to set up an IBM Big Data environment using Kerberos for host validation and authentication of client applications. The environment settings were based on the requirements of an IBM customer, as described in the next section of this article.
System Requirements:
Following is the list of system requirements for this tutorial:
- The system must manage a large number of documents and the metadata for those documents. The documents are classified into a variety of topics and categories.
- The system should handle many different document types (such as HTML, PDF, and spreadsheets) originating from many different systems.
- The system should provide a federated search that considers the documents as well as the relevant topics that are associated with them.
- The document categories are mapped to different authorization groups. Users belonging to those groups will have access to the corresponding documents.
- Metadata is added to a document throughout its life cycle.
The Proof of Concept (PoC) documented in this article demonstrates the ability to apply a single sign-on mechanism in a subset of the proposed environment, using a Kerberos ticket to authenticate hosts, users, and add-on services to the BigInsights Hadoop cluster.
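To make the single sign-on flow concrete before diving into the setup, here is a minimal sketch of how a Kerberos ticket replaces a password for cluster access. The realm EXAMPLE.COM and the principal analyst are placeholders for illustration, not values from this environment:

```
# Ask the KDC for a ticket-granting ticket (TGT); this is the only
# step that prompts for a password.
kinit analyst@EXAMPLE.COM

# Confirm the ticket landed in the local credential cache.
klist

# Kerberos-aware Hadoop clients pick up the cached ticket automatically,
# so the cluster prompts for no further credentials.
hadoop fs -ls /user/analyst

# Discard the ticket; subsequent requests are rejected until kinit is rerun.
kdestroy
```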
Contents of this document:
Background
Topology solution and hosts
Installation prerequisites:
Setting up users and groups in OpenLDAP:
Step 1: Setting up the Linux machines:
1. Host name setup:
Host name requirements:
Host resolution:
2. Passwordless SSH for root user
3. Install LDAP client (on each Linux node)
4. Install DB2 prerequisites (on each Linux node)
5. Install Kerberos V5 client libraries on each of the Linux machines (4 total)
6. Install various prerequisites
7. Disable IPV6 on all nodes
8. Disable firewall
9. Disable Selinux
10. Create disks for data store
11. Configure Sudo permissions for admin user:
12. Configure limits.conf on each BI node:
13. Configure /etc/ssh/sshd_config on each BI node
14. Configure pam_ldap module
15. Configure SSHD at /etc/pam.d/sshd
16. Configure System auth at /etc/pam.d/system-auth
17. Configure LDAP configuration at /etc/openldap/ldap.conf
18. Configure name service daemon at /etc/nslcd.conf
19. Configure name service switch at /etc/nsswitch.conf
20. Configure pam_ldap.conf at /etc/pam_ldap.conf
21. Copy certs from the OpenLDAP server to all of the BigInsights nodes
22. Start local name service daemon (nslcd)
Step 2: Setting up IBM JDK and JCE:
Download and install IBM JDK and JCE on Linux servers:
Step 3: OpenLDAP time synchronization
Step 4: Configuring Kerberos client on all BigInsights nodes
1. Configure /etc/krb5.conf on each of your Linux machines (4 total)
2. Add Kerberos service definitions to each /etc/services (all Linux machines)
Step 5: Creating and deploying host keytabs
1. Create the host keytabs
2. Configure sssd (security daemon) file on each node
3. Caching enablement
4. Deploy, initialize, and test the host keytabs
Step 6: Create the service keytabs
Step 7: Initialize the service keytabs
Step 8: Create the cluster hosts file for the BigInsights installer
Step 9: Run BigInsights installer prechecker
Step 10: BigInsights installation
Appendix 1: Complete users LDIF file
Appendix 2: Complete groups LDIF file
Appendix 3: Complete hosts LDIF file
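As a preview of Steps 4 through 6, the sketches below show the general shape of a Kerberos client configuration and of the host keytab workflow. All realm, host, and path names are placeholders for illustration; the actual values for this environment appear in the steps themselves.

```
# /etc/krb5.conf -- minimal client configuration (placeholder values)
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    forwardable = true

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```

Keytabs follow a create, export, copy, verify pattern: the principal is created on the KDC, its keys are exported to a keytab file, and the file is copied to the node that will authenticate with it.

```
# On the KDC: create a host principal with a random key and export it.
kadmin.local -q "addprinc -randkey host/node1.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /tmp/node1.host.keytab host/node1.example.com@EXAMPLE.COM"

# On the node, after copying the keytab over: list its entries, then
# authenticate with it (no password prompt -- the keytab holds the key).
klist -kt /tmp/node1.host.keytab
kinit -kt /tmp/node1.host.keytab host/node1.example.com@EXAMPLE.COM
```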
The following article complements this one; it explains how to set up Kerberos on Microsoft Active Directory:
IBM Kerberos Automation Toolkit for Hadoop
An automation toolkit is available for download to simplify setting up this environment. The latest version of the automation toolkit can be downloaded from this location:
This article was made possible by the joint work of Roman Zeltser and myself.