Big Data environments are characterized by a multiplicity of technologies, distributed data repositories, and parallel computation systems with different deployment models. Given all of that complexity, organizations want to maintain data privacy and ensure that the data is not exposed to unauthorized parties.
Organizations also need to provide a unified security mechanism that allows Single Sign-On, ensuring that any service connected to the data cluster goes through the authentication process before it is permitted to access the data.
Like other distributed systems, Big Data clusters share common security weaknesses: they must ensure that parties are who they claim to be, and they must verify client applications before those applications join the cluster and access the data that resides on federated systems.
This article describes the series of steps required to set up an IBM Big Data environment using Kerberos for host validation and authentication of client applications. The environment settings were based on the requirements of an IBM customer, as described in the next section of this article.
System Requirements:
Following is the list of system requirements for this tutorial:
- The system must manage a large number of documents and the metadata for those documents. The documents are classified into a variety of topics and categories.
- The system should handle many different document types (such as HTML, PDF, and spreadsheets) originating from many different systems.
- The system should provide a federated search that considers the documents as well as the relevant topics that are associated with them.
- The document categories are mapped to different authorization groups. Users belonging to those groups will have access to the corresponding documents.
- Metadata is added to a document throughout its life cycle.
The Proof of Concept (PoC) documented in this article demonstrates the ability to apply a single sign-on mechanism in a subset of the proposed environment, using a Kerberos ticket to authenticate hosts, users, and add-on services to the BigInsights Hadoop cluster.
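To make the single sign-on flow concrete before diving into the setup, here is a minimal sketch of how a Kerberos ticket replaces a password for cluster access. The realm EXAMPLE.COM and the principal analyst are placeholders for illustration, not values from this environment:

```
# Ask the KDC for a ticket-granting ticket (TGT); this is the only
# step that prompts for a password.
kinit analyst@EXAMPLE.COM

# Confirm the ticket landed in the local credential cache.
klist

# Kerberos-aware Hadoop clients pick up the cached ticket automatically,
# so the cluster prompts for no further credentials.
hadoop fs -ls /user/analyst

# Discard the ticket; subsequent requests are rejected until kinit is rerun.
kdestroy
```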
Contents of this document:
Background
Topology solution and hosts
Installation prerequisites:
Setting up users and groups in OpenLDAP:
Step 1: Setting up the Linux machines:
1. Host name setup:
Host name requirements:
Host resolution:
2. Passwordless SSH for root user
3. Install LDAP client (on each Linux node)
4. Install DB2 prerequisites (on each Linux node)
5. Install Kerberos V5 client libraries on each of the Linux machines (4 total)
6. Install various prerequisites
7. Disable IPV6 on all nodes
8. Disable firewall
9. Disable Selinux
10. Create disks for data store
11. Configure Sudo permissions for admin user:
12. Configure limits.conf on each BI node:
13. Configure /etc/ssh/sshd_config on each BI node
14. Configure pam_ldap module
15. Configure SSHD at /etc/pam.d/sshd
16. Configure System auth at /etc/pam.d/system-auth
17. Configure LDAP configuration at /etc/openldap/ldap.conf
18. Configure name service daemon at /etc/nslcd.conf
19. Configure name service switch at /etc/nsswitch.conf
20. Configure pam_ldap.conf at /etc/pam_ldap.conf
21. Copy certs from the OpenLDAP server to all of the BigInsights nodes
22. Start local name service daemon (nslcd)
Step 2: Setting up IBM JDK and JCE:
Download and install IBM JDK and JCE on Linux servers:
Step 3: OpenLDAP time synchronization
Step 4: Configuring Kerberos client on all BigInsights nodes
1. Configure /etc/krb5.conf on each of your Linux machines (4 total)
2. Add Kerberos service definitions to each /etc/services (all Linux machines)
Step 5: Creating and deploying host keytabs
1. Create the host keytabs
2. Configure sssd (security daemon) file on each node
3. Caching enablement
4. Deploy, initialize, and test the host keytabs
Step 6: Create the service keytabs
Step 7: Initialize the service keytabs
Step 8: Create the cluster hosts file for the BigInsights installer
Step 9: Run BigInsights installer prechecker
Step 10: BigInsights installation
Appendix 1: Complete users LDIF file
Appendix 2: Complete groups LDIF file
Appendix 3: Complete hosts LDIF file
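As a preview of Steps 4 through 6, the sketches below show the general shape of a Kerberos client configuration and of the host keytab workflow. All realm, host, and path names are placeholders for illustration; the actual values for this environment appear in the steps themselves.

```
# /etc/krb5.conf -- minimal client configuration (placeholder values)
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false
    ticket_lifetime = 24h
    forwardable = true

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```

Keytabs follow a create, export, copy, verify pattern: the principal is created on the KDC, its keys are exported to a keytab file, and the file is copied to the node that will authenticate with it.

```
# On the KDC: create a host principal with a random key and export it.
kadmin.local -q "addprinc -randkey host/node1.example.com@EXAMPLE.COM"
kadmin.local -q "ktadd -k /tmp/node1.host.keytab host/node1.example.com@EXAMPLE.COM"

# On the node, after copying the keytab over: list its entries, then
# authenticate with it (no password prompt -- the keytab holds the key).
klist -kt /tmp/node1.host.keytab
kinit -kt /tmp/node1.host.keytab host/node1.example.com@EXAMPLE.COM
```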
The following article complements this one; it explains how to set up Kerberos on Microsoft Active Directory:
IBM Kerberos Automation Toolkit for Hadoop
An automation toolkit is available for download to simplify setting up this environment. The latest version of the automation toolkit can be downloaded from this location:
This article was made possible by the joint work of Roman Zeltser and myself.