How to configure Apache Hadoop/Spark to use SWIFT

Hadoop has supported Swift since version 2.3.0, so you can access Swift object storage directly without creating an HDFS filesystem.

To configure Apache Hadoop or Apache Spark to use Swift, you need to define a new service in your core-site.xml file. The relevant section looks like this:

<configuration>

  <property>
    <name>fs.swift.impl</name>
    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.auth.url</name>
    <value>https://cloud.s3it.uzh.ch:5000/v2.0/tokens</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.auth.endpoint.prefix</name>
    <value>/AUTH_</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.http.port</name>
    <value>8080</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.region</name>
    <value>RegionOne</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.public</name>
    <value>false</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.tenant</name>
    <value>SCIENCECLOUD-PROJECT-NAME</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.username</name>
    <value>UZH-SHORTNAME</value>
  </property>

  <property>
    <name>fs.swift.service.ScienceCloud.password</name>
    <value>UZH-WEBPASS</value>
  </property>

</configuration>
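The Swift filesystem implementation ships in the separate hadoop-openstack module, which is typically not on the default classpath. A minimal sketch of making it available follows; the JAR locations and version are assumptions and depend on your Hadoop distribution, so adjust the paths to your installation:

```shell
# Add the Hadoop tools directory (which contains hadoop-openstack and its
# dependencies) to the Hadoop classpath. The path is an assumption based on
# a standard Hadoop layout -- verify it against your installation.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*"

# For Spark jobs, pass the JAR explicitly when submitting. The version in
# the filename is a placeholder; use the one that matches your Hadoop.
spark-submit \
  --jars "$HADOOP_HOME/share/hadoop/tools/lib/hadoop-openstack-2.7.7.jar" \
  my_job.py
```

If the module is missing, Hadoop fails with a "No FileSystem for scheme: swift" error, which is usually the first symptom of a classpath problem rather than a misconfigured core-site.xml.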

To access a container named MyData from within Hadoop/Spark, you can then use a URL of the form

swift://MyData.ScienceCloud/objectname

where ScienceCloud is the service name used in the configuration keys above.
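As a concrete sketch, assuming the configuration above and a container MyData holding an object data.csv (both names are hypothetical), the standard Hadoop CLI and Spark accept such URLs anywhere an HDFS path would go:

```shell
# List the objects in the container (requires a reachable Swift endpoint
# and valid credentials in core-site.xml).
hadoop fs -ls swift://MyData.ScienceCloud/

# Copy an object to the local filesystem; data.csv is a placeholder name.
hadoop fs -copyToLocal swift://MyData.ScienceCloud/data.csv ./data.csv

# In Spark, the same URL can be passed to any input method, e.g. from
# spark-shell:
#   sc.textFile("swift://MyData.ScienceCloud/data.csv").count()
```

Note that the container name and the service name are joined with a dot in the authority part of the URL, so the container name must not itself contain a dot.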