Introduction:

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but the differences from them are also significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost machines. It provides high-throughput access to data, making it well suited to applications with large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data.

Version:

Version of the currently subscribed HDFS service: 1.0

Subscription:

1 Follow the NeuSeer platform user manual to subscribe to the HDFS service. During subscription you can enter a name for the HDFS service.

2 After a successful subscription, the service details page shows the username, password, service gateway, user directory, HDFS access address, and other information.

Uploading data files:

The current version supports three ways to upload data files: the curl command line, the Postman API, and the Python client.

curl command line:

1 Use a curl command to create the upload location for the data file on HDFS. The curl command is composed as follows:

curl -i -k -u username:password -X PUT 'service-gateway + user-directory + file-name?op=CREATE'

Example:

Username: u_dfs_Ng061Fpu

Password: ****************************************

Service gateway: http://54.223.242.107:8443/gateway/default/webhdfs/v1

User directory: /user/u_dfs_Ng061Fpu

File to upload: test.json

curl -i -k -u u_dfs_Ng061Fpu:*************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=CREATE'

2 Upload the file with curl. The command is composed as follows:

curl -i -k -u username:password -T file-name 'Location address returned by the previous command'

Example:

curl -i -k -u u_dfs_Ng061Fpu:******************************* -T test.json 'http://54.223.242.107:8443/gateway/default/webhdfs/data/v1/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?_=AAAACAAAABAAAAFQrGlO1mf6enoIvOgeG9B-0tH4zj7X2qFSg72hDDMZDjDB24mSyJOKUkcj-6Ig-YmKjtkdlehmQJpfCEU6W9eWa7zeQnQTp_4vXJjbtpDwKMknOHyF8OUiAX3wLbTMRMAoZbKVeuVZAaDdIXPaSQUf2A-EYYzrnQuAkPzvPjTNEn5pTfgPEEdS3-AJkGbVqz9RLciTPftisZXsHjumPYU7Y4db0DXTcagisvONk864wjClpJqWxtyP9jzooBo4ZLjOb8BUFGmXHpz4GSoyfs7udZgBIHQcwpDg_xLmmYDxJZeokWWgVEh8QIFPQwHzFqbixn-LtqvAz_wlzFuh1yIbkNibLy8kRhbY-fWPI5VTRXEh6nki1LnntSyThSOM5mjydhRWJJm1-7a7Z4gOeaIeI8N8hYehYbdGqfawujqzQDFkV3fF_GbLYgUM4UQTGikl53gyWQJQywaV-Azk70U42ORGIzsw6ZH_'
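The two curl steps above can also be scripted. The following is a minimal Python sketch using the example gateway and user directory from this guide; the helper names are ours and not part of any platform SDK, and unlike curl's -k it does not disable TLS certificate verification:

```python
# Sketch of the two-step WebHDFS upload that the curl commands above perform.
import base64
import urllib.error
import urllib.request

GATEWAY = "http://54.223.242.107:8443/gateway/default/webhdfs/v1"
USER_DIR = "/user/u_dfs_Ng061Fpu"

def create_url(filename):
    # Step 1 target: an authenticated PUT here (no body) answers with an
    # HTTP 307 whose Location header is the temporary upload address.
    return f"{GATEWAY}{USER_DIR}/{filename}?op=CREATE"

def basic_auth(user, password):
    # Same credentials curl sends with -u, as an HTTP Basic header.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def upload(local_path, filename, user, password):
    """Step 1: read the redirect Location; step 2: PUT the file body to it."""
    class KeepRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # do not follow the 307; we need its Location header

    opener = urllib.request.build_opener(KeepRedirect)
    req = urllib.request.Request(create_url(filename), method="PUT",
                                 headers=basic_auth(user, password))
    try:
        opener.open(req)
        raise RuntimeError("expected an HTTP 307 redirect from op=CREATE")
    except urllib.error.HTTPError as err:  # the unfollowed 307 lands here
        location = err.headers["Location"]
    with open(local_path, "rb") as src:
        step2 = urllib.request.Request(location, data=src.read(), method="PUT",
                                       headers=basic_auth(user, password))
        urllib.request.urlopen(step2)
```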

3 To access and process the data with Spark or other tools, change the access permissions of the uploaded data file so that Spark can operate on it. The curl command is composed as follows:

curl -i -u username:password -X PUT 'service-gateway + user-directory + file-name?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'

To protect your data file, you can instead set:

curl -i -u username:password -X PUT 'service-gateway + user-directory + file-name?op=SETACL&aclspec=user::rwx,group::---,other::---,user:Spark-service-username:r-x'

Example:

curl -i -u u_dfs_Ng061Fpu:************************ -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'

To protect your data file, for example:

Username of the subscribed Spark service: u_spk_7IOwdftm

curl -i -u u_dfs_Ng061Fpu:************************* -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x'

You must also change the access permissions of the user directory (and of any subdirectories you created under it). The curl command is composed as follows:

curl -i -u username:password -X PUT 'service-gateway + user-directory?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'

To protect your data directory, you can instead set:

curl -i -u username:password -X PUT 'service-gateway + user-directory?op=SETACL&aclspec=user::rwx,group::---,other::---,user:Spark-service-username:r-x'

Example:

curl -i -u u_dfs_Ng061Fpu:*************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'

To protect your data directory, for example:

curl -i -u u_dfs_Ng061Fpu:**************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x'
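The aclspec strings used in the SETACL commands above can also be composed programmatically. A small sketch (the function names are ours, not part of any platform SDK):

```python
# Helpers for composing the op=SETACL URLs shown above.
def setacl_url(gateway, path, aclspec):
    # `path` is the user directory, optionally followed by /file-name.
    return f"{gateway}{path}?op=SETACL&aclspec={aclspec}"

def open_acl():
    # Readable and traversable by everyone: owner rwx, group/other r-x.
    return "user::rwx,group::r-x,other::r-x"

def protected_acl(spark_user):
    # Locked down for group and other; only the named Spark service user
    # is granted read + execute via a named-user ACL entry.
    return f"user::rwx,group::---,other::---,user:{spark_user}:r-x"
```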

Postman API:

Note: you need to install the Postman app and the Postman Interceptor extension in the Chrome browser, and enable Postman Interceptor. Also, in Postman's Settings, turn off "Automatically follow redirects".

1 Choose the PUT operation. The URL is composed as:

service-gateway + user-directory + file-name?op=CREATE

Example:

Username: u_dfs_Ng061Fpu

Password: **************************************

Service gateway: http://54.223.242.107:8443/gateway/default/webhdfs/v1

User directory: /user/u_dfs_Ng061Fpu

File to upload: test.json

http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/data/test.json?op=CREATE

Then open the Basic Auth tab, enter the username and password, and click "Update Request".

On the "Body" tab, select "binary" and click "Choose File" to pick the file to upload.

2 Click "Send" to complete the upload. To access and process the data with Spark or other tools, change the access permissions of the uploaded data file and the related directories so that Spark can operate on them. Choose the PUT operation; the URL is composed as:

service-gateway + user-directory + file-name?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x

To protect your data file, you can instead set:

service-gateway + user-directory + file-name?op=SETACL&aclspec=user::rwx,group::---,other::---,user:Spark-service-username:r-x

Example:

http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x

To protect your data file, for example:

http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x

3 Click "Send" to apply the permission change. You must also change the access permissions of the user directory (and of any subdirectories you created under it).

Python client:

1 Install and configure the Python client (the following installs webhdfs-client on Ubuntu 16 as an example).

Run the following command to complete the installation:

pip install git+https://github.com/luff/webhdfs-client.git

In the home directory of the corresponding user, create a file named .whdfsc.json with the following content:

{
  "insecure": false,
  "username": "your-webhdfs-user",
  "password": "your-webhdfs-user-pw",
  "rest_api": "https://your-webhdfs-gateway/webhdfs/v1"
}

In the file, replace your-webhdfs-user, your-webhdfs-user-pw, and https://your-webhdfs-gateway/webhdfs/v1 with the corresponding values from your service details.

2 Upload the file with the whdfsc command. The command format is:

whdfsc put local-path/file-name HDFS-user-directory/(subdirectory/)file-name
whdfsc put /tmp/test.json /user/u_dfs_jhzv0sxm/test.json

The following whdfsc commands set file and directory permissions:

File permissions:

whdfsc chmod -p mode HDFS-user-directory/(subdirectory/)file-name
whdfsc chmod -p 755 /user/u_dfs_jhzv0sxm/test.json

Directory permissions:

whdfsc chmod -p mode HDFS-user-directory/(subdirectory/)
whdfsc chmod -p 755 /user/u_dfs_jhzv0sxm
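For reference, the octal mode 755 used above expands to the symbolic permissions owner rwx, group r-x, other r-x, matching the open aclspec from the curl section. A small stdlib sketch (the function name is ours):

```python
# Expand an octal permission mode to its symbolic form.
import stat

def symbolic(octal_mode: str) -> str:
    # S_IFREG marks a regular file so filemode() prints a leading '-'.
    return stat.filemode(stat.S_IFREG | int(octal_mode, 8))
```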

Processing data with the Spark service:

1 Follow the NeuSeer platform user manual to subscribe to the Spark service and open the service management interface.

2 Enter the following code in the Zeppelin UI:

import scala.util.parsing.json.JSON
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import cn.neucloud.dasuan.analysis.timeseries.Sessionize
import cn.neucloud.dasuan.analysis.stat.BaseStatistic
import cn.neucloud.dasuan.SparkContextImpl
import org.apache.spark.sql.SQLContext
import cn.neucloud.dasuan.analysis.timeseries.sparkts._
import cn.neucloud.dasuan.utils.DFTools

// Create the Spark context and the SQL context
val sc = SparkContextImpl("test", true)
val sqlContext = new SQLContext(sc)
// Load the JSON data file from HDFS into a DataFrame
val ses = new Sessionize()
val dt = DFTools.fromJson("HDFS-access-address + user-directory + file-name", sc)
// Group the records into sessions at a 3000 ms interval
val dataFrame = ses.byInterval(sc, dt, 3000L, 2L, 20L)
dataFrame.printSchema()
dataFrame.show()

3 After it runs, the data has been grouped at a 3000 ms interval.
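The effect of the 3000 ms grouping can be illustrated in plain Python. This sketches gap-based sessionization only, which is our reading of the byInterval call above; it is not the platform's Sessionize implementation:

```python
# Start a new group whenever two consecutive records are more than
# gap_ms milliseconds apart.
from datetime import datetime

def sessionize(timestamps, gap_ms=3000):
    groups, current = [], []
    prev = None
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if prev is not None and (t - prev).total_seconds() * 1000 > gap_ms:
            groups.append(current)  # gap exceeded: close the current group
            current = []
        current.append(ts)
        prev = t
    if current:
        groups.append(current)
    return groups
```

For instance, the first four records of test.json below split into two groups, because the 4 s gap between 01:00:02 and 01:00:06 exceeds 3000 ms.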

The test.json file is as follows:

{"timestamp":"2016-08-01 01:00:01","tempreature":"20.12","id":"01"},
{"timestamp":"2016-08-01 01:00:02","tempreature":"21.23","id":"02"},
{"timestamp":"2016-08-01 01:00:06","tempreature":"24.45","id":"03"},
{"timestamp":"2016-08-01 01:00:07","tempreature":"27.05","id":"04"},
{"timestamp":"2016-08-01 01:00:09","tempreature":"27.10","id":"05"},
{"timestamp":"2016-08-01 01:00:11","tempreature":"23.18","id":"06"},
{"timestamp":"2016-08-01 01:00:12","tempreature":"29.10","id":"07"},
{"timestamp":"2016-08-01 01:00:34","tempreature":"30.34","id":"08"},
{"timestamp":"2016-08-01 01:00:44","tempreature":"20.10","id":"09"},
{"timestamp":"2016-08-01 01:00:45","tempreature":"25.90","id":"10"},
{"timestamp":"2016-08-01 01:00:47","tempreature":"26.40","id":"11"},
{"timestamp":"2016-08-01 01:00:50","tempreature":"28.39","id":"12"},
{"timestamp":"2016-08-01 01:01:01","tempreature":"23.33","id":"13"},
{"timestamp":"2016-08-01 01:01:02","tempreature":"20.23","id":"14"},
{"timestamp":"2016-08-01 01:01:04","tempreature":"20.93","id":"15"},
{"timestamp":"2016-08-01 01:01:06","tempreature":"22.1","id":"16"},
{"timestamp":"2016-08-01 01:01:08","tempreature":"25.90","id":"17"},
{"timestamp":"2016-08-01 01:01:10","tempreature":"28.30","id":"18"},
{"timestamp":"2016-08-01 01:01:12","tempreature":"22.30","id":"19"},
{"timestamp":"2016-08-01 01:01:14","tempreature":"29.20","id":"20"},
{"timestamp":"2016-08-01 01:01:18","tempreature":"19.40","id":"21"},
{"timestamp":"2016-08-01 01:01:19","tempreature":"23.10","id":"22"},
{"timestamp":"2016-08-01 01:01:20","tempreature":"26.40","id":"23"},
{"timestamp":"2016-08-01 01:01:21","tempreature":"23.30","id":"24"},
{"timestamp":"2016-08-01 01:01:25","tempreature":"20.33","id":"25"},
{"timestamp":"2016-08-01 01:01:29","tempreature":"33.00","id":"26"},
{"timestamp":"2016-08-01 01:02:01","tempreature":"30.1","id":"27"},
{"timestamp":"2016-08-01 01:02:05","tempreature":"27.90","id":"28"},
{"timestamp":"2016-08-01 01:02:11","tempreature":"19.99","id":"29"},
{"timestamp":"2016-08-01 01:02:18","tempreature":"25.40","id":"30"},
{"timestamp":"2016-08-01 01:02:25","tempreature":"28.09","id":"31"},
{"timestamp":"2016-08-01 01:02:30","tempreature":"23.10","id":"32"},
{"timestamp":"2016-08-01 01:02:33","tempreature":"19.04","id":"33"},
{"timestamp":"2016-08-01 01:02:40","tempreature":"30.03","id":"34"},
{"timestamp":"2016-08-01 01:02:41","tempreature":"29.10","id":"35"},
{"timestamp":"2016-08-01 01:02:43","tempreature":"33.01","id":"36"},
{"timestamp":"2016-08-01 01:02:45","tempreature":"31.90","id":"37"},
{"timestamp":"2016-08-01 01:02:50","tempreature":"32.02","id":"38"},
{"timestamp":"2016-08-01 01:03:02","tempreature":"34.10","id":"39"},
{"timestamp":"2016-08-01 01:03:05","tempreature":"34.20","id":"40"},
{"timestamp":"2016-08-01 01:03:07","tempreature":"34.88","id":"41"},
{"timestamp":"2016-08-01 01:03:09","tempreature":"38.90","id":"42"},
{"timestamp":"2016-08-01 01:03:11","tempreature":"39.90","id":"43"},
{"timestamp":"2016-08-01 01:03:12","tempreature":"38.22","id":"44"},
{"timestamp":"2016-08-01 01:03:15","tempreature":"33.02","id":"45"},
{"timestamp":"2016-08-01 01:03:18","tempreature":"34.08","id":"46"},
{"timestamp":"2016-08-01 01:03:21","tempreature":"34.05","id":"47"},
{"timestamp":"2016-08-01 01:03:29","tempreature":"30.02","id":"48"},
{"timestamp":"2016-08-01 01:03:33","tempreature":"28.04","id":"49"},
{"timestamp":"2016-08-01 01:03:35","tempreature":"25.06","id":"50"},
{"timestamp":"2016-08-01 01:03:38","tempreature":"29.10","id":"51"},
{"timestamp":"2016-08-01 01:04:01","tempreature":"30.01","id":"52"},
{"timestamp":"2016-08-01 01:04:05","tempreature":"33.10","id":"53"},
{"timestamp":"2016-08-01 01:04:12","tempreature":"34.20","id":"54"}
