Introduction:
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but the differences from them are also significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost machines. It provides high-throughput access to data and is well suited to applications with large data sets. HDFS relaxes some POSIX constraints to enable streaming access to file system data.
Version:
Version of the currently subscribed HDFS service: 1.0
Subscription:
1 Refer to the NeuSeer platform user manual to subscribe to the HDFS service. During subscription you can enter a name for the HDFS service.
2 After a successful subscription, the service details page shows the username, password, service gateway, user directory, HDFS access address, and other information.
Uploading data files:
The current version supports three ways to upload data files: the curl command line, the Postman API, and the Python client described below.
curl command line method:
1 Use curl to create the location for the data file on HDFS. The command is composed as follows:
curl -i -k -u username:password -X PUT 'service gateway + user directory + filename + ?op=CREATE'
Command example:
Username: u_dfs_Ng061Fpu
Password: ****************************************
Service gateway: http://54.223.242.107:8443/gateway/default/webhdfs/v1
User directory: /user/u_dfs_Ng061Fpu
Upload filename: test.json
curl -i -k -u u_dfs_Ng061Fpu:*************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=CREATE'
2 Upload the file with curl. The op=CREATE call in the previous step returns an HTTP 307 redirect whose Location header points at the actual data endpoint; use that address here:
curl -i -k -u username:password -T filename 'Location address returned by the previous command'
Command example:
curl -i -k -u u_dfs_Ng061Fpu:******************************* -T test.json 'http://54.223.242.107:8443/gateway/default/webhdfs/data/v1/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?_=AAAACAAAABAAAAFQrGlO1mf6enoIvOgeG9B-0tH4zj7X2qFSg72hDDMZDjDB24mSyJOKUkcj-6Ig-YmKjtkdlehmQJpfCEU6W9eWa7zeQnQTp_4vXJjbtpDwKMknOHyF8OUiAX3wLbTMRMAoZbKVeuVZAaDdIXPaSQUf2A-EYYzrnQuAkPzvPjTNEn5pTfgPEEdS3-AJkGbVqz9RLciTPftisZXsHjumPYU7Y4db0DXTcagisvONk864wjClpJqWxtyP9jzooBo4ZLjOb8BUFGmXHpz4GSoyfs7udZgBIHQcwpDg_xLmmYDxJZeokWWgVEh8QIFPQwHzFqbixn-LtqvAz_wlzFuh1yIbkNibLy8kRhbY-fWPI5VTRXEh6nki1LnntSyThSOM5mjydhRWJJm1-7a7Z4gOeaIeI8N8hYehYbdGqfawujqzQDFkV3fF_GbLYgUM4UQTGikl53gyWQJQywaV-Azk70U42ORGIzsw6ZH_'
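The same two-step flow can also be scripted. Below is a minimal sketch using Python's requests library (an assumption; it is not part of the platform's official tooling), reusing the sample gateway, user directory, and account from above; "your-password" is a placeholder:
import requests

GATEWAY = "http://54.223.242.107:8443/gateway/default/webhdfs/v1"  # service gateway
USER_DIR = "/user/u_dfs_Ng061Fpu"                                  # user directory
AUTH = ("u_dfs_Ng061Fpu", "your-password")                         # replace with your credentials

def upload(local_path, remote_name):
    # Step 1: op=CREATE without following the redirect, so the Location
    # header pointing at the data endpoint can be read (curl step 1 above).
    create_url = GATEWAY + USER_DIR + "/" + remote_name + "?op=CREATE"
    resp = requests.put(create_url, auth=AUTH, allow_redirects=False,
                        verify=False)  # verify=False mirrors curl's -k flag
    location = resp.headers["Location"]
    # Step 2: PUT the file body to the returned Location address (curl step 2 above).
    with open(local_path, "rb") as f:
        requests.put(location, data=f, auth=AUTH, verify=False).raise_for_status()

upload("test.json", "test.json")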
3 If the data needs to be accessed and processed by Spark or other tools, change the access permissions of the uploaded data file so that Spark can operate on it. The command is composed as follows:
curl -i -u username:password -X PUT 'service gateway + user directory + filename?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'
To protect your data file, you can instead set:
curl -i -u username:password -X PUT 'service gateway + user directory + filename?op=SETACL&aclspec=user::rwx,group::---,other::---,user:username of the subscribed Spark service:r-x'
Command example:
curl -i -u u_dfs_Ng061Fpu:************************ -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'
To protect your data file, the command example is:
Username of the subscribed Spark service: u_spk_7IOwdftm
curl -i -u u_dfs_Ng061Fpu:************************* -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x'
Also change the access permissions of the user directory (and of any subdirectories you have created under it). The command is composed as follows:
curl -i -u username:password -X PUT 'service gateway + user directory?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'
To protect your data file directory, you can instead set:
curl -i -u username:password -X PUT 'service gateway + user directory?op=SETACL&aclspec=user::rwx,group::---,other::---,user:username of the subscribed Spark service:r-x'
Command example:
curl -i -u u_dfs_Ng061Fpu:*************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x'
To protect your data file directory, the command example is:
curl -i -u u_dfs_Ng061Fpu:**************************** -X PUT 'http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x'
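These SETACL calls can likewise be scripted. A minimal sketch with Python's requests library (an assumption, as above), applying both the open and the restricted variants to the sample file and directory:
import requests

GATEWAY = "http://54.223.242.107:8443/gateway/default/webhdfs/v1"
AUTH = ("u_dfs_Ng061Fpu", "your-password")  # replace with your credentials

def set_acl(path, aclspec):
    # WebHDFS SETACL: PUT <gateway><path>?op=SETACL&aclspec=<spec>
    url = GATEWAY + path + "?op=SETACL&aclspec=" + aclspec
    requests.put(url, auth=AUTH, verify=False).raise_for_status()

# Open variant: group and others may read, as in the first examples above.
set_acl("/user/u_dfs_Ng061Fpu/test.json", "user::rwx,group::r-x,other::r-x")
set_acl("/user/u_dfs_Ng061Fpu", "user::rwx,group::r-x,other::r-x")

# Restricted variant: only the subscribed Spark service user may read.
restricted = "user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x"
set_acl("/user/u_dfs_Ng061Fpu/test.json", restricted)
set_acl("/user/u_dfs_Ng061Fpu", restricted)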
Postman API method:
Note: install the Postman app and the Postman Interceptor extension in the Chrome browser, enable Postman Interceptor, and turn off Automatically follow redirects in Postman's Settings.
1 Choose the PUT operation. The URL is composed as:
service gateway + user directory + filename + ?op=CREATE
Example:
Username: u_dfs_Ng061Fpu
Password: **************************************
Service gateway: http://54.223.242.107:8443/gateway/default/webhdfs/v1
User directory: /user/u_dfs_Ng061Fpu
Upload filename: test.json
http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=CREATE
Then open the Basic Auth tab, enter the username and password, and click the Update Request button.
On the Body tab select binary, then click Choose File and select the file to upload.
2 Click Send to upload the file. If the data needs to be accessed and processed by Spark or other tools, change the access permissions of the uploaded data file and the related directories so that Spark can operate on the file. Choose the PUT operation. The URL is composed as:
service gateway + user directory + filename?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x
To protect your data file, you can instead set:
service gateway + user directory + filename?op=SETACL&aclspec=user::rwx,group::---,other::---,user:username of the subscribed Spark service:r-x
Example:
http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::r-x,other::r-x
To protect your data file, the example is:
http://54.223.242.107:8443/gateway/default/webhdfs/v1/user/u_dfs_Ng061Fpu/test.json?op=SETACL&aclspec=user::rwx,group::---,other::---,user:u_spk_7IOwdftm:r-x
3 Click Send to apply the permission change. Also change the access permissions of the user directory (and of any subdirectories you have created under it).
Python client method:
1 Install and configure the Python client (the following installs webhdfs-client on Ubuntu 16 as an example).
Run the following command to complete the installation:
pip install git+https://github.com/luff/webhdfs-client.git
Create a file named .whdfsc.json in the user's home directory with the following content:
{
"insecure": false,
"username": "your-webhdfs-user",
"password": "your-webhdfs-user-pw",
"rest_api": "https://your-webhdfs-gateway/webhdfs/v1"
}
Replace your-webhdfs-user, your-webhdfs-user-pw, and https://your-webhdfs-gateway/webhdfs/v1 in the file with the corresponding values from your service details, for example as shown below.
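A filled-in example using the sample account from this guide (illustrative; "your-password" is a placeholder for your own password, and the insecure flag may need adjusting depending on your gateway's TLS setup):
{
"insecure": false,
"username": "u_dfs_Ng061Fpu",
"password": "your-password",
"rest_api": "http://54.223.242.107:8443/gateway/default/webhdfs/v1"
}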
2 Upload the file with the whdfsc command. The command format is:
whdfsc put local-directory/filename HDFS-user-directory/(subdirectory/)filename
Example:
whdfsc put /tmp/test.json /user/u_dfs_jhzv0sxm/test.json
Set directory and file permissions with the following whdfsc commands (755 corresponds to rwxr-xr-x, matching the user::rwx,group::r-x,other::r-x aclspec used above):
File permissions:
whdfsc chmod -p permissions HDFS-user-directory/(subdirectory/)filename
whdfsc chmod -p 755 /user/u_dfs_jhzv0sxm/test.json
Directory permissions:
whdfsc chmod -p permissions HDFS-user-directory/(subdirectory/)
whdfsc chmod -p 755 /user/u_dfs_jhzv0sxm
Processing data with the Spark service:
1 Refer to the NeuSeer platform user manual to subscribe to the Spark service and open the service management interface.
2 Enter the following code in the Zeppelin interface:
import scala.util.parsing.json.JSON
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.sql.SQLContext
import cn.neucloud.dasuan.SparkContextImpl
import cn.neucloud.dasuan.analysis.stat.BaseStatistic
import cn.neucloud.dasuan.analysis.timeseries.Sessionize
import cn.neucloud.dasuan.analysis.timeseries.sparkts._
import cn.neucloud.dasuan.utils.DFTools

// Create the Spark context and the SQL context
val sc = SparkContextImpl("test", true)
val sqlContext = new SQLContext(sc)

// Load the uploaded JSON file from HDFS into a DataFrame
// (replace the string with: HDFS access address + user directory + filename)
val dt = DFTools.fromJson("HDFS access address + user directory + filename", sc)

// Sessionize: group the records at 3000-millisecond intervals (see step 3 below)
val ses = new Sessionize()
val dataFrame = ses.byInterval(sc, dt, 3000L, 2L, 20L)
dataFrame.printSchema()
dataFrame.show()
3 Run the code to complete the grouping of the data at 3000-millisecond intervals.
The test.json file is as follows:
{"timestamp":"2016-08-01 01:00:01","tempreature":"20.12","id":"01"},
{"timestamp":"2016-08-01 01:00:02","tempreature":"21.23","id":"02"},
{"timestamp":"2016-08-01 01:00:06","tempreature":"24.45","id":"03"},
{"timestamp":"2016-08-01 01:00:07","tempreature":"27.05","id":"04"},
{"timestamp":"2016-08-01 01:00:09","tempreature":"27.10","id":"05"},
{"timestamp":"2016-08-01 01:00:11","tempreature":"23.18","id":"06"},
{"timestamp":"2016-08-01 01:00:12","tempreature":"29.10","id":"07"},
{"timestamp":"2016-08-01 01:00:34","tempreature":"30.34","id":"08"},
{"timestamp":"2016-08-01 01:00:44","tempreature":"20.10","id":"09"},
{"timestamp":"2016-08-01 01:00:45","tempreature":"25.90","id":"10"},
{"timestamp":"2016-08-01 01:00:47","tempreature":"26.40","id":"11"},
{"timestamp":"2016-08-01 01:00:50","tempreature":"28.39","id":"12"},
{"timestamp":"2016-08-01 01:01:01","tempreature":"23.33","id":"13"},
{"timestamp":"2016-08-01 01:01:02","tempreature":"20.23","id":"14"},
{"timestamp":"2016-08-01 01:01:04","tempreature":"20.93","id":"15"},
{"timestamp":"2016-08-01 01:01:06","tempreature":"22.1","id":"16"},
{"timestamp":"2016-08-01 01:01:08","tempreature":"25.90","id":"17"},
{"timestamp":"2016-08-01 01:01:10","tempreature":"28.30","id":"18"},
{"timestamp":"2016-08-01 01:01:12","tempreature":"22.30","id":"19"},
{"timestamp":"2016-08-01 01:01:14","tempreature":"29.20","id":"20"},
{"timestamp":"2016-08-01 01:01:18","tempreature":"19.40","id":"21"},
{"timestamp":"2016-08-01 01:01:19","tempreature":"23.10","id":"22"},
{"timestamp":"2016-08-01 01:01:20","tempreature":"26.40","id":"23"},
{"timestamp":"2016-08-01 01:01:21","tempreature":"23.30","id":"24"},
{"timestamp":"2016-08-01 01:01:25","tempreature":"20.33","id":"25"},
{"timestamp":"2016-08-01 01:01:29","tempreature":"33.00","id":"26"},
{"timestamp":"2016-08-01 01:02:01","tempreature":"30.1","id":"27"},
{"timestamp":"2016-08-01 01:02:05","tempreature":"27.90","id":"28"},
{"timestamp":"2016-08-01 01:02:11","tempreature":"19.99","id":"29"},
{"timestamp":"2016-08-01 01:02:18","tempreature":"25.40","id":"30"},
{"timestamp":"2016-08-01 01:02:25","tempreature":"28.09","id":"31"},
{"timestamp":"2016-08-01 01:02:30","tempreature":"23.10","id":"32"},
{"timestamp":"2016-08-01 01:02:33","tempreature":"19.04","id":"33"},
{"timestamp":"2016-08-01 01:02:40","tempreature":"30.03","id":"34"},
{"timestamp":"2016-08-01 01:02:41","tempreature":"29.10","id":"35"},
{"timestamp":"2016-08-01 01:02:43","tempreature":"33.01","id":"36"},
{"timestamp":"2016-08-01 01:02:45","tempreature":"31.90","id":"37"},
{"timestamp":"2016-08-01 01:02:50","tempreature":"32.02","id":"38"},
{"timestamp":"2016-08-01 01:03:02","tempreature":"34.10","id":"39"},
{"timestamp":"2016-08-01 01:03:05","tempreature":"34.20","id":"40"},
{"timestamp":"2016-08-01 01:03:07","tempreature":"34.88","id":"41"},
{"timestamp":"2016-08-01 01:03:09","tempreature":"38.90","id":"42"},
{"timestamp":"2016-08-01 01:03:11","tempreature":"39.90","id":"43"},
{"timestamp":"2016-08-01 01:03:12","tempreature":"38.22","id":"44"},
{"timestamp":"2016-08-01 01:03:15","tempreature":"33.02","id":"45"},
{"timestamp":"2016-08-01 01:03:18","tempreature":"34.08","id":"46"},
{"timestamp":"2016-08-01 01:03:21","tempreature":"34.05","id":"47"},
{"timestamp":"2016-08-01 01:03:29","tempreature":"30.02","id":"48"},
{"timestamp":"2016-08-01 01:03:33","tempreature":"28.04","id":"49"},
{"timestamp":"2016-08-01 01:03:35","tempreature":"25.06","id":"50"},
{"timestamp":"2016-08-01 01:03:38","tempreature":"29.10","id":"51"},
{"timestamp":"2016-08-01 01:04:01","tempreature":"30.01","id":"52"},
{"timestamp":"2016-08-01 01:04:05","tempreature":"33.10","id":"53"},
{"timestamp":"2016-08-01 01:04:12","tempreature":"34.20","id":"54"}