1. Installing Spark
Check the base environment: Hadoop and the JDK
Edit the configuration files
Test-run some Python code
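Before test-running anything, it can help to confirm that the base environment from the steps above is visible to Python. The following is a minimal sketch and not part of the original walkthrough: it assumes the conventional environment variable names JAVA_HOME, HADOOP_HOME and SPARK_HOME, which this post does not state explicitly, and the helper name check_environment is hypothetical.

```python
import os

# Assumed (conventional) variable names; adjust them to match your own setup.
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME")


def check_environment(env=None):
    """Return {variable name: True if it is set to a non-empty value}."""
    if env is None:
        env = os.environ
    return {name: bool(env.get(name)) for name in REQUIRED_VARS}


if __name__ == "__main__":
    for name, is_set in check_environment().items():
        print(name, "is set" if is_set else "is NOT set")
```

A variable reported as not set usually means the corresponding export line is missing from the shell profile, or the profile has not been reloaded in the current terminal.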
2. Python programming exercise: word-frequency count for an English text
Prepare the text file: heal-the-world.txt
There's a place in your heart
And I know that it is love
And this place could be much brighter than tomorrow
And if you really try
You'll find there's no need to cry
In this place you'll feel
There's no hurt or sorrow
There are ways to get there
If you care enough for the living
Make a little space
Make a better place
Heal the world
Make it a better place
For you and for me
And the entire human race
There are people dying
If you care enough for the living
Make it a better place
For you and for me
If you want to know why
There's a love that cannot lie
Love is strong
It only cares for joyful giving
If we try we shall see
In this bliss we cannot feel
Fear or dread
We stop existing and start living
Then it feels that always
Love's enough for us growing
Make a better world
Make a better world
Heal the world
Make it a better place
For you and for me
And the entire human race
There are people dying
If you care enough for the living
Make a better place for you and for me
And the dream we were conceived in
Will reveal a joyful face
And the world we once believed in
Will shine again in grace
Then why do we keep strangling life
Wound this earth, crucify its soul
Though it's plain to see
This world is heavenly be god's glow
We could fly so high
Let our spirits never die
In my heart I feel you are all my brothers
Create a world with no fear
Together we’ll cry happy tears
We see the nations turn their swords into plowshares
We could really get there
If you cared enough for the living
Make a little space
To make a better place
Heal the world
Make it a better place
For you and for me
And the entire human race
There are people dying
If you care enough for the living
Make a better place for you and for me
Heal the world
Make it a better place
For you and for me
And the entire human race
There are people dying
If you care enough for the living
Make a better place for you and for me
Heal the world
Make it a better place
For you and for me
And the entire human race
There are people dying
If you care enough for the living
Make a better place for you and for me
There are people dying
If you care enough for the living
Make a better place for you and for me
There are people dying
If you care enough for the living
Make a better place for you and for me
You and for me
You and for me
You and for me
You and for me
Read the file and preprocess it: lowercase the text, strip punctuation, remove stop words, and split into words (main.py)
# Read the lyrics file prepared above and convert it to lower case
with open("heal-the-world.txt", "r") as f:
    text = f.read()
text = text.lower()

# Replace ASCII punctuation with spaces
for ch in '!@#$%^&*(_)-+=\\[]}{|;:\'\"`~,<.>?/':
    text = text.replace(ch, " ")
words = text.split()  # split the text on whitespace

# Read the stop-word file, one word per line
with open('stop_words.txt', 'r') as f:
    stop_words = set(line.strip('\n') for line in f)

# Keep only the words that are not stop words
afterwords = [word for word in words if word not in stop_words]
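To see what the preprocessing does to a single line, here is a small self-contained sketch that applies the same lowercase, punctuation and stop-word steps to an in-memory string. The function name preprocess and the tiny stop-word set are made up for illustration; the real list comes from stop_words.txt, whose contents are not shown in this post.

```python
# Same ASCII punctuation characters as in main.py
PUNCTUATION = '!@#$%^&*(_)-+=\\[]}{|;:\'"`~,<.>?/'


def preprocess(text, stop_words):
    """Lowercase, replace punctuation with spaces, split, drop stop words."""
    text = text.lower()
    for ch in PUNCTUATION:
        text = text.replace(ch, " ")
    return [word for word in text.split() if word not in stop_words]


# Hypothetical stop words, for illustration only
sample_stop_words = {"you", "there", "no", "to"}

print(preprocess("You'll find there's no need to cry", sample_stop_words))
# ['ll', 'find', 's', 'need', 'cry']
print(preprocess("Together we’ll cry happy tears", sample_stop_words))
# ['together', 'we’ll', 'cry', 'happy', 'tears']
```

Because the ASCII apostrophe is replaced by a space, contractions split into fragments such as s and ll, which is why those tokens appear in the result listing. The typographic apostrophe in we’ll is not in the punctuation string, so that word stays in one piece.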
Count how often each word occurs, sort by frequency, and write the result to a file (main.py)
# Count how often each remaining word occurs
counts = {}
for word in afterwords:
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, highest first
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)

# Write one "word count" pair per line; the with block closes the file
with open('count.txt', 'w') as f1:
    for word, count in items:
        f1.write(word + " " + str(count) + "\n")
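The counting and sorting step can also be written with collections.Counter from the standard library. This is an alternative sketch rather than the code used in this post; the helper name count_words and the sample word list are hypothetical.

```python
from collections import Counter


def count_words(words):
    """Return (word, count) pairs sorted by count, highest first."""
    return Counter(words).most_common()


# A few words taken from the lyrics, for illustration only
sample = ["make", "a", "better", "place", "make", "a", "little", "space", "make"]
for word, count in count_words(sample):
    print(word, count)
```

Here most_common() replaces both the dictionary loop and the explicit sort, so the pairs it returns can be written to count.txt in the same way as items.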
Output
a 22
make 18
place 17
living 10
world 10
care 8
people 7
dying 7
s 7
human 5
heal 5
entire 5
race 5
love 4
feel 3
heart 2
fear 2
space 2
ll 2
i 2
cry 2
joyful 2
crucify 1
create 1
we’ll 1
existing 1
high 1
fly 1
earth 1
face 1
find 1
turn 1
nations 1
spirits 1
ways 1
god 1
swords 1
wound 1
start 1
tomorrow 1
cared 1
brighter 1
tears 1
bliss 1
heavenly 1
glow 1
sorrow 1
reveal 1
plowshares 1
shine 1
life 1
brothers 1
lie 1
conceived 1
stop 1
hurt 1
believed 1
feels 1
strangling 1
strong 1
grace 1
plain 1
soul 1
cares 1
dread 1
happy 1
die 1
growing 1
giving 1
dream 1
3. Setting up a programming environment with PyCharm: Ubuntu 16.04 + PyCharm + Spark
Source: https://www.cnblogs.com/coder-one/p/15972584.html