Saturday, September 5, 2009

XML parsing in Erlang

XML handling in Erlang seemed too hard for me, so I did a small survey first.

Characteristics


An XML handling operation has two major phases.

  • Parse the XML document and build data (a tree of elements)
    • xmerl_scan (the whole element tree is processed at once)
    • SAX type parser
      • xmerl_eventp
      • erlsom_sax (developed by a third party; not bundled with Erlang/OTP)

  • Access (traverse) elements within the data
    • XPath (xmerl_xpath)
    • XSLT (xmerl_xs)
    • callback (hook) functions from a SAX type parser
    • hand-made logic
      • traverse the tree
      • extract tuples from the element tree (list) with the 'list comprehension' technique

So, an XML parsing methodology is characterized by its parse method and its access method.

matrix of each method

parse method | access method                            | samples on the Web           | by
-------------|------------------------------------------|------------------------------|--------------------
xmerl_scan   | xmerl_xpath                              | Parsing Atom with Erlang     | Sam Ruby
xmerl_scan   | xmerl_xpath (with a useful macro)        | XML processing in Erlang     | Torbjörn Törnkvist
xmerl_scan   | hand-made (traverse tree by lists:foldl) | Return Erlang Data from XML  | Muharem Hrnjadovic
xmerl_scan   | hand-made (use list comprehension)       | XML processing in Erlang     | Hakan Mattson
xmerl_eventp | callback (hook) function                 | XML processing in Erlang     | Torbjörn Törnkvist
erlsom_sax   | callback function                        | XML processing in Erlang     | Willem de Jong


Operation examples

example-1 : xmerl_scan + xpath

If you know exactly which elements you need, and the source XML file is not huge, parse with xmerl_scan and access with xmerl_xpath.
Note: as of R13B01, Erlang/OTP supports XPath 1.0.

inspired by Torbjörn Törnkvist's code.

sample xml data ("e.xml")

<Envelope>
<Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>

code


-module(example1).
-export([doit/1]).
-include_lib("xmerl/include/xmerl.hrl").

%% extract {Name, Value} from a single-element XPath result
-define(Val(X),
        (fun() ->
                 [#xmlElement{name = N,
                              content = [#xmlText{value = V}|_]}] = X,
                 {N, V}
         end)()).

doit(File) ->
    {Xml, _Rest} = xmerl_scan:file(File),
    [?Val(xmerl_xpath:string("/Envelope/Title", Xml)),
     ?Val(xmerl_xpath:string("//IDNUM", Xml)),
     ?Val(xmerl_xpath:string("//ItemName", Xml)),
     ?Val(xmerl_xpath:string("//Pages", Xml))].

results

1> example1:doit("e.xml").
[{'Title',"envelope title"},
 {'IDNUM',"403276"},
 {'ItemName',"Name String"},
 {'Pages',"0"}]

example-2 : xmerl_scan + traverse the element tree with lists:foldl

If you want to translate the whole XML data into another scheme, you need to traverse the whole tree, e.g. with the lists:foldl function.

inspired by Muharem Hrnjadovic's code

sample xml data ("e.xml")

<Envelope>
<Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>

code

-module(example2).
-export([go/1]).
-include_lib("xmerl/include/xmerl.hrl").

go(File) ->
    {R, _Rest} = xmerl_scan:file(File),
    io:format("~p~n", [lists:reverse(traverse(R, []))]).

traverse(R, L) when is_record(R, xmlElement) ->
    lists:foldl(fun traverse/2, L, R#xmlElement.content);
traverse(#xmlText{parents = [{'Title', _}, _], value = V}, L) ->
    [{title, V} | L];
traverse(#xmlText{parents = [{'IDNUM', _}, _, _], value = V}, L) ->
    [{idnum, V} | L];
traverse(#xmlText{parents = [{'ItemName', _}, _, _], value = V}, L) ->
    [{itemname, V} | L];
traverse(#xmlText{parents = [{'Pages', _}, _, _], value = V}, L) ->
    [{pages, V} | L];
traverse(_R, L) ->
    L.


results

2> example2:go("e.xml").
[{title,"envelope title"},
 {idnum,"403276"},
 {itemname,"Name String"},
 {pages,"0"}]


example-3 : SAX type parsing operation

If the computing resource is limited, the whole XML data cannot be processed at once. So a SAX type parser calls back your functions to process each element just after it is parsed.

As a SAX type parser, erlsom_sax is well known, and Willem de Jong posted his code based on erlsom_sax.
see http://blog.tornkvist.org/blog.yaws?id=1193209275268448

Here is code based on the xmerl_sax_parser library, inspired by the code above.

sample xml data ("e.xml")

<Envelope>
<Title>envelope title</Title>
<InnerEnv>
<IDNUM>403276</IDNUM>
<ItemName>Name String</ItemName>
<Pages>0</Pages>
</InnerEnv>
</Envelope>


code

-module(example3).
-export([go/1]).

go(File) ->
    Options = [{event_fun, fun eventfun/3},
               {event_state, {[], []}}],
    case xmerl_sax_parser:file(File, Options) of
        {ok, {_Stack, Acc}} -> lists:reverse(Acc);
        Other -> Other
    end.

%% skip whitespace-only text
eventfun({ignorableWhitespace, _}, _Location, State) ->
    State;
%% push the tag name when an element starts
eventfun({startElement, _Uri, Tag, _QName, _Attrs}, _Location, {Stack, Acc}) ->
    {[Tag | Stack], Acc};
%% record {Tag, Value} for character data in the current element
eventfun({characters, Value}, _Location, {[Tag | _] = Stack, Acc}) ->
    {Stack, [{Tag, Value} | Acc]};
%% pop the stack when an element ends
eventfun({endElement, _Uri, _Tag, _QName}, _Location, {[_ | L], Acc}) ->
    {L, Acc};
eventfun(_Event, _Location, State) ->
    State.



results

6> example3:go("e.xml").
[{"Title","envelope title"},
 {"IDNUM","403276"},
 {"ItemName","Name String"},
 {"Pages","0"}]

Wednesday, August 26, 2009

Insider's portal - why we need it

I've just given a presentation.

As a system (or network, or everything) admin, I think that user literacy is the key to administration.

(Disaster-)preventive education (to gain LITERACY) is more important than incident response. (Of course incident response is important, but preparing for it is even more so.)

But how?

Here is my plan: make an attractive portal site for the insiders of your company.

In the site,
  • administrators can hide some tricks that nudge users into educating themselves.
  • both users and administrators can get immediate responses.
  • users can know who is on the server, and hopefully become friends.
  • we can centralize resources.
  • we can simplify the client environment enough to keep it clean.

Anyway, if you are interested, please click through to my presentation.

Friday, August 21, 2009

feedchat - chat in valid RSS format

This is inspired by the sample chat server code from "Programming Erlang" by Joe Armstrong.

Chat server -> as the sample is
feed view -> tail -N of the feed history
           or,   tail -f -N =:= friendfeed

chat server process : implemented on CouchDB
    RSS entry = document
    DB = history (w/ time stamp)

this is the cheapest implementation of a friendfeed


Map-Reduce (query against CouchDB) viewpoint
    Map : crunch feeds via XMPP or PubSubHubbub...
    Reduce : tail -N of (feed history)
    Reduce : tail -f -N of (feed history)
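
To make the "tail -N" reduce concrete, here is a minimal sketch of mine (the database name "feedchat" and the use of CouchDB's _all_docs with descending=true are my assumptions for illustration, not a design decision above): fetch the newest N history documents over CouchDB's HTTP API.

-module(feedtail).
-export([tail/1]).

%% tail -N of the feed history: newest N documents from CouchDB
tail(N) ->
    inets:start(),
    Url = "http://localhost:5984/feedchat/_all_docs"   %% assumed DB name
          "?descending=true&include_docs=true&limit="
          ++ integer_to_list(N),
    {ok, {{_Version, 200, _Reason}, _Headers, Body}} = http:request(Url),
    Body.  %% JSON body holding the newest N entries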


Wednesday, August 12, 2009

Quest for Quality Without A Name

from the aftermath of FriendFeed acquisition

This entry is translated from my Japanese blog (diary) entry やや感傷的なex-日記オヤジの繰言. Maybe too long, and pointless, sorry. But that much is required to describe my thoughts exactly (that is what a diary is for).

--------------------------------------------------------------------------

My first impression on the news of the FriendFeed acquisition by Facebook was, "Congrats, good exit".

Because I know a bit about the exhausting life in start-ups. I was a programmer at a venture software company for just half a year, but half a year was enough for me. Money burns down every second; heavy pressure, too-tight schedules, anxiety about the future... A venture company is almost always stuck in the mud of a shortage of running money. Whether growing or shrinking, there is too little money to survive. Even if you have a growing, promising business, it is too hard to keep the cash flow catching up with the growth.

So, every venture company gets broken in some sense.
It changes itself from a slim, beautiful deer into a fat cash-cow, or it just vanishes.
Anyway, to stop running is a relief, I thought. -- That was the view from inside start-ups.

But soon I noticed the "not welcome" comments from USERS. I don't know whether the complaining users are the majority or not, but they are striking, and they are like a flood of voices.
(I just remembered the article on the realtime riot on FriendFeed.)
If you saw that in the real-time feed, it would be overwhelming.

I just wondered: was "FriendFeed is the coolest service, but NO ONE USES it" true? But look, so many people are crying for help and showing disbelief within FriendFeed!

It's not only because their coolest toy was taken away, I guess. They are serious about something more important. What is that?


So, here is my thought.


FriendFeed has an open architecture. Not only is it open, they welcome questions and requests, and are ready to offer a rich API. Only a few guys challenged raw API hacking, but even average users (if active) enjoyed making groups and special feeds on top of the rich API. The little tweaks they made became little treasures of their own. So they invested their small hacks into FriendFeed and made virtual money (value). Like a sort of SECOND LIFE without 3D avatars, where the value is not money but rich communication and fun.



Once you dived into FriendFeed and soaked there, you could find people just like yourself. Even if you are a minority in real life, here you have friends. This is a common phenomenon when the network shrinks the size of the world. (I've experienced it a few times. When I made my homepage (diary) around 1996, its content was a very niche, minor one: my daily life... But soon, I found people like me!)



Thus, in FriendFeed, communication made a community, and the flexible system reinforced an eco-system of value creation among users.



But no one (or very few, maybe) recognized that. Anyway, such recognition by itself would have helped little in the value creation, maybe :-p


Then the crisis at the basis of the community enlightened people. They noticed the existence of the community and the value of the community, and realized how the system reinforced the process of creation.



Sooner or later people get bored with any system, and then they abandon it. Thus the divorce of system and users is inevitable.

So I'm asking myself: what happens next? What's the next FriendFeed? I don't know. But one thing is certain.
People won't vanish.
They will make another community in the long run.
Linkages among people also survive for a long time.
The next edge community may look drastically changed, but you will find the same faces there, I think.
Because my experience assures me of it.

In Japan, on Sep 15, 1995, a meeting was held named "Second Summer of Web"; it is a milestone event for internet users and the Web-diary ("NIKKI" in Japanese, the ancient hand-made blog) community in Japan. In the roughly 15 years since then, I have kept seeing some of those people at the front of the "BENKYO-KAI" (study group) network community in Japan.


So, folks, once you recognize the situation, please calm down. If you still think you need to take action, go ahead.
Then, see you again, in another space, another time, at the edge of the net.

-------------------------------------------------------------------------------------

Addendum


Writing on community and creation, I remembered the name Christopher Alexander, and his book of patterns, "A Pattern Language: Towns, Buildings, Construction". From that book I also remember the key phrase "Quality Without A Name (QWAN)".
I don't have enough time to explain it now, but it would be interesting to ask what corresponds to QWAN in the FriendFeed system/community, I think.

Saturday, August 8, 2009

FriendSync service design

Now, FriendFeed is the best Twitter client. But there is a missing link.
Twitter users who have piled up followers there are nailed down to Twitter.
If you could maintain your follows automatically on FriendFeed, you would have a smarter Twitter.

So, I imagined a follower-synchronizing service: FriendSync.
To relieve the inconvenience for FriendFeed users who have many Twitter followings.

But! oH My gOd! FriendSync already exists (Aug 7, 2009)... so I need to change the name to Merge2it2ff or Invi2ff.
FriendSync - Sync w/ Facebook
http://appup.net/item/detail/288348190


FriendSync syncs your existing iPhone Contacts with Facebook! Syncs your iPhone's contact pictures, names and birthdays with Facebook.


Anyway, here is a memo on the key security features.

Key feature : use OAuth (3-legged!)

To invite followers from Twitter to FriendFeed, the service needs to be authorized to access follow information and to register followers.
OAuth is designed for exactly such a process.
And OAuth's ease of use is already proven by Mobster World :)
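
Here is a minimal sketch of mine of what the first of the three legs can look like in Erlang (this is generic OAuth 1.0 signing, not FriendFeed's or Twitter's documented endpoints; the URL is a placeholder and all helper names are made up):

-module(oauth_leg1).
-export([request_token/2]).

%% first leg of 3-legged OAuth 1.0: obtain a request token
request_token(ConsumerKey, ConsumerSecret) ->
    crypto:start(),
    inets:start(),
    Url = "http://example.com/oauth/request_token",   %% placeholder endpoint
    Params = [{"oauth_consumer_key", ConsumerKey},
              {"oauth_nonce", integer_to_list(random:uniform(1 bsl 30))},
              {"oauth_signature_method", "HMAC-SHA1"},
              {"oauth_timestamp", integer_to_list(unix_time())},
              {"oauth_version", "1.0"}],
    Base = "GET&" ++ escape(Url) ++ "&"
           ++ escape(to_query(lists:keysort(1, Params))),
    %% the token secret is empty on the first leg, hence the bare "&"
    Key = escape(ConsumerSecret) ++ "&",
    Sig = base64:encode_to_string(crypto:sha_mac(Key, Base)),
    Query = to_query(Params ++ [{"oauth_signature", Sig}]),
    {ok, {{_, 200, _}, _Headers, Body}} = http:request(Url ++ "?" ++ Query),
    Body.   %% holds oauth_token and oauth_token_secret

unix_time() ->
    {Mega, Sec, _} = now(),
    Mega * 1000000 + Sec.

to_query(Params) ->
    string:join([escape(K) ++ "=" ++ escape(V) || {K, V} <- Params], "&").

%% percent-encode per RFC 3986 (unreserved characters pass through)
escape(S) ->
    lists:flatten([esc(C) || C <- S]).
esc(C) when C >= $a, C =< $z; C >= $A, C =< $Z; C >= $0, C =< $9;
            C =:= $-; C =:= $.; C =:= $_; C =:= $~ ->
    C;
esc(C) ->
    io_lib:format("%~2.16.0B", [C]).

The third leg (exchanging the authorized request token for an access token) is signed the same way; the second leg is just the user's browser visiting the authorization URL.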

The service programs can be located on a VPS, but the secret info (the request key) is encrypted or digested, and shall be stored in a more secure place.
So the service system deployment starts from SSL key signing.
Every step needs SSL communication between the VPS (application server) and the secure storage server.
If you are using a trustworthy server (used all by yourself, with all well-known ports protected neatly), the application and secure storage servers can be the same one :-)
This would be low performance and slow in latency, but this application is not used so frequently, so that is no problem.

Key feature : informed consent before the OAuth steps

This is partly because of the complexity of the registration steps. Users want to be informed well enough to keep their motivation along the long way.

Usecase
  • registration
    • inform the risk
    • inform how to revoke
    • force a check of the revoke page
    • guide how to use
      • add/remove follows from this application
      • automatically sync follow status every 30 minutes
      • manually sync follow status
    • guide the registration steps (2 steps)
    • start registration
      • declare that your id is the same as on FriendFeed
  • login
    • use FriendFeed OAuth (Twitter's OpenID might be more straightforward to explain :-)
    • needs a cookie (expires shortly,,,, mmm,


Friday, June 26, 2009

If you love to code in Erlang and read Japanese..

Please check out http://ja.doukaku.org/lang/erlang/unsolved/
There are many challenges there to solve in the Erlang programming language.

(added: in English, you can try to fill the vacancies in http://pleac.sourceforge.net/pleac_erlang/index.html)

Friday, June 5, 2009

Hatena Bookmark is a twitter killer? No, it's beyond.

here is just my humble opinion...

The Japanese visionary Mochio Umeda recently made a controversial comment in an interview.
He said, 'So sorry for the Japanese internet scene. Now, it is just an underdog.' (This is not a precise translation, but most readers took it as such.)
He is a vice president, or some such position, at Hatena (hatena.ne.jp), so those who complained about his statement blamed Hatena too.

I have no spare time to blame anybody. But I have time to check the status quo.

Hatena has a lot of laboratory-level ideas. And some of their major services are very nice and convenient, and "edge", I think.

Twitter is a very simple service: someone tweets, then others can know that.
The Hatena Bookmark (HATEBU) service is more complicated (maybe harder to use), but more powerful.
A Hatebu user can tweet about anything (any contents) on the internet. Then, any "contents" on the internet can cite those tweets as comments if they want; the HATEBU API is open to content owners. Of course, you can follow anyone, as Twitter offers.
Hatebu introduces an additional dimension to stream media. If Hatena sold its armors right now, $10B would be cheap for some VCs. Hatebu may kill Twitter.

In fact, its API is TOO powerful for someone who cares about bad reputation or silly comments. But it is easy to mask out such noise: just remove the embedded script.

So, I'm so sorry for Hatena, who did not sell their ideas at the right time....

Friday, May 22, 2009

Status update : still alive, with more TODOs

These days I have less time to blog here, for f2p (http://dev.ctor.org/f2p/ ) is too much fun.

But there is no plan to abandon this blog.
Here are some of my task lists to come.
  • SELinux user guide translation into Japanese (for my study)
  • build Small guest-OS for Xen hypervisor (for my study)
  • try to design helper-tool for Genetic-evolutional Game Framework in Erlang
  • Summarize filtered GEO-entries in FriendFeed, as plugin module for f2p

Monday, March 2, 2009

Httpd stress testbench in Erlang

Just for training in Erlang programming, I've written some code for an HTTP access test driver.

It reads an existing access log of Apache (and its compatibles), and issues HTTP requests to a designated target host.
It cannot,
  • issue POST, HEAD, or any request method other than GET
  • use cookies
  • wait for the time intervals recorded in the log

To kick off the test, you can follow the process below.


$ cd working_directory_which_has_logdir
$ erlc manager.erl
$ erlc logreader.erl
$ erlc client.erl
$ ls logdir
a.example.com
b.example.com
$ erl -noshell -s manager start "logdir" 10000 20 3000 1 -s init stop

The last line is the command that initiates the test.
erl is the name of the Erlang interpreter.

The -noshell option executes the script in batch mode.

The -s options (there are 2 blocks of them) each designate an Erlang call.
The first -s block means "manager:start("logdir", 10000, 20, 3000, 1)".
The first parameter specifies the place of the logfiles.
The second parameter, 10000, specifies the timeout in msec.
The third parameter, 20, specifies the concurrency (how many clients issue requests simultaneously).
The 4th parameter, 3000, specifies the interval in msec between requests. This is required to keep the target host server from melting down :-P
The 5th parameter, 1, is the KeepAlive flag.

And the second -s block means "init:stop()", to exit the batch Erlang operation.
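
The actual code is linked at the end of this post; as an illustration of just the concurrency part, here is a minimal sketch of mine (mini_manager and its arguments are made up, not the real manager.erl interface): spawn one Erlang process per client and wait until every client reports back.

-module(mini_manager).
-export([start/2]).

%% spawn Concurrency client processes and wait for all of them
start(Urls, Concurrency) ->
    inets:start(),
    Manager = self(),
    Pids = [spawn(fun() -> client(Manager, Urls) end)
            || _ <- lists:seq(1, Concurrency)],
    [receive {done, Pid} -> ok end || Pid <- Pids],
    ok.

%% each client replays the URL list with a 3000 msec interval
client(Manager, Urls) ->
    lists:foreach(fun(Url) ->
                          http:request(Url),
                          timer:sleep(3000)
                  end, Urls),
    Manager ! {done, self()}.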

I have confirmed that the client processes run simultaneously. Here is a sample result from tcpdump.

13:19:48.234946 IP 10.10.10.10.58500 > www.hoge.com.http: S 3505315617:3505315617(0) win 5840 <mss 1460,sackOK,timestamp 86795622 0,nop,wscale 3>
13:19:48.238005 IP 10.10.10.10.44873 > www.hoge.com.http: S 3499413688:3499413688(0) win 5840 <mss 1460,sackOK,timestamp 86795625 0,nop,wscale 3>
13:19:48.240203 IP 10.10.10.10.60753 > www.hoge.com.http: S 3510435421:3510435421(0) win 5840 <mss 1460,sackOK,timestamp 86795627 0,nop,wscale 3>
13:19:48.242465 IP 10.10.10.10.45900 > www.hoge.com.http: S 3500420864:3500420864(0) win 5840 <mss 1460,sackOK,timestamp 86795629 0,nop,wscale 3>
13:19:48.244827 IP 10.10.10.10.50124 > www.hoge.com.http: S 3495664924:3495664924(0) win 5840 <mss 1460,sackOK,timestamp 86795632 0,nop,wscale 3>
13:19:48.631375 IP www.hoge.com.http > 10.10.10.10.58500: S 320291303:320291303(0) ack 3505315618 win 64240 <mss 1460>
13:19:48.631504 IP 10.10.10.10.58500 > www.hoge.com.http: . ack 1 win 5840
13:19:48.675836 IP 10.10.10.10.58500 > www.hoge.com.http: P 1:108(107) ack 1 win 5840
13:19:48.769805 IP www.hoge.com.http > 10.10.10.10.58500: . ack 108 win 64240
13:19:48.769812 IP www.hoge.com.http > 10.10.10.10.44873: S 1269646951:1269646951(0) ack 3499413689 win 64240 <mss 1460>



You can access the code at http://github.com/kgbu/erlandom/tree/b661d83d128863667d0cf9fe86b2badb3416d4cd/httpstress

If you can read Japanese, my blog entry may help you.
http://d.hatena.ne.jp/kgbu/20080829/1219994676

Thursday, February 19, 2009

What is ffp?

In my previous entry, I posted a very messed-up procedure to install ffp (f2p).
But don't blame ffp. Ffp (http://github.com/nahi/f2p/tree/master) itself is a very smart and simple (thin-layered) application. I thank NaHi, the author of this product. He said "ffp is for mobile FriendFeed users".

FriendFeed is one of the best tools for re-blogging (twitting) with powerful aggregation.
Ffp is used as the basis of UI customization.

Now ffp offers you
  • a simple/slim UI for "traditional mobile-phone web browsers" (although the iPhone has its special UI from FriendFeed)
  • grouping of messages/entries from various feed sources into a comment-tree structure
  • a "recently updated" view
  • re-sharing (citing/quoting) messages from a friend's feed, managed in your own feed (so another friend of yours may see an interesting post of a friend-of-friend. Note: privacy matters might need to be cleared)
Ffp still seems to be in its enhancement process. Feel free to drop in at the f2p room on FriendFeed.
http://friendfeed.com/rooms/f2p

Monday, February 16, 2009

how to setup your ffp

ffp takes you out from your desk with FriendFeed.



source: http://github.com/nahi/ffp/tree/master



In the case of a CentOS 5.2 default installation, rubygems is not available.

So installation starts with rubygems, but that requires rdoc... anyway, let's go further.

Setup framework



install rdoc

    # yum install rdoc

Dependencies Resolved

=============================================================================
 Package                 Arch       Version          Repository        Size
=============================================================================
Installing:
 ruby-rdoc               i386       1.8.5-5.el5_2.6  updates           136 k
Updating:
 ruby                    i386       1.8.5-5.el5_2.6  updates           280 k
 ruby-libs               i386       1.8.5-5.el5_2.6  updates           1.6 M
Installing for dependencies:
 ruby-irb                i386       1.8.5-5.el5_2.6  updates            69 k
Updating for dependencies:
 ruby-devel              i386       1.8.5-5.el5_2.6  updates           555 k







install gem

    # wget http://rubyforge.org/frs/download.php/45905/rubygems-1.3.1.tgz
    # tar zxf rubygems-1.3.1.tgz
    # cd rubygems-1.3.1
    # ruby setup.rb





install rails



    # gem install rails



Successfully installed rake-0.8.3
Successfully installed activesupport-2.2.2
Successfully installed activerecord-2.2.2
Successfully installed actionpack-2.2.2
Successfully installed actionmailer-2.2.2
Successfully installed activeresource-2.2.2
Successfully installed rails-2.2.2
7 gems installed
Installing ri documentation for rake-0.8.3...
Installing ri documentation for activesupport-2.2.2...
Installing ri documentation for activerecord-2.2.2...
Installing ri documentation for actionpack-2.2.2...
Installing ri documentation for actionmailer-2.2.2...
Installing ri documentation for activeresource-2.2.2...
Installing RDoc documentation for rake-0.8.3...
Installing RDoc documentation for activesupport-2.2.2...
Installing RDoc documentation for activerecord-2.2.2...
Installing RDoc documentation for actionpack-2.2.2...
Installing RDoc documentation for actionmailer-2.2.2...
Installing RDoc documentation for activeresource-2.2.2...



test rails



$ mkdir ffp
$ rails /home/ocao/ffp
      exists
      create  app/controllers
      create  app/helpers
      create  app/models
      create  app/views/layouts
      create  config/environments
      create  config/initializers
      create  config/locales
      create  db
      create  doc
      create  lib
      create  lib/tasks
      create  log
      create  public/images
      create  public/javascripts
      create  public/stylesheets
      create  script/performance
      create  script/process
      create  test/fixtures
      create  test/functional
      create  test/integration
      create  test/performance
      create  test/unit
      create  vendor
      create  vendor/plugins
      create  tmp/sessions
      create  tmp/sockets
      create  tmp/cache
      create  tmp/pids
      create  Rakefile
      create  README
      create  app/controllers/application.rb
      create  app/helpers/application_helper.rb
      create  test/test_helper.rb
      create  test/performance/browsing_test.rb
      create  config/database.yml
      create  config/routes.rb
      create  config/initializers/inflections.rb
      create  config/initializers/mime_types.rb
      create  config/initializers/new_rails_defaults.rb
      create  config/locales/en.yml
      create  config/boot.rb
      create  config/environment.rb
      create  config/environments/production.rb
      create  config/environments/development.rb
      create  config/environments/test.rb
      create  script/about
      create  script/console
      create  script/dbconsole
      create  script/destroy
      create  script/generate
      create  script/performance/benchmarker
      create  script/performance/profiler
      create  script/performance/request
      create  script/process/reaper
      create  script/process/spawner
      create  script/process/inspector
      create  script/runner
      create  script/server
      create  script/plugin
      create  public/dispatch.rb
      create  public/dispatch.cgi
      create  public/dispatch.fcgi
      create  public/404.html
      create  public/422.html
      create  public/500.html
      create  public/index.html
      create  public/favicon.ico
      create  public/robots.txt
      create  public/images/rails.png
      create  public/javascripts/prototype.js
      create  public/javascripts/effects.js
      create  public/javascripts/dragdrop.js
      create  public/javascripts/controls.js
      create  public/javascripts/application.js
      create  doc/README_FOR_APP
      create  log/server.log
      create  log/production.log
      create  log/development.log
      create  log/test.log



Penetrate your firewall (port 3000 by default)

In the case of CentOS, change the iptables setup:

# iptables -A INPUT -p tcp --dport 3000 -j ACCEPT


(if required) reverse proxy server setup

If you feel the port-3000-specific URL is ugly, you can set up httpd (Apache) as a reverse proxy as below.

    ProxyPass /ffp/     http://yourserver:3000/
    ProxyPassReverse /ffp/      http://yourserver:3000/





kick-start the server (WEBrick)



$ cd ffp
$ script/server
=> Booting WEBrick...
=> Rails 2.2.2 application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
[2009-02-16 09:30:40] INFO  WEBrick 1.3.1
[2009-02-16 09:30:40] INFO  ruby 1.8.5 (2006-08-25) [i386-linux]
[2009-02-16 09:30:40] INFO  WEBrick::HTTPServer#start: pid=16622 port=3000
192.168.0.51 - - [16/Feb/2009:09:30:48 JST] "GET / HTTP/1.0" 200 7385
- -> /
192.168.0.51 - - [16/Feb/2009:09:33:45 JST] "GET / HTTP/1.1" 200 7385
- -> /
192.168.0.51 - - [16/Feb/2009:09:33:45 JST] "GET /javascripts/prototype.js HTTP/1.1" 200 129738
http://b137053.ppp.asahi-net.or.jp/ffp/ -> /javascripts/prototype.js
192.168.0.51 - - [16/Feb/2009:09:33:46 JST] "GET /javascripts/effects.js HTTP/1.1" 200 38675
http://b137053.ppp.asahi-net.or.jp/ffp/ -> /javascripts/effects.js
192.168.0.51 - - [16/Feb/2009:09:33:46 JST] "GET /images/rails.png HTTP/1.1" 200 6646
http://b137053.ppp.asahi-net.or.jp/ffp/ -> /images/rails.png
192.168.0.51 - - [16/Feb/2009:09:34:02 JST] "GET /rails/info/properties HTTP/1.1" 500 13415
http://b137053.ppp.asahi-net.or.jp/ffp/ -> /rails/info/properties




Welcome, ABORT!!!

You'll see the message below.



MissingSourceFile in Rails/infoController#properties



Here is the framework trace.

/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/dependencies.rb:153:in `require'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/dependencies.rb:521:in `new_constants_in'
(snip!)
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
script/server:3





Thus,

no such file to load -- sqlite3
This error occurred while loading the following files:
sqlite3

But we need SWIG before sqlite3.
For the reason why SWIG is required, see http://www.machu.jp/diary/20070117.html



install SWIG

Download it from http://www.swig.org/download.html

# tar zxf swig-1.3.38.tar.gz
# cd swig-1.3.38
# ./configure && make && make install



install sqlite

# gem install sqlite3-ruby
Building native extensions.  This could take a while...
Successfully installed sqlite3-ruby-1.2.4
1 gem installed
Installing ri documentation for sqlite3-ruby-1.2.4...
Installing RDoc documentation for sqlite3-ruby-1.2.4...



Now your environment seems OK

About your application’s environment

Ruby version              1.8.5 (i386-linux)
RubyGems version          1.3.1
Rails version             2.2.2
Active Record version     2.2.2
Action Pack version       2.2.2
Active Resource version   2.2.2
Action Mailer version     2.2.2
Active Support version    2.2.2
Application root          /home/yourname/ffp
Environment               development
Database adapter          sqlite3
Database schema version   0


Setup f2p



download github code

$ cd working_directory
$ git clone git://github.com/nahi/f2p.git

initial boot to confirm setup

$ script/server
=> Booting WEBrick...
Missing these required gems:
  httpclient
  json

You're running:
  ruby 1.8.5 at /usr/bin/ruby
  rubygems 1.3.1 at /home/ocao/.gem/ruby/1.8, /usr/lib/ruby/gems/1.8

Run `rake gems:install` to install the missing gems.



install missing gems

Oh, boy! Where is httpclient?



$ rake gems:install
(in /home/username/f2p)
** Invoke gems:install (first_time)
** Invoke gems:base (first_time)
** Execute gems:base
** Invoke environment (first_time)
** Execute environment
rake aborted!
no such file to load -- httpclient



After all, I need to install the following 2 packages (httpclient, json).

get httpclient

It's at http://dev.ctor.org/http-access2

# wget http://dev.ctor.org/download/httpclient-2.1.4.tar.gz
# tar zxf httpclient-2.1.4.tar.gz
# cd httpclient-2.1.4
# ruby install.rb

But rake is still making noise about "json".

$ rake gems:install
(in /home/username/f2p)
rake aborted!
no such file to load -- json

(See full trace by running task with --trace)





get json

It's simple.

# gem install json



Kick off again

It seems OK,

$ script/server
=> Booting WEBrick...
=> Rails 2.2.2 application started on http://0.0.0.0:3000
=> Ctrl-C to shutdown server; call with --help for options
[2009-02-16 10:53:05] INFO  WEBrick 1.3.1
[2009-02-16 10:53:05] INFO  ruby 1.8.5 (2006-08-25) [i386-linux]
[2009-02-16 10:53:05] INFO  WEBrick::HTTPServer#start: pid=25199 port=3000
192.168.0.51 - - [16/Feb/2009:10:53:15 JST] "GET / HTTP/1.1" 302 110
- -> /
192.168.0.51 - - [16/Feb/2009:10:53:16 JST] "GET /login HTTP/1.1" 200 863
- -> /login
192.168.0.51 - - [16/Feb/2009:10:54:02 JST] "POST /login/authenticate HTTP/1.1" 500 13753
http://b137053.ppp.asahi-net.or.jp/f2p/login -> /login/authenticate



But I came across a login failure.

The messages are below.

ActiveRecord::StatementInvalid in LoginController#authenticate

Could not find table 'users'


/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/sqlite3_adapter.rb:29:in `table_structure'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/object/misc.rb:39:in `returning'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/sqlite3_adapter.rb:28:in `table_structure'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/sqlite_adapter.rb:189:in `columns'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/base.rb:1220:in `columns'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/base.rb:2839:in `attributes_from_column_definition_without_lock'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/locking/optimistic.rb:55:in `attributes_from_column_definition'
/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/base.rb:2279:in `initialize'
app/models/user.rb:62:in `initialize'
app/controllers/login_controller.rb:26:in `new'
app/controllers/login_controller.rb:26:in `authenticate'

/usr/lib/ruby/gems/1.8/gems/activerecord-2.2.2/lib/active_record/connection_adapters/sqlite3_adapter.rb:29:in `table_structure'
/usr/lib/ruby/gems/1.8/gems/activesupport-2.2.2/lib/active_support/core_ext/object/misc.rb:39:in `returning'
(snip!)
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
script/server:3

Request

Parameters:
{"name"=>"user",
 "commit"=>"login",
 "authenticity_token"=>"xxxxxx",
 "remote_key"=>"xxxxxxx"}

--- 
:csrf_id: 9xxxxxxxxxxbf
flash: !map:ActionController::Flash::FlashHash {}

Response

Headers:
{"cookie"=>[],
 "Content-Type"=>"",
 "Cache-Control"=>"no-cache"}


migrate db for the production environment?

reference: http://d.hatena.ne.jp/ky2009/20081226/1230278255

$ rake RAILS_ENV=production db:migrate

(in /home/ocao/f2p)
==  CreateUsers: migrating ====================================================
-- create_table(:users)
   -> 0.0057s
==  CreateUsers: migrated (0.0062s) ===========================================

==  CreateProfiles: migrating =================================================
-- create_table(:profiles)
   -> 0.0056s
==  CreateProfiles: migrated (0.0060s) ========================================

==  AddUserToProfile: migrating ===============================================
-- add_column(:profiles, :user_id, :integer)
   -> 0.0104s
==  AddUserToProfile: migrated (0.0108s) ======================================

==  InsertProfileOfUser: migrating ============================================
==  InsertProfileOfUser: migrated (0.0010s) ===================================

==  AddEntriesInThreadToProfile: migrating ====================================
-- add_column(:profiles, :entries_in_thread, :integer)
   -> 0.0097s
==  AddEntriesInThreadToProfile: migrated (0.0113s) ===========================

==  CreateLastModifieds: migrating ============================================
-- create_table(:last_modifieds)
   -> 0.0058s
==  CreateLastModifieds: migrated (0.0062s) ===================================

==  CreateCheckedModifieds: migrating =========================================
-- create_table(:checked_modifieds)
   -> 0.0057s
==  CreateCheckedModifieds: migrated (0.0061s) ================================

But I still got the same error. Maybe I'm using the development environment, right?

$ ls -l db
total 36
-rw-r--r-- 1 ocao ocao     0 Feb 16 10:49 development.sqlite3 <<< nothing!!!!
drwxrwxr-x 2 ocao ocao  4096 Feb 16 06:33 migrate
-rw-r--r-- 1 ocao ocao 10240 Feb 16 11:01 production.sqlite3
-rw-rw-r-- 1 ocao ocao  1696 Feb 16 11:01 schema.rb
$ cd ..
$ rake RAILS_ENV=development db:migrate

BINGO!!! Now the application starts correctly.


Sunday, February 15, 2009

WSSE on Hatena by Erlang

I'm a user of the HATENA bookmark web service. It's a major social bookmark service in Japan.
The API of the service is public, so I've written some code for it in Erlang.

To initiate an operation, WSSE authentication is required.
For details of the WSSE specification, please refer to
 http://www.ibm.com/developerworks/webservices/library/specification/ws-secure/
For details of the HATENA bookmark API, please refer to
 http://d.hatena.ne.jp/keyword/%a4%cf%a4%c6%a4%ca%a5%d5%a5%a9%a5%c8%a5%e9%a5%a4%a5%d5AtomAPI?kid=88110#wsse (in Japanese)

The following is the wsse module.


-module(wsse).
-export([new/2]).

-define(SHA1DIGESTLENGTH, 20).

new(User, Password) ->
    {A, B, C} = now(),
    random:seed(A, B, C),
    Nonce = nonce(?SHA1DIGESTLENGTH, []),

    %% create an ISO 8601 complying datetime
    %% $ ruby -e " require 'open-uri' ; p Time.now.iso8601"
    %% => "2008-07-31T16:16:14+09:00"
    %%
    %% ref) http://www.trapexit.org/Converting_Between_struct:time_and_ISO8601_Format
    {{Year, Month, Day}, {Hour, Min, Sec}} = erlang:localtime(),
    TZ = string:strip(os:cmd("date +%:z"), right, $\n),   %% e.g. "+09:00"
    Created = io_lib:format("~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0B~s",
                            [Year, Month, Day, Hour, Min, Sec, TZ]),
    crypto:start(),
    Digest = binary_to_list(crypto:sha(Nonce ++ Created ++ Password)),

    "UsernameToken Username=\"" ++ User ++ "\", " ++
        "PasswordDigest=\"" ++ base64:encode_to_string(Digest) ++ "\", " ++
        "Nonce=\"" ++ base64:encode_to_string(Nonce) ++ "\", " ++
        "Created=\"" ++ Created ++ "\"".

%% collect exactly N random bytes (the original appended one byte too many)
nonce(0, L) -> L;
nonce(N, L) -> nonce(N - 1, [random:uniform(255) | L]).


The HATENA bookmark service API uses Atom format data.
Here is the module for it.

-module(getatom).
-export([new/3]).

new(User, Password, Uri) ->
    RequestHeader = [{"X-WSSE", wsse:new(User, Password)}],
    inets:start(),
    {ok, {{_Version, 200, _ReasonPhrase}, _Headers, Body}} =
        http:request(get, {Uri, RequestHeader}, [], []),
    Body.


Here is a typical usage from the Erlang interpreter command line.

> Latest = getatom:new("username", "password", "http://b.hatena.ne.jp/atom/feed").
[60,63,120,109,108,32,118,101,114,115,105,111,110,61,34,49,
46,48,34,32,101,110,99,111,100,105,110,103,61|...]
> io:format("~s",[Latest]).

<feed version="0.3"
xmlns="http://purl.org/atom/ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
xml:lang="ja">
<title>kgbu\343\201\256\343\203\226\343\203\203\343\202\257\343\203\236\343\203\274\343\202\257</title>
<link rel="alternate" type="text/html" href="http://b.hatena.ne.jp/kgbu/" />
...to be continued

Thursday, February 12, 2009

Web service checker by Erlang

web assertion checker in Erlang

major functionalities

Here is my initial idea.

Multi-process test driver : Erlang's multi-processing feature fits
        The root process checks test completion and reports a summary
        page transition => inter-process communication via messages (the process transition table is given by the root process)

Minimum wait for rate limiting : to spare the tested site, a 5-second minimum wait is inserted.

State transition via : request path / cookie / authentication (Basic|Digest)
       for identity / secret (password, authorized keys) / session
    passed by messages

Assertion : any AND, OR combination of the following (see the sketch after this list)
    equal((text|HTML docs|XHTML docs|XML docs), state)
    DOM-equal((XHTML docs|XML docs), state)
    Partial-DOM-equal((XHTML docs|XML docs), DOM-mask, state)
    Response-Header-equal(Header, state)
    Partial-Response-Header-equal(Header, Header-mask, state)
    Location-equal(URL, state)
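
Here is a minimal sketch of mine of how that AND/OR combination could look in Erlang (an illustration of the idea only; the assertion funs themselves are left abstract):

-module(assert_comb).
-export([check/2]).

%% an assertion is a fun(State) -> boolean();
%% {all, List} combines with AND, {any, List} with OR, nesting freely
check({all, Asserts}, State) ->
    lists:all(fun(A) -> check(A, State) end, Asserts);
check({any, Asserts}, State) ->
    lists:any(fun(A) -> check(A, State) end, Asserts);
check(Assert, State) when is_function(Assert, 1) ->
    Assert(State).

For example, check({all, [LocationEqual, {any, [DomEqual, PartialDomEqual]}]}, State) succeeds when the location assertion holds and at least one of the DOM assertions holds.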
                
               

Thursday, February 5, 2009

use vmstat command to know about disk access

I love to use the vmstat command to measure the load of servers. But to check disk-intensive overload, the usual vmstat output is not device-specific.

$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 25924 11324 3524 135644 0 0 3 20 1 4 0 0 100 0 0
0 1 25924 11324 3528 135640 0 0 0 6 19 33 0 0 100 0 0
0 0 25924 11324 3532 135644 0 0 0 1 18 34 0 0 100 0 0


So I looked into /proc/diskstats, but it is cryptic.

$ cat /proc/diskstats

1 0 ram0 0 0 0 0 0 0 0 0 0 0 0
1 1 ram1 0 0 0 0 0 0 0 0 0 0 0
1 2 ram2 0 0 0 0 0 0 0 0 0 0 0
1 3 ram3 0 0 0 0 0 0 0 0 0 0 0
1 4 ram4 0 0 0 0 0 0 0 0 0 0 0
1 5 ram5 0 0 0 0 0 0 0 0 0 0 0
1 6 ram6 0 0 0 0 0 0 0 0 0 0 0
1 7 ram7 0 0 0 0 0 0 0 0 0 0 0
1 8 ram8 0 0 0 0 0 0 0 0 0 0 0
1 9 ram9 0 0 0 0 0 0 0 0 0 0 0
1 10 ram10 0 0 0 0 0 0 0 0 0 0 0
1 11 ram11 0 0 0 0 0 0 0 0 0 0 0
1 12 ram12 0 0 0 0 0 0 0 0 0 0 0
1 13 ram13 0 0 0 0 0 0 0 0 0 0 0
1 14 ram14 0 0 0 0 0 0 0 0 0 0 0
1 15 ram15 0 0 0 0 0 0 0 0 0 0 0
202 0 xvda 1900702 111759 57624068 22798740 21260401 21742422 344022684 149429332 0 25033876 172228060
202 1 xvda1 1178 2366 2 4
202 2 xvda2 2011295 57621462 43002835 344022680
253 0 dm-0 2002226 0 57550986 23818188 42992837 0 343942696 261736148 0 25033368 285554312
253 1 dm-1 8752 0 70016 102772 9998 0 79984 347408 0 16320 450180
9 0 md0 0 0 0 0 0 0 0 0 0 0 0

Then I came back to vmstat with the "-d" option, just to see formatted information from /proc/diskstats.

$ vmstat -d
disk- ------------reads------------ ------------writes----------- -----IO------
total merged sectors ms total merged sectors ms cur sec
ram0 0 0 0 0 0 0 0 0 0 0
ram1 0 0 0 0 0 0 0 0 0 0
ram2 0 0 0 0 0 0 0 0 0 0
ram3 0 0 0 0 0 0 0 0 0 0
ram4 0 0 0 0 0 0 0 0 0 0
ram5 0 0 0 0 0 0 0 0 0 0
ram6 0 0 0 0 0 0 0 0 0 0
ram7 0 0 0 0 0 0 0 0 0 0
ram8 0 0 0 0 0 0 0 0 0 0
ram9 0 0 0 0 0 0 0 0 0 0
ram10 0 0 0 0 0 0 0 0 0 0
ram11 0 0 0 0 0 0 0 0 0 0
ram12 0 0 0 0 0 0 0 0 0 0
ram13 0 0 0 0 0 0 0 0 0 0
ram14 0 0 0 0 0 0 0 0 0 0
ram15 0 0 0 0 0 0 0 0 0 0
xvda 1900690 111759 57623940 22798620 21260398 21742409 344022556 149429324 0 25033
dm-0 2002214 0 57550858 23818068 42992821 0 343942568 261736088 0 25033
dm-1 8752 0 70016 102772 9998 0 79984 347408 0 16
md0 0 0 0 0 0 0 0 0 0 0

Saturday, January 24, 2009

Why linux filesystems (ext2, ext3) get slow when >1K files in a directory

I knew that, but not in detail.
(This article is the English version of my Japanese blog.)

Especially, the ls command is very slow, so I thought the readdir system call was too slow. But I was WRONG.
The overhead is in the ls command's algorithm.
A mailing list article in linux-users ( http://his.luky.org/ML/linux-users.6/msg08919.html ) taught me that.
The ls command collects not only the list of filenames but also each file's attributes (size, permissions, etc.).
So the ls command checks the attributes of each file. That takes a long time.

In my application, only the list of filenames is required, so the readdir system call alone works fine.
Here is the sample code (almost the same as the manpage of readdir!)



#include <dirent.h>
#include <errno.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    DIR *dirp;
    struct dirent *dp;

    if (argc != 2) {
        printf("usage: %s directory\n", argv[0]);
        return 1;
    }

    if ((dirp = opendir(argv[1])) == NULL) {
        printf("couldn't open dir\n");
        return 1;
    }

    do {
        errno = 0;
        if ((dp = readdir(dirp)) != NULL)
            (void) printf("%s\n", dp->d_name);
    } while (dp != NULL);

    if (errno != 0)
        perror("error reading directory");

    (void) closedir(dirp);
    return 0;
}
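
By the way, in Erlang (the language used elsewhere on this blog), the equivalent filename-only listing is file:list_dir/1, which likewise returns names without stat-ing each file:

1> {ok, Names} = file:list_dir("/path/with/many/files"), length(Names).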

Now I checked the performance of readdir itself.
In the case of over 10K files in a directory, it takes 300 msec (kernel 2.6.9-42.ELsmp: CentOS 4.4, 4800 bogomips).
Here is the result of 'strace -c'.


# strace -c readdir . > /dev/null
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
80.44 0.442220 14 30821 getdents64
19.48 0.107076 8 13018 write
0.03 0.000149 149 1 execve
0.01 0.000054 11 5 old_mmap
0.01 0.000038 13 3 open
0.01 0.000035 35 1 read
0.01 0.000031 8 4 fstat64
0.01 0.000030 8 4 brk
0.00 0.000022 11 2 mprotect
0.00 0.000014 5 3 close
0.00 0.000013 13 1 munmap
0.00 0.000011 11 1 1 access
0.00 0.000008 8 1 mmap2
0.00 0.000008 8 1 fcntl64
0.00 0.000007 7 1 1 ioctl
0.00 0.000007 7 1 uname
0.00 0.000003 3 1 set_thread_area
------ ----------- ----------- --------- --------- ----------------
100.00 0.549726 43869 2 total

The most time-consuming system call is getdents64. The system (disk) cache speeds up this system call.
If the cache is fully hit, getdents64 takes only 10 usecs for 1M files in a directory.
If you try this on an NFS-mounted directory, the cache effect may be small.

Wednesday, January 21, 2009

Fixing Makefile to install Erlang R12B-5(R12B-4) on CentOS5.2

I've noticed an incompleteness in the Makefile of the Erlang source tarball package, in the R12B-4 version.
To build the Erlang system from the source files under a CentOS 5 environment, an additional library option is required.

For details, you can see Peter Lemenkov's post to the erlang-questions ML.
http://www.erlang.org/pipermail/erlang-questions/2008-August/037237.html

Or, if you can read Japanese, refer to my blog in Japanese.
http://d.hatena.ne.jp/kgbu/20080909/1220984877

Here is the patch for "lib/ssl/c_src/Makefile.in".
http://cvs.fedoraproject.org/viewvc/rpms/erlang/EL-5/otp-ssl_missing_libs.patch?view=auto&revision=1.1

The bug still exists in the R12B-5 version of Erlang (the latest as of today).

Saturday, January 17, 2009

Checkpoints to Install rdiff-backup

Checkpoints to install rdiff-backup (on Linux)

in case you are stuck in ...

1) check the python-devel package installation
2) check the librsync library dependency
3) check the librsync library compile options. You may need to re-compile it with the -fPIC option

see http://wiki.rdiff-backup.org/wiki/index.php/RdiffBackupWiki

Below is a summary of the operations (in my environment: Fedora Core 5, x86_64).
(This post is the English version of my original blog in Japanese.)
# rpm -aq | grep python-devel
# yum install python-devel

# rpm -aq | grep librsync
# wget 'http://downloads.sourceforge.net/librsync/librsync-0.9.7.tar.gz?modtime=1097439809&big_mirror=0'
# tar zxf librsync-0.9.7.tar.gz
# cd librsync-0.9.7
# ./configure
# make AM_CFLAGS=-fPIC
# make install
# ldconfig
# cd ..

# wget http://savannah.nongnu.org/download/rdiff-backup/rdiff-backup-1.2.5.tar.gz
# tar zxf rdiff-backup-1.2.5.tar.gz
# cd rdiff-backup-1.2.5
# python setup.py install

Wednesday, January 7, 2009

first post

This is my first post.


Here, I'll put up some memos and flakes of my monkey-typing code.
So, please look over them if you have much spare time.