SPAMBots … come and get me …

I am trying a SPAM blocker solution for the first time on this site, and I am officially in test/validation mode, so … dear SPAMBots, and even the human type of spammers, you are very welcome to try and flood my site.

NO, I am not crazy.

I really do want to check whether this solution works. To demonstrate that I am committed, I have also added a “Contact” form on the “About” page with no CAPTCHA or silly questions asked. I am ready to dive into messages if necessary. Give me your best shot.

SPAM is annoying, I got it.

Wow, I still do not know how I could have been so patient over the past 6 weeks, during which I purposely stopped writing on this blog, both for personal reasons and because I was trying to understand what this SPAM flood was all about, affecting the miserable four blog entries I have written so far. I stopped counting the types of SPAM, as well as how many different languages those comments were written in, mainly some kind of Cyrillic-script language, I assume mostly Russian or from other Eastern European countries. Topics ranged from medicine sales and human “organ” enlargement stuff to all sorts of services for sale, plus a few advertising their very own blogs; I assume all of these were trying to drive traffic to other websites.

I FINALLY DECIDED TO STOP ACCEPTING COMMENTS ON MY BLOG POSTS !!!! Well, at least I had a couple of legitimate commenters; thank you both, and sorry I am not showing your comments anymore. I am in the process of figuring out a better way to handle this SPAM, but that might require installing the newest version of WordPress, which will take some time because it is very low on my to-do list of pending tasks.

For the few of you who may be asking yourselves what in the world this guy is writing about: SPAM (electronic, NOT food) is a form of abuse of an electronic service in which unsolicited or unexpected (read: not related to the topic in question) messages, or blog comments in my case, are received in bulk or indiscriminately. Even though the WordPress application does a great job of helping you manage comments on posts and pages, the amount I have been receiving daily over the past few weeks is ridiculous.

Let’s see if I can provide some help in this post other than ranting about the issue. If you are suffering from the same problem, try reading the official WordPress.org article on this topic, “Combating Comment SPAM”. There is also a list of “Anti-SPAM plug-ins” for WordPress, but I cannot recommend any one of them just yet. CAPTCHA seems to be a very popular way to keep out non-human SPAMers, and it is claimed to stop a big chunk of this SPAM traffic, but you would still need to deal with the human type, which might be somewhat manageable. Most WordPress “Anti-SPAM” plug-ins use some sort of heuristics to determine whether a given comment is likely to be SPAM. Others use an external database in which the domain names and IP addresses of well-known SPAMers are recorded and tracked.
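To make the two ideas above a bit more concrete, here is a minimal Python sketch that combines a keyword heuristic with a blocklist lookup. It is only an illustration and not how any particular WordPress plug-in works: the domains, the IP address, the keywords, the weights and the threshold are all made-up assumptions, and a real plug-in would query an external database instead of the small hard-coded sets used here.

```python
import re

# Hypothetical blocklist entries (assumptions for illustration only);
# a real plug-in would look these up in an external database.
BLOCKED_DOMAINS = {"cheap-pills.example", "traffic-boost.example"}
BLOCKED_IPS = {"203.0.113.7"}

# Hypothetical keywords and weights, chosen only for this example.
SUSPICIOUS_WORDS = {"enlargement": 2, "pills": 2, "cheap": 1, "casino": 2}

# Captures the host part of any http(s) link found in the comment.
URL_PATTERN = re.compile(r"https?://([^/\s]+)", re.IGNORECASE)


def spam_score(comment: str, sender_ip: str) -> int:
    """Return a rough score; a higher value means more likely SPAM."""
    score = 0
    text = comment.lower()

    # Blocklist check: sender address is a known bad one.
    if sender_ip in BLOCKED_IPS:
        score += 5

    # Blocklist check: links pointing at known bad domains.
    domains = [d.lower() for d in URL_PATTERN.findall(comment)]
    score += 5 * sum(1 for d in domains if d in BLOCKED_DOMAINS)

    # Heuristic: many links in a single comment look suspicious.
    if len(domains) > 2:
        score += 2

    # Heuristic: weighted suspicious keywords.
    for word, weight in SUSPICIOUS_WORDS.items():
        if word in text:
            score += weight

    return score


def is_spam(comment: str, sender_ip: str, threshold: int = 4) -> bool:
    """Flag the comment when its score reaches the (assumed) threshold."""
    return spam_score(comment, sender_ip) >= threshold
```

A comment flagged this way would probably be better held for moderation than deleted outright, since simple heuristics like these will sometimes catch legitimate comments too.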

Here is a list of recent articles from the ACM Digital Library, in case you would like a somewhat deeper review of this topic.

TrackBack spam: abuse and prevention

Elie Bursztein, Peifung E. Lam, John C. Mitchell
November 2009
CCSW ’09: Proceedings of the 2009 ACM workshop on Cloud computing security
Contemporary blogs receive comments and TrackBacks, which result in cross-references between blogs. We conducted a longitudinal study of TrackBack spam, collecting and analyzing almost 10 million samples from a massive spam campaign over a one-year period. …

Keywords: blog, linkback, linksback, pingback, refback, secure trackback, spam, talkback, trackback

Spam filtering for short messages

Gordon V. Cormack, José María Gómez Hidalgo, Enrique Puertas Sánz
November 2007
CIKM ’07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages …

Keywords: blog, classification, email, filtering, sms, spam

Detecting splogs via temporal dynamics using self-similarity analysis

Yu-Ru Lin, Hari Sundaram, Yun Chi, Junichi Tatemura, Belle L. Tseng
February 2008
Transactions on the Web (TWEB), Volume 2 Issue 1
This article addresses the problem of spam blog (splog) detection using temporal and structural regularity of content, post time and links. Splogs are undesirable blogs meant to attract search engine traffic, used solely for promoting affiliate sites. …

Keywords: Blogs, regularity, self-similarity, spam, splog detection, temporal dynamics, topology

Identifying commented passages of documents using implicit hyperlinks

Jean-Yves Delort
August 2006
HYPERTEXT ’06: Proceedings of the seventeenth conference on Hypertext and hypermedia
This paper addresses the issue of automatically selecting passages of blog posts using readers’ comments. The problem is difficult because: (i) the textual content of blogs is often noisy, (ii) comments do not always target passages of the posts and, …

Keywords: implicit links, passage extraction, weblogs

Relaxed online SVMs for spam filtering

D. Sculley, Gabriel M. Wachman
July 2007
SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial …

Keywords: blogs, spam filtering, splogs, support vector machines

What about your Blog’s motion …

After some time of “blogging” constipation I finally had a chance to sit down and think about something rather basic as well as important. When people in general do things, most of the time those activities are meant to achieve something. I guess this is the most primitive idea behind Newton’s laws of motion, which I am not going to discuss in this post; I will just assume they could be applied to this matter.

Three laws of “blogging” mapped to Newton’s motion laws.

1 – In the absence of “Blogging”, the “Purpose” either is at rest or moves in a straight line with constant speed, so basically if nothing is written there is no expected change towards the purpose.

2 – A “Blog” site experiencing a force F experiences an acceleration a related to F by F = ma, where m is the mass of the “Blog” site. Alternatively, force is equal to the time derivative of momentum.

OK, this one is more complex to work out, but here is my shot: if your site is fed with new “Blog” entries you are actually increasing its mass, and every time the content of your “Blogging” becomes relevant to someone else and gets read and perhaps commented on, your “Blog” site gets a “hit”, a force is induced, and the site may experience some acceleration towards your “Purpose”. (I do not have scientific evidence of this, but it could be fun to find out.)
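For the curious, the two statements of the second law quoted above fit together as follows. This is standard physics rather than anything specific to blogs, and the last two steps assume the mass m stays constant, which in the analogy above would mean that no new entries are being added at that moment:

```latex
F = \frac{dp}{dt} = \frac{d(mv)}{dt} = m\,\frac{dv}{dt} = ma
```

Here p = mv is the momentum and v is the velocity; neither symbol appears in the text above, so take them as the usual textbook notation.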

3 – Whenever a first “Blog” exerts a force F on a second “Blog”, the second “Blog” exerts a force −F on the first one. F and −F are equal in magnitude and opposite in direction. Or, in plain English: “To every action there is always an equal and opposite reaction; or, the forces of two bodies on each other are always equal and are directed in opposite directions”.

Well, this one is tricky, but I hope I got it somewhat nailed. Adding links to your “Blog” entries is a good practice; after all, most of us did not invent, discover or first write about the things we post, so giving credit to the original authors, just as in scientific articles, usually pays off. So how does this affect your way towards a “Purpose”? Every time a “Blog” entry points (links) to another “Blog” there is always an equal and opposite reaction. In this case your writing is looking for validation or support from the linked “Blog” for what has been written or expressed; using legitimate and proper URL recommendations via links will reward you with credibility, which your readers appreciate most of the time and which could translate into more solid or authoritative hits (reads). Then remember: the more hits you get, the more acceleration, and therefore the closer you get to your purpose.

Well, I sadly found out that linking to other “Blogs” or websites is not always positive, but this is a topic for another “Blog” entry in itself. Be aware that comments on your “Blogging” usually accept URLs pointing somewhere else, and most of the time SPAM comments are not a good acceleration mechanism and do not produce the opposite force you are looking for.

NOW SOME SERIOUS REFERENCES ABOUT THIS TOPIC

[1] “The Laws of the Web”, Bernardo Huberman, MIT Press, 2001.

[2] “The Web’s Hidden Order”, Lada A. Adamic, Bernardo Huberman, Communications of the ACM, 2001.

[3] “Expressing Social Relationships on the Blog through Links and Comments”, Lada Adamic, Adamic publications.

[4] “2nd Annual Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics”, Lada Adamic, Adamic publications, 2005.

[5] “Exploring Similarity among Web Pages Using the Hyperlink Structure”, International Conference on Information Technology: Coding and Computing (ITCC’04), Volume 1.

What about my blog categories?

I should not be too worried about this topic, since this is only my third blog entry so far and I am pretty sure there is not too much to categorize with this minimal “blog” production ;( but “BLOG CATEGORIZATION” is certainly a good topic to talk about. I am not referring to the fact that most blogging platforms support “Tagging and Categorizing” your blog entries so that they can be organized and presented in a particular manner for future reference, but to the fact that those categories should have a meaning, or will perhaps drive behavior with respect to the particular purpose of the blog. Oooohh, I guess I just touched a weak point of my blog site: do I really have a defined purpose for my blog? I guess I should, but that is another topic for a later time of inspiration.

OK, so far, without any significant research effort on this matter, I have found (empirically) three major categorization approaches. I know there may exist many more, but I will restrict today’s writing to those I believe I would look at first.

Write first categorize later.

This is perhaps the most common approach to categorizing, since it is the most convenient, and your blog platform will certainly help you with the mechanics of categorizing posts. Do not worry about what you are blogging; just produce, produce, produce … writing material until you have compiled a significant number of titles and keywords, so you can figure out what your mind is up to all the time. I believe that by then it will be fairly easy to run some sort of text mining process to find the common denominators or patterns. Be aware that I am not suggesting an elaborate text mining algorithm or anything similar; perhaps manual observation and a simple counting analysis in a spreadsheet will give you the clue. If you are interested, you can always read some more serious material on how text mining and machine learning techniques can help. I found an article published by ACM on this topic particularly interesting; it takes “unknown” words into account.

Blog categorization exploiting domain dictionary and dynamically estimated domains of unknown words

Chikara Hashimoto (Yamagata University, Yonezawa-shi, Yamagata, Japan), Sadao Kurohashi (Kyoto University, Sakyo-ku, Kyoto, Japan)
2008
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, Columbus, Ohio, pages 69-72. Association for Computational Linguistics, Morristown, NJ, USA.
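As a rough illustration of the simple counting analysis mentioned above (and not of the method in the cited paper), here is a minimal Python sketch that counts how often each word shows up across a list of post titles. The stop-word list and the minimum word length are assumptions made for this example, and the titles are just the ones used on this blog.

```python
import re
from collections import Counter

# Hypothetical stop words (an assumption); extend the set as needed.
STOP_WORDS = {"the", "and", "about", "what", "now", "have", "your"}


def keyword_counts(titles):
    """Count how often each non-trivial word appears across post titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and keep only runs of letters.
        words = re.findall(r"[a-z]+", title.lower())
        # Skip very short words and the stop words listed above.
        counts.update(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return counts


titles = [
    "I have a blog, now what?",
    "What about my blog categories?",
    "What about your Blog's motion",
]

# Show the single most frequent keyword across the titles.
print(keyword_counts(titles).most_common(1))  # prints [('blog', 3)]
```

With only a handful of posts the result is not very revealing, but with dozens of titles the most frequent words would be natural candidates for categories.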

Categorize first write later.

This is an approach that may be used by those who blog professionally: since they get paid for their blogging activities, it is very likely that an editorial line is either imposed or strongly suggested as a condition of getting paid. Since I am not a professional (syndicated) blogger and I am not getting any money out of writing on my personal blog, there is no editorial line imposed by anyone in particular, but that does not mean I cannot impose one on myself. Having an editorial line is good because it keeps you focused on the topics you want to produce, maximizing productivity while reducing, or perhaps avoiding, time spent writing about things that are not in your personal script. Yes, this will in fact limit your creativity somewhat, and it will force you to analyze the purpose of your blogging site right up front rather than later, which may be a bit cumbersome for many of us to start with; but later on it will help you plan your blog writing and maintain a balance among your different areas of interest as you produce your articles. Be aware that you always have the “Uncategorized” category, where you can throw anything you have in mind regardless of any established category, and you can always add more categories as you go. What I think is not a good practice is moving your articles from one category to another once they have been categorized, since this may confuse your readers; but you can always use “Tagging” to help readers find your articles using different keywords regardless of where they were categorized.

Do not categorize unless someone complains about it.

This is the easiest one to adopt but also the most difficult to keep up, since you will be tempted to apply some categorization to your blog site once it starts to grow exponentially, which will take you back to point one in the list. If you decide not to categorize and to go wild, writing about almost anything and becoming the Robin Hood of blog readers by taking your precious time and providing goodness to the poor among us in need of sapience and expertise, then you do not want to categorize all your articles only to find out that you have more categories than the New York Times. But you will certainly benefit from “Tagging” your work, so that the piles of articles you generate can be easily found in the time to come, and not only the few displayed on your main blog page get hit by your readers. Tagging text is not an easy task either, but it is something you can do at random and change at any time without serious implications for your blog’s searchability. You may be thinking what a huge amount of work it would be to tag all your articles all the time when you could be writing instead; in that case there is some expert advice on the topic, such as “Automated Text Tagging” by Ben Scofield, or, if you are more attracted to a probabilistic way of doing this, you can always read what some researchers suggest for tackling this problem.

A Hybrid Probabilistic/Connectionist Approach to Text Tagging

Julian E. Boggess, Lois C. Boggess
1993
Technical Report #930115, Mississippi State University, Mississippi State, MS, USA

Last but not least …

As always, any hints you may provide on how to approach this “categorizing” dilemma would be very helpful, so please feel free to drop a comment or two. I promise to publish all comments except the ones looking like SPAM or coming from a doubtful source.

I have a blog, now what?

I am not sure if this particular “blog” entry will even be read by anyone outside my family of four, including my two little ones, who are not even teenagers yet, but here I am, just one day after successfully replacing my old, boring and outdated “fidelvanegas.net” website with a brand new “WordPress”-based “blogging” application. NOW WHAT? is the question.

Should I write a new “Blog” entry every day? Or every week? Or every month? … Well, I had not updated or enhanced my former website for months before deciding to switch and become a “blogger” instead. I guess I will find out the best publishing frequency for me along the way, but for now I can proudly say I am an every-other-day “blogger”, well, at least for the first two attempts.

Should I include a picture or some sort of chart or graphic in every “Blog” entry? Mmmmhhh, I am not precisely a graphic designer kind of person, but I would certainly like to add some illustrations to enhance the ideas I will be presenting. For this “Blog” entry in particular I am not in the mood to add any picture! Hey, who said “blogging” is not a way to exercise our freedom? So today, NO PICTURES!!!!

What about adding links? Basically, a way to state that I recommend someone else’s “Blog” or piece of work found in the Internet cloud. YES, I will recommend the WordPress.ORG folks for this wonderful “blogging” platform, which I have found very easy to install, configure and use. I could easily create my own “Blog” application myself, but why bother when there is a bunch of great PHP developers improving the “WordPress” platform for me! Isn’t the Open Source movement a beautiful thing? WHAT???? DID I JUST RECOMMEND ANOTHER WEBSITE FOR FREE? Mmmmhhh, isn’t that what the Free Software Foundation is all about? If I get some software for free, wouldn’t it be great to give a recommendation back for free?

OK, too many words for my second “Blog” entry attempt. Wait a minute: is “Blog” a word? I am not a native English speaker, so let’s go ahead and rely on our trusty “Galactic Encyclopedia”, as Mr. Isaac Asimov might refer to WikiPedia.ORG if he were still alive today (this is just my opinion about Mr. Asimov’s probable way of thinking, and sadly I cannot verify it personally). “A blog (a contraction of the term “weblog”)” … Yes, it is actually a word, and a very popular one, at least in this “Blog” entry, where it appears roughly 15 times.

Hello Blog!!!

This is my first post and the starting point away from my old-fashioned front-page based website. Older website content will be preserved under the following path: http://fidelvanegas.net/original