How I Deal With Scrapers

Scrapers are a part of the internet, so rather than get angry, I get even. A scraper site is a web site that copies content from other sites and republishes it as its own. I actually don’t have a problem with that as long as the scraper site leaves the links intact, so it links back to my site.

The other day I was checking a blog of mine on http://copyscape.com. Copyscape can check if your blog is being scraped, amongst other things. It has a limited free service; otherwise the full service is 5c a search.

My blog was being scraped, about 50 times in fact. But one site was removing links, and they actually outranked me for one of my original articles. That is, when I searched for the title of my post, they were listed number one in google, and my site was hidden from the search results behind the “repeat the search with the omitted results included” link.

That’s just not cricket. I went to the article that was outranking mine, and they were running adsense ads on it. I clicked the “ads by google” link and made a complaint about copyright infringing content on their site. I got an email from google telling me how to send them a DMCA takedown request, which I didn’t bother doing. It requires faxing, too hard.

I then went back to the search result page from google and clicked the “Dissatisfied? Help us improve” link and complained that google were ranking a scraper site ahead of mine.

Then I went to delicious and made a few bookmarks back to my article. A few days later, my article is number one in google when searching for the title, and the scraper site is buried deep down in the results. Job done, feeling satisfied.

I would hope google will eventually check the scraper site out and close their adsense account, but if the scraper site is making money for google, I don’t see the incentive.

In My Spare Time…

…When I’m not working on linux, I’ve been working on some websites. Mainly blogs, but also some ebay affiliate sites that are usually attached to blogs. I’m trying to get 30 up to start with, but I’m not very committed: I’ve got five going, and the oldest one is five months old. Once I hit 30, I’ll aim for 100 ;)

I got interested in SEO for some reason, and have been honing my skills with these sites with the intent to make a little money. If I can make $1/day per site, that’s about $10,000 a year across 30 sites (30 × $365 = $10,950). Very achievable.

The basic idea is to pick a phrase, maybe a three word phrase, that has 100,000 hits or less in google and about 100 searches a day (there are various ways to determine search volume, such as google keyword tool, google trends, and a few free SEO type sites). That will give you a good chance of getting on the front page of google for that phrase in the first week and getting some traffic. If you don’t hit the front page of google straight away, you might have to wait six months or so. You need to be patient in this game.
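Here’s a rough sketch of that rule of thumb as a filter. The phrases and their numbers are made up for illustration, and I’m reading “100 searches a day” as a minimum, which is my assumption. You’d fill in real figures by hand from whatever keyword tool you use.

```javascript
// Hypothetical candidate phrases; the numbers are invented for illustration.
const candidates = [
  { phrase: "solar funnel cooker", hits: 42000, searchesPerDay: 150 },
  { phrase: "cheap laptops", hits: 9500000, searchesPerDay: 12000 },
  { phrase: "eee pc wifi toggle", hits: 80000, searchesPerDay: 40 },
];

// Thresholds from the rule of thumb: few competing results, enough searches.
const MAX_HITS = 100000;
const MIN_SEARCHES_PER_DAY = 100; // assumption: treated as a lower bound

// A phrase is worth targeting if competition is low and demand is high enough.
function isGoodTarget(candidate) {
  return candidate.hits <= MAX_HITS &&
         candidate.searchesPerDay >= MIN_SEARCHES_PER_DAY;
}

const shortlist = candidates.filter(isGoodTarget).map((c) => c.phrase);
console.log(shortlist);
```

Only the first phrase survives here: the second has far too many competing results, and the third doesn’t get enough searches.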

So far, with one blog that was very targeted I got the seventh result in google almost instantly. With another site that wasn’t so targeted, I’m nowhere to be seen. So with that last site, I’ll build up ten articles, and keep an eye on it for six months. With the first site, I’ll try to post once a week and build a few links by submitting articles to howto sites and directories. In five months, it’s already paid for itself with only a few hours’ work put into it. It has seven articles, traffic is increasing every month, and I’m getting natural backlinks and subscriptions to the feed, as well as a few artificial backlinks I’ve made. I’ve started targeting keywords that have more search volume.

To target a phrase, put it in the title, h1 tag, and anchor text in each post pointing back to the main page. Each page should have a unique title, though. Also put it in the anchor text of external links.
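A quick way to check a post against that list is a small script like the one below. It’s only a sketch: `checkTargeting` is a name I made up, the sample page is invented, and it uses crude regular expressions rather than a real HTML parser, so treat the result as a hint and not a verdict.

```javascript
// Report whether a phrase appears in the title, the first h1, and any anchor text.
// Hypothetical helper: regex matching is a rough stand-in for proper HTML parsing.
function checkTargeting(html, phrase) {
  const needle = phrase.toLowerCase();

  // Return the lowercased contents of the first match, or "" if there is none.
  const firstMatch = (re) => {
    const m = html.match(re);
    return m ? m[1].toLowerCase() : "";
  };

  const title = firstMatch(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const h1 = firstMatch(/<h1[^>]*>([\s\S]*?)<\/h1>/i);

  // Collect the text of every <a> element (the \b stops <article> etc. matching).
  const anchors = [...html.matchAll(/<a\b[^>]*>([\s\S]*?)<\/a>/gi)]
    .map((m) => m[1].toLowerCase());

  return {
    inTitle: title.includes(needle),
    inH1: h1.includes(needle),
    inAnchorText: anchors.some((text) => text.includes(needle)),
  };
}

// Invented sample page for illustration.
const page = `<html><head><title>Solar Funnel Cooker Plans</title></head>
<body><h1>Build a solar funnel cooker</h1>
<p>Back to the <a href="/">solar funnel cooker</a> home page.</p></body></html>`;

const report = checkTargeting(page, "solar funnel cooker");
console.log(report);
```

It doesn’t check that each page has a unique title; that would need the whole site, not one page.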

What I really enjoy is on-page SEO, ad placement and look. What I don’t enjoy is creating artificial links with the intent of getting an artificially high google rank. Even though google says SEO isn’t spam, I can’t help but think it’s spammy.

But I’m enjoying creating content, and studying SEO and marketing. I made my first ebay affiliate sale after 50 hits, very exciting. I might have a crack at amazon affiliate links next, and at incorporating ebay into wordpress rather than a separate page.

BTW, targeting internet savvy people isn’t the easiest way to make money, they are blind to ads. Best to go after non techy people. I’m running ads on this site, but it’s dragging my CTR down. I’ll give it a few more months and then turn them off.

Just got to get these 30 sites up…

Trigonometry Blues

Trigonometry Solar Funnel

I’ve been stuck on this problem for a week. It’s for a solar funnel I’m working on (no, it’s not homework). I’m trying to work out an equation that solves theta given x and y.

The idea is to have the funnel as wide as possible (smallest theta) so sun hitting the very top of the funnel just enters the oven. Any help?

I think if x = y, theta should be 90 – 45 / 2

Update

The equation is:
[image: equation]


http://www.youtube.com/watch?v=J06nOZES084

Here’s daddy stovetop playing the blues. Nice and slow, where the silence in between the notes is just as important as the notes themselves. Spain were masters of that. Spain are one of my favourite bands.


http://youtube.com/watch?v=rn8MWBVAMhE

It’s not party music, but if you have some sadness deep down that doesn’t want to come out, Spain will drag all the emotions out, you have a good cry, maybe, and feel much better after. Or if you just need to wind down…

The Power Of The Beard

I started out listening to this tune (called Drifting), well, watching, too. Amazing technique, it struck me how relaxed he is in his shoulders while his hands are banging around. And the Beard!

http://youtube.com/watch?v=Ddn4MGaS3N4
I thought it was my favourite. Then I found this one, same beard.

http://youtube.com/watch?v=dt1fB62cGbo
At first the melody was vaguely familiar, but I couldn’t put words to it. It was rather annoying. I found the original, which I like, but interesting covers are always welcome to me, and this cover rocks. It’s my favourite now. The artist is Andy McKee, he’s a true hacker.

(It’s amazing what you can achieve with a beard.)

Eee PC Ubuntu WiFi On/Off

One of the minor issues I had since installing ubuntu on my eee pc was I couldn’t work out how to turn the wifi off. There’s a Fn+F2 key combo that works in xandros, but it didn’t work for me in ubuntu. I tried googling for the answer and got nowhere.

So, for google, to turn wifi off, I had to run:

echo 0 > /proc/acpi/asus/wlan

Turning it on is obviously:

echo 1 > /proc/acpi/asus/wlan

Another problem fixed (sort of), I wouldn’t mind fixing Fn+F2, but I’m happy enough with the echo for now.

Eee PC Greasemonkey Script

Inspired by Paul Fenwick’s entertaining talk on cleaning up the web at LCA (in other words, I ripped it off) I set out to make google reader more useable on the Eee.

But first, an invaluable Firefox plugin to have for the Eee is Nuke Anything Enhanced. It lets you right click on an object on a web page and “nuke” it, that is, make it disappear. The screen then rearranges itself using the space you just created. It gives you more reading room, and less clutter. It’s not gone for good, though; if you reload the page it will be back. It’s fine for one-off things, but if it’s a page you’ll be opening frequently, like google reader in my case, you’ll want a permanent solution. Enter greasemonkey.

// ==UserScript==
// @name google reader for eee pc
// @namespace http://chesterton.id.au/
// @description Removes unnecessary elements
// @include http://*.google.com/reader/*
// ==/UserScript==

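// Hide the parts of the page that just take up room on the small Eee screen.
// (Function declarations are hoisted, so calling hideContent here is fine.)
hideContent('selectors-box');
hideContent('gbar');
hideContent('logo');
hideContent('global-info');

// Hide an element by id rather than removing it: removing it broke the page.
function hideContent(id) {
   var node = document.getElementById(id);
   if (node) {
     node.style.display = "none";
   }
}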

So I ripped hideContent off from Paul. I found that with google reader, removing the object broke the page, but hiding it worked, so I modified his function a little. I think his was called removeContent, or similar, and it deleted the element rather than setting the display style to none.

Save the code as (e.g.) googlereader.user.js and open it in firefox. The .user.js part of the file name is important; the first part isn’t. Obviously you need to install greasemonkey, too.

Google Reader Greasemonkey Script

My Telephone Setup

I thought I’d document my current phone setup. I’ll probably look back in 10 years’ time and think how primitive it was. The brains of it are handled by freeswitch, I luurv freeswitch ;), running on a little VM with 128M of memory and a few gigs of disk.

My internet provider is iinet, I get 10Mbps down and 1Mbps up. My VOIP provider is pennytel, chosen for their price. $5/month, includes $4 of free calls, it’s 8c a call untimed for Australia and about 20 other countries. Mobiles are 10.5c/minute billed by the second. They have a fax to email service for another $5/month, but no email to fax. So I can’t throw away the fax machine just yet. There’s a few other handy services they offer that I use, they come with the plan.

I was willing to pay extra and go with iinet’s VOIP, because it’s closer network wise and unshaped if I exceed my monthly download limit. But iinet only lets you use their VOIP from the iinet network. I’m not sure why that would be, but I ruled them out based on that. Pennytel has the disadvantage of being further away network wise, and affected by shaping, so my phone doesn’t work from home if I get shaped. But it works wherever I am on the planet, assuming I can get a fairly consistent round trip time. (I could be in America, and take and make calls from my Australian VOIP number.)

My VOIP phone is a nokia E65 with WiFi and SIP. Dial a number, and it goes via SIP if it’s available, otherwise it uses the mobile network. Wifi drains the battery, but I’ve gotten used to charging it every other day, or whatever it is.

The only thing I haven’t tested much is QoS, I’ve recently adjusted it on my ADSL router, but there’s no real way of seeing if it works, other than experimenting. I’m considering putting the router in bridge mode, and using tc on linux. That’s going to require another network card, though.

I still have a landline, my fax machine is connected to it.

Phil Colbourn Writes About The Youtube Blackout

On 02/03/2008, at 10:30 AM, phil colbourn wrote:

Last week Pakistan Telecom was ordered by their government to block access to youtube. They did this by re-directing routes containing the youtube addresses to an internal dead-end. A mistake was made that advertised this to their peer and thus the remainder of the internet.

The IP range that was affected was 208.65.152.0/22. Pakistan added more specific routes for 208.65.153.0/24 which are a longer match and so take priority.

To see what happened go to this site and start the java applet.

http://bgplay.routeviews.org/bgplay/

To see how it should look enter this address: 208.65.152.0/22, yesterday’s date and today’s date – don’t worry about the time fields.

The AS (autonomous system – roughly meaning a country or large ISP or large company) 36561 is where youtube packets should generally be sent. You can see all the lines from other domains leading to this AS for this address range.

Press the play button (a small triangle) to start the animation. You should see the links occasionally changing as changes are made or transmission links break or are fixed.

Now press the New Query button.

Then enter the address that poisoned BGP: 208.65.153.0/24, 23/2/2008 and 26/2/2008 – a period around the event.

Initially the page has no lines. This is because prior to the event, this route was not used. Just imagine all the links still go to AS 36561.

Press the Play button.

Over time you see the rogue Pakistan Telecom domain 17557 start to become the priority route for all youtube traffic until it seems to have all the routes. When the fault was fixed you can see the links moving back to AS 36561 where they should be.

Imagine how easy it is now to interrupt any domain or the whole internet? I think this risk will be fixed shortly.


Phil philcolbourn ;at; gmail.com

” Someone else has solved it and posted it on the internet for free ”
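The “longer match takes priority” rule from Phil’s email can be sketched in a few lines. This is only an illustration of longest-prefix matching, not how a real router or BGP implementation works. The two prefixes and AS numbers come from the email; the function names and the two test addresses are mine.

```javascript
// Convert a dotted-quad IPv4 address to a plain number (no bit tricks, so no sign issues).
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Number of leading bits that must match, e.g. 22 for "208.65.152.0/22".
function prefixLength(cidr) {
  return Number(cidr.split("/")[1]);
}

// True if the address falls inside the CIDR block.
function matches(ip, cidr) {
  const network = cidr.split("/")[0];
  const blockSize = 2 ** (32 - prefixLength(cidr));
  return Math.floor(ipToInt(ip) / blockSize) ===
         Math.floor(ipToInt(network) / blockSize);
}

// Of all routes covering the address, pick the one with the longest prefix.
function bestRoute(ip, routes) {
  const covering = routes.filter((r) => matches(ip, r.prefix));
  covering.sort((a, b) => prefixLength(b.prefix) - prefixLength(a.prefix));
  return covering.length > 0 ? covering[0] : null;
}

// The two announcements described in the email.
const routes = [
  { prefix: "208.65.152.0/22", origin: "AS 36561 (youtube)" },
  { prefix: "208.65.153.0/24", origin: "AS 17557 (Pakistan Telecom)" },
];

// Hypothetical addresses inside the affected range.
console.log(bestRoute("208.65.153.238", routes).origin); // AS 17557 (Pakistan Telecom)
console.log(bestRoute("208.65.152.10", routes).origin);  // AS 36561 (youtube)
```

The first address is covered by both announcements, so the more specific /24 wins. The second is only covered by the /22, so it still goes where it should.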

I Have A New Office

Just got a new office. An Eee PC (JB HiFi had them for $450), and a 3G wireless internet connection (5Gigs/month for $39/month). Not really sure how much time I’m going to get out of it with the usb wireless modem connected, I’ve been told the modem sucks the juice. Got a bit of experimenting to do with the battery, and to see if VoIP works.

Still running the default desktop, it looks fine to me, but I’ll be testing out other desktops like eeexubuntu. I was really surprised how quickly open office launched, the speed is impressive.

I’m looking forward to mornings with a coffee down the river.

Remotely Upgrading RHEL to Ubuntu

We were given a server to play with, to do whatever we wanted, hosted in a US data centre. It was running RHEL 4.x. I could have worked with that, but it didn’t look like it was on a support contract, and I wasn’t able to update it anyway.

I thought about trying to convert it to centos, but realised it would be more fun to upgrade it to ubuntu gutsy.

Quick steps from memory

  1. swapoff -a
  2. mkfs.ext3 /dev/VolGroup00/LogVol01 (old swap)
  3. mount /dev/VolGroup00/LogVol01 /mnt
  4. wget debootstrap.deb (from gutsy)
  5. ar x debootstrap.deb
  6. tar -C / -xzf data.tar.gz
  7. debootstrap --arch=i386 gutsy /mnt
  8. chroot /mnt
  9. mount proc, edit /etc/fstab and /etc/network/interfaces
  10. apt-get install ubuntu-minimal ubuntu-standard linux-image postfix openssh-server plus a few other packages, some that were recommends.
  11. copied gutsy /boot/* to the real /boot and created a new entry in menu.lst
  12. created an account and uploaded some ssh keys
  13. probably some steps I’ve forgotten
  14. reboot

Easy peasy, just waiting for it to come up, it’s been 5 hours now, still waiting. Must be the slowest booting server ever.

Guess I’ll be calling the states now. :’(