Memecodes Webpages Experiment

Discussion in 'Health and medical' started by Philipp Lenssen, Feb 24, 2004.

  1. At http://memecodes.outer-court.com/ :

    "Here's an idea for a set of evolutionary pages that have "natural" offspring and grow into being
    more popular. I plan to put up a set of pages with random words; Project Memecodes. Say, 5,000 pages
    with random word sequences (let's pray to the Googlegod I won't get the death penalty for that).

    Now whenever a page gets a visitor who arrived from Google, the page will create a new modified
    randomized version of itself via its database back-end, and create a link to it in a visible place.
    The new page will continue to do the same as the old page. After a while, a page "dies" and is taken
    offline. Soon several pages would be able to specialize in search niches in the Web environment –
    word combinations people are looking for that are not yet covered online and therefore make my
    evolutionary pages turn up in the top results which people actually click on. A search phrase
    entered by a search engine visitor is just like food in nature's ecosystem – there will be
    specialized pages to catch this food. A page's "meme code" will lead it to become a successful
    species with a lot of offspring, or die and be forgotten.

    Like infinite monkeys writing Shakespeare, if the experiment runs long enough with enough
    modifications, the once randomly worded pages might even modify themselves to the point of becoming
    natural language."

    Hope that made sense -- feedback welcome :)
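
    In concrete terms, the back-end logic might look something like the sketch below (Python; the word
    list, function names, and mutation rate are placeholders I made up for illustration, not the real
    system):

      import random
      import uuid

      # Hypothetical stand-in for the database back-end; storage, word pool and
      # mutation rate are all assumptions, not the real Memecodes implementation.
      WORD_POOL = ["apple", "river", "quantum", "lantern", "echo", "harvest", "orbit"]

      def make_page(words):
          # A "memecode" page: an id, its word sequence, and links to its offspring.
          return {"id": uuid.uuid4().hex[:8], "words": words, "children": []}

      def mutate(words, rate=0.1):
          # Copy the parent's words, swapping roughly `rate` of them at random.
          return [random.choice(WORD_POOL) if random.random() < rate else w for w in words]

      def on_visit(page, referrer):
          # Only a visitor arriving from a Google search spawns a child page.
          if "google" in referrer:
              child = make_page(mutate(page["words"]))
              page["children"].append(child["id"])  # the visible link to the offspring
              return child
          return None

      # One seed page of random words receives a Google-referred visitor.
      seed = make_page([random.choice(WORD_POOL) for _ in range(20)])
      child = on_visit(seed, "http://www.google.com/search?q=quantum+lantern")

    Pages that never catch a search visitor would simply be taken offline after a while.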
     


  2. Tim Tyler


    Philipp Lenssen <[email protected]> wrote or quoted:

    > At http://memecodes.outer-court.com/ :

    The titles of pages are given prominent weight by search engines - so you may attract more visitors
    if you make these more interesting.

    Search engine spiders are irregular beasts. You may find a generation in your system takes a long
    time - and thus that development is /very/ slow.

    If your experiment is successful, it may be the cause of much frustration among "users" ;-)
    --
    __________
    |im |yler http://timtyler.org/ [email protected] Remove lock to reply.
     
  3. Huck Turner


    [email protected] (Philipp Lenssen) wrote in message news:<[email protected]>...
    > At http://memecodes.outer-court.com/ :
    >
    > "Here's an idea for a set of evolutionary pages that have "natural" offspring and grow into
    > being more popular. I plan to put up a set of pages with random words; Project Memecodes. Say,
    > 5,000 pages with random word sequences (let's pray to the Googlegod I won't get the death
    > penalty for that).
    >
    > Now whenever a page gets a visitor who arrived from Google, the page will create a new modified
    > randomized version of itself via its database back-end, and create a link to it in a visible
    > place. The new page will continue do the same as the old page. After a while, a page "dies" and is
    > taken offline. Soon several pages would be able to specialize on search niches in the Web
    > environment ? word combinations people are looking for that are not yet covered online and
    > therefore make my evolutionary pages turn up in the top results which people actually click on. A
    > search phrase entered by a search engine visitor is just like food in nature's ecosystem ? there
    > will be specialized pages to catch this food. A page's "meme code" will lead it to become a
    > successful species with a lot of offspring, or die and be forgotten.
    >
    > Like infinite monkeys writing Shakespeare, if the experiment will be running long enough with
    > enough modifications, the once randomly worded pages might even modify themselves to the point of
    > becoming natural language."
    >
    > Hope that made sense -- feedbackw welcome :)

    It's a very interesting idea, but the speed of the evolution might be improved if you used search
    query data directly without waiting for people to actually find your pages (considering that search
    engines will only reindex your pages every few weeks). You might also annoy people less if you went
    about it this way. There are a number of places you can go to see data about what people type into
    search engines.

    One example: http://www.metaspy.com/info.metac.spy/metaspy/

    For a list of other sites with search query data see:
    http://www.searchenginewatch.com/facts/article.php/2156041

    Other people have tried to generate natural language texts based on word co-occurrence statistics
    using text from books, newspapers, etc. The texts that resulted contained strings of words that
    sounded grammatical within a window of two or three words, but which didn't form proper sentences.
    Co-occurrence statistics aren't enough.
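
    To illustrate, a toy bigram generator might look like the sketch below (Python; the corpus and
    numbers are placeholders). Within any two-word window the output looks plausible, but it never adds
    up to real sentences:

      import random
      from collections import defaultdict

      # Toy bigram (word co-occurrence) generator; the corpus is just a placeholder.
      corpus = ("the cat sat on the mat the dog sat on the rug "
                "the cat chased the dog onto the mat").split()

      following = defaultdict(list)          # word -> list of words seen after it
      for prev, nxt in zip(corpus, corpus[1:]):
          following[prev].append(nxt)

      def generate(start, length=12):
          # Random walk over the bigram table, stopping at a dead end.
          out = [start]
          while len(out) < length and following[out[-1]]:
              out.append(random.choice(following[out[-1]]))
          return " ".join(out)

      print(generate("the"))  # locally grammatical, globally incoherent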

    In your case though, things are even worse. Given that most search queries are lists of key words
    (mostly nouns) rather than grammatical sequences of words, and that when people do enter a
    grammatical sequence, it is usually just a noun phrase, I doubt that you will achieve anything like
    natural language texts. One of the obstacles is that many of the highly frequent words that you get
    in English like 'of', 'the', 'and', etc. are ignored/filtered out by search engines, but are
    essential for producing grammatical sentences. People also naturally tend to search using nouns as
    keywords rather than verbs or adjectives although there is no rational reason for this. For example,
    they will tend to use the noun 'rehearsal' as a keyword rather than the verb 'rehearse'. As a
    result, I predict your texts will end up light in verbs and adjectives as well as having almost no
    grammatical function words. My bet is they will also avoid words like 'this', 'it', 'yesterday',
    'here', etc., which depend on context for their interpretation, so these too would end up
    disappearing from your texts.

    Hope this helps,
    H.

    ---
    Like-minds don't notice shared mistakes. Talk to someone else.
     
  4. Irr


    I thought this was a really cool idea! Rather than natural language, my bet is you're going to end
    up with a lot of refined lists of frequently associated search terms (and probably quite a few
    NC-17-rated terms for human anatomy), but an experiment worth running -- and my bookmarking --
    nonetheless. I checked out one of the pages and then searched for a few of the words with google,
    just to see if I could sow my own personal seed, but no hits yet in the google response. Perhaps the
    pages haven't made it into the google cache yet?
     
  5. Tim Tyler <[email protected]> wrote in message news:<[email protected]>...
    > Philipp Lenssen <[email protected]> wrote or quoted:
    >
    > > At http://memecodes.outer-court.com/ :
    >
    > The titles of pages are given prominent weight by search engines - so you may attract more
    > visitors if you make these more interesting.
    >
    > Search engine spiders are irregular beasts. You may find a generation in your system takes a long
    > time - and thus that development is /very/ slow.

    Actually, writing my Google Blog for a year I have developed an instinct for how the Googlebot
    works, and the Googlebot is my main target in this experiment. Also, Google is getting really fast:
    Googlebot recently slurped around 20,000 newly created pages of mine in about a week or two.

    >
    > If your experiment is successful, it may be the cause of much frustration among "users" ;-)

    That's one reason why I don't want to put words in the title of the pages as well. I really only
    want to show up in search engines when there's no other, more relevant page.

    It might be that your criticism is right, though. Oh well, it's an experiment -- once the idea came
    to my mind I had to try it. We'll see how it works. I don't expect the Memecodes system to create
    Shakespeare in a week, but I would laugh myself silly if after 6 months it started creating
    meaningful sentences.
     
  6. [Regarding http://memecodes.outer-court.com/]

    [email protected] (Huck Turner) wrote in message news:<[email protected]>...

    >
    > It's a very interesting idea, but the speed of the evolution might be improved if you used search
    > query data directly without waiting for people to actually find your pages (considering that
    > search engines will only reindex your pages every few weeks).

    I believe I can get Google to reindex in a 2-3 day rhythm once the thing is changing at that speed.
    That's my experience from my daily weblog. E.g. check the result of searching Google for my blog:
    <http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=blogoscoped> The cache is from
    "25 Feb 2004", a mere two days ago. Googlebot adjusts its own speed depending on how fast the
    site updates.

    >
    > Other people have tried to generate natural language texts based on word co-occurrence statistics
    > using text from books, newspapers, etc.
    > . . . Given that most search queries are lists of key words (mostly nouns) rather than grammatical
    > sequences of words, and that when people do enter a grammatical sequence, it is usually just a
    > noun phrase, I doubt that you will achieve anything like natural language texts. One of the
    > obstacles is that many of the highly frequent words that you get in English like 'of', 'the',
    > 'and', etc. are ignored/filtered out by search engines, but are essential for producing
    > grammatical sentences.

    Well, using the Google Web API I created a word-frequency statistic* for around 27,000 words of a
    dictionary. I add a top word -- the, it, there, and, or, etc. -- in about 10% of the text. This way
    there are a lot of "the" etc. inside the meme codes -- many more than there would be if all I did
    was pick random words from the dictionary! Every new meme will then modify 10% of itself according
    to the same rules (which means roughly 1% of each new page is freshly chosen common words).

    *Statistic to view & download at http://blog.outer-court.com/archive/2003_11_03_index.html
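
    Roughly, the selection and mutation rules work like the sketch below (Python; the word lists and
    exact numbers are placeholders standing in for the real 27,000-word dictionary and the frequency
    statistic):

      import random

      # Placeholder lists; the real system uses a 27,000-word dictionary and a
      # word-frequency statistic gathered via the Google Web API.
      TOP_WORDS = ["the", "of", "and", "it", "there", "or", "to", "in"]
      DICTIONARY = ["rehearsal", "lantern", "harvest", "orbit", "virus", "software"]

      def pick_word():
          # About 10% of the time take a very common word, otherwise a dictionary word.
          return random.choice(TOP_WORDS if random.random() < 0.10 else DICTIONARY)

      def mutate(words, rate=0.10):
          # Each new meme rewrites about 10% of itself using the same selection rule,
          # so roughly 1% of every generation is freshly chosen common words.
          return [pick_word() if random.random() < rate else w for w in words]

      seed = [pick_word() for _ in range(50)]   # a brand-new memecodes page
      child = mutate(seed)                      # its modified offspring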

    > People also naturally tend to search using nouns as keywords rather than verbs or adjectives
    > although there is no rational reason for this.
    > . . . As a result, I predict your texts will end up light in verbs and adjectives as well as
    > having almost no grammatical function words. My bet is they will also avoid using words like
    > 'this', 'it', 'yesterday', 'here', etc. which depend on context for their interpretation so
    > these would also end up disappearing from your texts.
    >

    OK, so maybe I won't get Shakespeare, but rather something like: "Or free nude britney public domain
    software the remove virus and." You get the point :)

    Thanks for the feedback.
     
  7. "irr" <[email protected]> wrote in message news:<[email protected]>...
    > I thought this was a really cool idea! Rather than natural language, my bet is you're going to
    > end up with a lot of refined lists of frequently associated search terms (and probably quite a
    > few NC-17-rated terms for human anatomy), but an experiment worth running -- and my bookmarking --
    > nonetheless.

    I agree. I run http://www.findforward.com and see that thousands of the queries are about
    this topic!

    > I checked out one of the pages and then searched for a few of the words with google, just to see
    > if I could sow my own personal seed, but no hits yet in the google response. Perhaps the pages
    > haven't made it into the google cache yet?

    You are right -- I will give the Google cache around another week to show my Memecodes. I
    already noticed that a certain bot read all of my Memecodes (I got a hit count of at least 1
    for all of them). As soon as the pages make it into Google and show up, the experiment
    officially begins. If you want to hear updates, please see my blog -- I'll be watching the
    Memecodes at <http://blog.outer-court.com>.
     