RamblingWillie: Week 3: The Importance of Data in Web 2.0

The Web 2.0 pattern for this week is "Data as the New 'Intel Inside'", or the importance of data in Web 2.0 applications. This covers not only the regular information that an application provides for its users, but also the users' personal details, habits, likes, dislikes, and friends, and probably much more besides. It can also include "meta-data", categories, and generalisations. Most importantly, though, the data is linked. Users can therefore know what their friends like, or which things (movies, websites, T-shirts) they themselves might be interested in.

It's not merely useful to the user, either. Organisations themselves find a great deal of use for this data. For example, a company may need to know whether its products are well received, whether its competitors are reaching customers more successfully, which advertisements work best on which websites at what times, and much more.

Today, I'll be talking about YouTube, focusing not just on the videos that people make, but also on what people watch and why, what they like, and how this information is used to make YouTube a great place to find (and publish) videos of all kinds for viewers of all kinds.

YouTube - Broadcast Yourself

YouTube is a video sharing service that allows people to upload, view, and share videos of all types. Dozens of hours of video are uploaded to YouTube every minute, ranging from documentaries to comedy to MIT lectures. All of these videos need to be categorised, and users need to be able to find the types of videos they are looking for. For starters, we'll look at how videos are categorised, then at how users (viewers) are associated with categories, and finally at how YouTube generates recommendations for the user based on this data.

Tags as Categorisation

When a user uploads a video, they are asked to label it with "tags". Technically these can be any combination of letters and numbers, but they are usually used to describe the video in question. A video review of the movie John Carter, for example, might have the tags: "review, john carter, movie, disney".
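
To make the idea of tags-as-categories concrete, here is a minimal Python sketch of how a comma-separated tag string could be split into categories and turned into a tag-to-videos index. This is not YouTube's actual implementation; the function names, the lower-casing rule, and the video IDs are hypothetical.

```python
from collections import defaultdict


def parse_tags(raw: str) -> set[str]:
    """Split a comma-separated tag string into a set of normalised tags."""
    # Hypothetical normalisation: trim and lower-case so "Disney" and " disney" match.
    return {tag.strip().lower() for tag in raw.split(",") if tag.strip()}


def build_tag_index(videos: dict[str, str]) -> dict[str, set[str]]:
    """Map each tag (category) to the set of video IDs labelled with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, raw_tags in videos.items():
        for tag in parse_tags(raw_tags):
            index[tag].add(video_id)
    return dict(index)


if __name__ == "__main__":
    videos = {
        "john-carter-review": "review, john carter, movie, disney",
        "lion-king-tribute": "tribute, lion king, movie, disney",
    }
    index = build_tag_index(videos)
    print(sorted(index["disney"]))  # ['john-carter-review', 'lion-king-tribute']
```

An index like this answers "which videos are in this category?" with a single lookup, which is what makes the category overlap described next cheap to compute.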

In one stroke, the video has been placed in four categories: videos which are reviews, videos about movies, videos about John Carter, and videos about Disney. All of these categories now "overlap" at this video, and the system can begin comparing it to other videos that share these categories. These videos (usually prioritised by popularity) will be displayed next to our John Carter review and, assuming other reviews of John Carter exist, will include some of them.
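
The "related videos" idea can be sketched as follows. It assumes a hypothetical `Video` record holding a set of tags and a view count standing in for popularity, and it ranks candidates first by how many tags they share with the current video, then by views as a tie-breaker. YouTube's real ranking is not public, so treat this scoring rule as an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    tags: frozenset[str]
    views: int  # hypothetical stand-in for "popularity"


def related_videos(target: Video, catalogue: list[Video], limit: int = 5) -> list[str]:
    """Return IDs of videos sharing at least one tag with target, best matches first."""
    scored = []
    for video in catalogue:
        if video.video_id == target.video_id:
            continue  # never suggest the video currently being watched
        shared = len(target.tags & video.tags)  # number of overlapping categories
        if shared:
            scored.append((shared, video.views, video.video_id))
    # Assumed ordering: more shared tags first, then more views.
    scored.sort(reverse=True)
    return [video_id for _, _, video_id in scored[:limit]]


if __name__ == "__main__":
    review = Video("jc-review", frozenset({"review", "john carter", "movie", "disney"}), 1_000)
    catalogue = [
        review,
        Video("jc-review-2", frozenset({"review", "john carter", "movie"}), 500),
        Video("lion-king", frozenset({"movie", "disney"}), 90_000),
        Video("cat-video", frozenset({"cats", "funny"}), 1_000_000),
    ]
    print(related_videos(review, catalogue))  # ['jc-review-2', 'lion-king']
```

Ranking on overlap first is what lets a second John Carter review beat a far more popular Disney video; ranking on views alone would reverse them.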

However, this is only the beginning of the system's data collection on what the user in question may want to watch.

User Views, "Likes", "Dislikes", and Activity

Each time a user views a video, the system takes note of many small but important details, such as how long the user watched the video for. The user may also "like" or "dislike" the video in question, "favourite" it (bookmarking it for later), or share it with friends (on a social networking site, say).

Whenever a user "likes" or "dislikes" a video, the system takes note, and begins to associate the user not just with the video itself, but with the tags the video is categorised with. If a user "liked" our John Carter review, the YouTube system would also (tentatively) associate that user with liking the categories "John Carter", "Movies", "Reviews" and "Disney". It has begun to get a sense of what content you like to see!
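
One simple way to model this tentative association is a per-user table of tag scores: a like adds to every tag on the video, and a dislike subtracts. The sketch below assumes exactly that; the `record_rating` name and the weight of 1.0 per rating are hypothetical, not how YouTube actually stores preferences.

```python
def record_rating(
    affinity: dict[str, float],
    video_tags: set[str],
    liked: bool,
    weight: float = 1.0,  # hypothetical strength of a single rating
) -> None:
    """Tentatively credit (like) or debit (dislike) every tag of a rated video."""
    delta = weight if liked else -weight
    for tag in video_tags:
        affinity[tag] = affinity.get(tag, 0.0) + delta


if __name__ == "__main__":
    profile: dict[str, float] = {}
    # The user likes our John Carter review...
    record_rating(profile, {"review", "john carter", "movie", "disney"}, liked=True)
    # ...but dislikes an unrelated horror movie review.
    record_rating(profile, {"review", "movie", "horror"}, liked=False)
    print(sorted(profile.items()))
    # [('disney', 1.0), ('horror', -1.0), ('john carter', 1.0), ('movie', 0.0), ('review', 0.0)]
```

The dislike cancels out the shared "review" and "movie" credit while "disney" and "john carter" keep theirs, which is why a single rating only gives a tentative picture.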

The more likes and dislikes a user has, the more precise (and certain) the YouTube system becomes at guesstimating which content the user wants to see. Now, on the YouTube homepage, when the user logs in, they will see a long list of "Recommended Videos" that match what the user likes. If our user, for example, also liked a Lion King tribute video, and an Aladdin parody, the system would become very confident that our user likes Disney films, and would recommend more videos of that nature to the user.
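
Assuming each user has a table of tag scores built up from their likes and dislikes, a "Recommended Videos" list can be sketched by scoring each unseen video as the sum of the user's scores for its tags. This is a deliberately simple content-based rule chosen for illustration; the profile values, video IDs, and `recommend` function are hypothetical.

```python
def recommend(
    affinity: dict[str, float],
    catalogue: dict[str, set[str]],
    seen: set[str],
    limit: int = 10,
) -> list[str]:
    """Rank unseen videos by the sum of the user's scores for each video's tags."""
    scored = []
    for video_id, tags in catalogue.items():
        if video_id in seen:
            continue  # don't recommend what the user has already watched
        score = sum(affinity.get(tag, 0.0) for tag in tags)
        if score > 0:  # only suggest videos the profile leans towards
            scored.append((score, video_id))
    # Highest score first; ties broken alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [video_id for _, video_id in scored[:limit]]


if __name__ == "__main__":
    # Hypothetical profile after liking a John Carter review, a Lion King tribute
    # and an Aladdin parody: "disney" appears on all three, "movie" on two.
    profile = {
        "disney": 3.0, "movie": 2.0, "review": 1.0, "john carter": 1.0,
        "tribute": 1.0, "lion king": 1.0, "parody": 1.0, "aladdin": 1.0,
    }
    catalogue = {
        "john-carter-review": {"review", "john carter", "movie", "disney"},
        "tangled-review": {"review", "tangled", "movie", "disney"},
        "toy-story-parody": {"parody", "toy story", "disney"},
        "mit-physics-lecture": {"lecture", "physics", "mit"},
    }
    print(recommend(profile, catalogue, seen={"john-carter-review"}))
    # ['tangled-review', 'toy-story-parody']
```

Because "disney" has the highest score, any unseen Disney-tagged video rises to the top, matching the confidence the post describes after three Disney likes.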

YouTube does more than count Likes and Dislikes, of course. If you favourited a video, then the system will assume that you like it. The same is true if you watched the video all the way through to the end. Conversely, if you closed the video straight away, the system would take that as an indication of dislike or disinterest.
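
These implicit signals can be folded into the same kind of score. The sketch below turns a single viewing into a list of named signals and a combined weight. The signal names, the weights, and the 95% / 5% watch-time thresholds are hypothetical; the post only says which direction each signal points.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights: only the sign of each signal comes from the post.
SIGNAL_WEIGHTS = {
    "like": 1.0,
    "dislike": -1.0,
    "favourite": 1.5,
    "watched_to_end": 1.0,
    "closed_immediately": -0.5,
}


@dataclass
class ViewEvent:
    seconds_watched: float
    video_length: float
    liked: Optional[bool] = None  # None: the user neither liked nor disliked
    favourited: bool = False


def implicit_signals(event: ViewEvent) -> list[str]:
    """Translate one viewing into the named signals it implies."""
    if event.video_length <= 0:
        raise ValueError("video_length must be positive")
    signals = []
    if event.liked is True:
        signals.append("like")
    elif event.liked is False:
        signals.append("dislike")
    if event.favourited:
        signals.append("favourite")  # favouriting is treated as liking
    fraction = event.seconds_watched / event.video_length
    if fraction >= 0.95:
        signals.append("watched_to_end")  # hypothetical "all the way through" threshold
    elif fraction < 0.05:
        signals.append("closed_immediately")  # hypothetical "closed straight away" threshold
    return signals


def view_score(event: ViewEvent) -> float:
    """Sum the weights of every signal a viewing produced."""
    return sum(SIGNAL_WEIGHTS[signal] for signal in implicit_signals(event))


if __name__ == "__main__":
    finished = ViewEvent(seconds_watched=300, video_length=300, favourited=True)
    abandoned = ViewEvent(seconds_watched=3, video_length=300)
    print(implicit_signals(finished), view_score(finished))  # ['favourite', 'watched_to_end'] 2.5
    print(implicit_signals(abandoned), view_score(abandoned))  # ['closed_immediately'] -0.5
```

Keeping the weights in one table makes it easy to tune how much, say, a favourite counts relative to a like without touching the detection logic.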

How Data Collection Makes YouTube Better

So we can see that all of the data gathered by YouTube, and the way it is used, makes YouTube a tailor-made experience for each user. The more someone uses YouTube, the more precise the recommendations become, and the more user-friendly YouTube becomes. By boldly recommending videos on the front page, YouTube also makes it much easier to find content you probably didn't know existed.

All of this powerful customisation happens independently of a user subscribing to a channel. Indeed, the channels themselves (that is to say: other users) also have tags associated with them, and can be recommended in the same way.

1 comment:

E Gorsuch, March 19, 2012 at 10:18 PM
YouTube definitely has a huge amount of "data inside". Videos are what everyone goes to the site for, and because of the user likes/dislikes and tagging, YouTube can target advertising as well. I think YouTube fits well into harnessing collective intelligence (users supply the videos that drive others to YouTube), and data is the next Intel inside (the content/user interaction is the reason YouTube is popular).

Thursday, March 15, 2012
