[UPDATE: There's an interesting discussion going on in the Etherpad version of this blogpost. Head here if you want to participate. I will try to keep this post in sync with the Etherpad].
This is an early stab at defining a set of problems that have led me to a rough idea for a new project which I’m calling Hyper Audio.
My background is in journalism and storytelling, mostly radio features and documentary and stories revolving around free culture, the internet, a2k and copyright stuff. I love the radio. I love the internet. I think, when we look back in time 20 or 50 years from now, we will talk about the internet and not radio as the determining media of this era. Existing media —radio, television, print—are increasingly merging with the web (as predicted by McLuhan). In radio, there are already technological advances that fuse FM with the web, such as Radio DNS. So, for me, the distinction between airborne and broadcast-era technologies is unimportant. It will all somehow merge.
Right now there is a split between podcasts, streaming and FM delivery. That will change. And that change should happen in ways that mimic the open-endedness of the internet and the web. The future of radio is already here. Or more accurately: there are various possible futures, and the playing field is still relatively open.
That said, we are at a point in time when this transition could go in several ways. In Tim Wu’s terms, we are at a point in the cycle where internet and radio can merge in ways that favor openness. We can have radio that works more like the web: consumed on your time, connected to web service APIs, hackable, remixable, and fresh. Or we can have something lesser.
This project aims to show why free and open is the better way to go for audio, radio and journalism, and to set the yardsticks first, by tinkering and exploring on the edge of the possible. The term “Hyper Audio” draws inspiration from Tristan Nitot who, after seeing the first popcorn.js demo by Brett Gaylor and developers from Seneca College, coined the term “Hyper Video”. Since this is an offspring of Mozilla’s existing HTML5 video efforts, it seemed appropriate to run with it.
What would Hyper Audio look like? What would be better?
- Search! Why has no-one nailed audio search, linking text and audio? Where is Google when you need them? Why are transcripts—when available—decoupled from the audio? Why is it hard or impossible to find stuff that has been said on the radio, on the web? [there are exceptions, like WNYC]
- I want to quote from a podcast or a radio show and share it with my friends. I don’t want to share the whole thing. Who will give me a way to mark in/out points and share? Imagine sharing and commenting from the timeline, Soundcloud-style. And why not hook that up to the transcript while we’re at it? The lack of search and quotation means that spoken word audio, although accessible via the web, is not part of the conversation that is the web.
- Web players suck, are Flash-based (or worse!) and short on social and enhanced features. Only Soundcloud feels (somehow) like 2010, but of course an HTML5 player with hyper features would be ideal. You cannot move from one player (at home) to another (office, car, bicycle) unless you connect to your (Apple) local device first. You cannot bookmark, save for later, accumulate or store unless using closed platforms. All this could be done with existing technologies and in the browser. Imagine Firefox Sync handling this between your mobile device, your car and your 2-3 computers.
- Audio is a powerful tool for learning while doing something else, such as driving. Often the removal of visuals and the immersive nature of the listening experience exert a deeper influence than other media streams. But you are alone and isolated in your listening experience. Why is that? Why can’t I interact with others based on social graph, geo-location, or, say, Twitter hashtags?
- I hear some cool music in a show. I want to buy it, or just check it out. I can’t. There are some exceptions which are all tedious and tied to one platform and “flat” audio, played locally, in your browser or mobile device. Maybe I want to switch advertising (buy this book on Amazon) on/off.
- I want additional info on the book, the person, or the breaking story I am listening to, from the web. What is a credit default swap? I’d like to decide which sources I see; Wikipedia, YouTube related videos, trending on Twitter etc. Is this breaking somewhere else?
We are experimenting with achieving this using Mozilla’s semantic video project, popcorn.js, and are currently looking into combining this effort with audio.
- Was that edit really fair? I’d like to check the source interview and edits. Did he really say that, or is this taken out of context? Where’s the original file? Was this interview performed before or after a certain time or event? Are there more recent relevant source interviews?
- Subtitles, translation and versioning never happen. This means most content from outside your language area is invisible to you. I did most of my interviews for DR in English. I have hundreds of great interviews with key people on the web in a drawer. Only a few minutes of each were used, and since I had to translate them, I could only use snippets. This sucks! There’s so much value in there, so much transaction cost. Could Universal Subtitles be applied here?
- Every interviewee in the news gets interviewed by every different news outlet to get essentially the same soundbite. That’s a lot of friction. Why not share and provide a version history if the story evolves?
- Inside a broadcaster you can access each other’s editing sessions. Why not facilitate the same, so you can learn and build on other people’s work across broadcasters?
There are a lot of ideas here. I have discussed these with various people: Mozilla’s Audio API team, people at Seneca College, Soundcloud, the team behind Hindenburg (see this), and, not least, radio journalists. We all recognize a need to make radio more like the web.
In the short term, I’ll be collaborating with various parties on demos that combine immersive storytelling and journalistic experimentation with cutting-edge technologies that Mozilla are involved in, specifically HTML5 <audio>. But, I am dead-set on focusing on content, story and why instead of how. We’re going to be much more than techno-porn: I want to see a new and immersive radio listening experience built on open coming out of this experimentation. One that makes sense for everyone who loves radio and the web.
After the first demos I hope that we’re at a point where a first “serious” prototype (hopefully several) is lined up, partnerships are made, and focused development is under way. And, ultimately, I hope we can set a standard for inserting semantic data into audio, one that is ready for adoption by major and mainstream players.

My Thoughts:
Thought I would chime in and record some of the key points from our previous discussion here.
Demos are great as a proof of concept and a perfect way to rapidly prototype, test out a desired specification and generate interest, but the goal here should be defining a new web standard or looking into ways existing web standards can fill these gaps.
I talked with Doug Schepers of the W3C and he pointed me to two existing but largely unknown specs which might work: Media Annotations and Media Fragments.
Media Fragments http://www.w3.org/2008/WebVideo/Fragments/WD-media-fragments-spec/
“This document describes the Media Fragments 1.0 specification. It specifies the syntax for constructing media fragment URIs and explains how to handle them when used over the HTTP protocol. The syntax is based on the specification of particular name-value pairs that can be used in URI fragment and URI query requests to restrict a media resource to a certain fragment.”
Media Annotations http://www.w3.org/2008/01/media-annotations-wg.html
“The mission of the Media Annotations Working Group, part of the Video in the Web Activity, is to provide an ontology designed to facilitate cross-community data integration of information related to media objects in the Web, such as video, audio and images.”
I took a cursory glance over these documents, and it seems to me that Media Fragments describes a transport for slicing audio and video and receiving only those specific pieces from the server without downloading the entire media document. This would facilitate the audio “quotes” functionality mentioned above. I believe audio quotations would be a very valuable feature to present to users. They can be handled easily at the moment through front-end applications in JS or Flash, but with the disadvantage of having to stream and buffer the entire file in order to play a specific “quote” (snippet/fragment). The harder problem is getting the server side to play nice and stream only the specific time fragment, which is what I believe the Media Fragments spec addresses.
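To make the “quote” idea a little more concrete, here is a minimal sketch in Python of how a quote could be expressed as a temporal media fragment URI of the form #t=start,end, with times in seconds. The function names and the example.org URL are made up for illustration, and the sketch only handles plain seconds (optionally prefixed with npt:), not clock-style hh:mm:ss values. It also says nothing about how a server would actually deliver only that slice, which is the harder part.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def build_quote_uri(media_url: str, start: float, end: float) -> str:
    """Attach a temporal fragment (#t=start,end, in seconds) to a media URL."""
    if start < 0 or end <= start:
        raise ValueError("need 0 <= start < end")
    parts = urlsplit(media_url)
    # ':g' drops trailing zeros, so 47.0 is written as 47.
    fragment = f"t={float(start):g},{float(end):g}"
    return urlunsplit(parts._replace(fragment=fragment))


def parse_quote_uri(uri: str) -> Tuple[float, Optional[float]]:
    """Read the start and optional end (in seconds) back out of #t=..."""
    params = dict(parse_qsl(urlsplit(uri).fragment))
    value = params["t"]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    start_text, _, end_text = value.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end


if __name__ == "__main__":
    uri = build_quote_uri("http://example.org/interview.ogg", 12.5, 47)
    print(uri)
    print(parse_quote_uri(uri))
```

A link built this way could be pasted into a tweet or a blog comment. Whether the player and server honour it is a separate question.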
The second problem mentioned above is making media searchable, for which a format for timecoded audio transcription is necessary. The annotation format must be multi-dimensional, to handle subtitles in multiple languages, for instance, as well as other meta annotations like links, Twitter accounts, related Wikipedia articles, photos, Google Maps, etc. If all of this information is available in a standard digest, search engines like Google will know where to look to start indexing audio/video transcriptions and including them in results.
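As a rough illustration of what “multi-dimensional” could mean, here is a small Python sketch of a timecoded transcript where each cue carries text in several languages plus a list of related links, and a naive search that returns the time ranges where a phrase is spoken. The Cue structure, the field names and the sample cues are all invented for this example. This is not the Media Annotations ontology, just one way the data might be shaped.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Cue:
    """One timecoded slice of a transcript (hypothetical structure)."""
    start: float                 # seconds from the start of the audio
    end: float                   # seconds from the start of the audio
    text: Dict[str, str]         # language code -> transcript text
    links: List[str] = field(default_factory=list)  # related web resources


def search(cues: List[Cue], query: str, lang: str) -> List[Cue]:
    """Return the cues whose text in the given language contains the query."""
    needle = query.casefold()
    return [cue for cue in cues if needle in cue.text.get(lang, "").casefold()]


# Made-up sample data, only to show the shape of the format.
TRANSCRIPT = [
    Cue(0.0, 4.2, {"en": "Welcome to the show.", "da": "Velkommen til programmet."}),
    Cue(62.0, 68.5,
        {"en": "So what is a credit default swap?",
         "da": "Hvad er en credit default swap?"},
        links=["http://en.wikipedia.org/wiki/Credit_default_swap"]),
]

if __name__ == "__main__":
    for cue in search(TRANSCRIPT, "credit default swap", "en"):
        print(cue.start, cue.end, cue.links)
```

A search engine that understood such a digest could index the text per language and link straight to the start time of each hit.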
That is all for now. Please let me know what you think.
In the old days, a very long time ago, AltaVista offered an interesting project with a bunch of U.S. radio stations on searching radio programming content. As a radio reporter I managed to find documentation (snippets) this way.
It was quite fast, but naturally had limitations. It would find things people said inside an mp3 audio file based on speech recognition. It was far from perfect, audio quality not very good mostly, but it would zoom you into the wanted place inside a radio show. The engine would lead you into the right place in the audio file.
In those days AltaVista was the Google of its time and a true wonder in search (now it is totally Yahoo). It was developed by Digital Equipment Corporation. However, the audio content search surfaced (it was never promoted) when Compaq bought DEC in 1999. The project came from Compaq labs.
I have never met anyone else familiar with this research project, but it worked fine even with the limitations of a few hundred radio stations. Nowadays AltaVista audio search is exclusively in the titles of files.
If you didn’t know about this, here’s the data … I have not been able to identify further information about this, having consulted Google, AltaVista and Wikipedia on the various corporations involved.
But I’m sure you get the point … it worked.
I think for media annotations of any kind to be truly successful you will need to build the right tools for creating the annotation data.
To find a way for media content providers to capture the information quickly and accurately is going to be the biggest challenge. But it’s an easier task than annotating our current mass of historical media.
Going back through history and annotating all our media would be too great a task, especially with the complex copyright issues we have now, and that’s before you begin to consider the complexities of voice recognition from such a variable source.
I guess what I mean to say is: if we *could* annotate all the media from past history… we should! But if that is too unfathomable a task, we should focus on creating the tools to capture annotation data now, in a fast, intuitive way.
I would love to see what the open source community would do if we could read the source of a Microphone input in JavaScript.
View-source voice-recognition would be a huge step forward for annotating things on the web. I mean… this is the “Web”… imagine the richness in content discovery if media annotation was a) a part of the usual production process for content creators and b) an integral part of the typical social media experience.
Henrik, you touch on subjects that I have often thought about, and many more that I haven’t, which made for an interesting read. I like the sound of hyper audio, props to Tristan on that one.
I think a lot of what you describe can be done now. HTML5 makes it easier than ever to connect media to the web. However, it’s easy to forget that not all browsers support HTML5 media, so to avoid leaving users of older browsers behind we need to create a fallback that gives a similar experience regardless of browser. This is the challenge we have undertaken with jPlayer, an open source media library for jQuery.
Undoubtedly, once Media Fragments and Media Annotations gain a critical mass of adoption, they will unleash a host of possibilities and allow much better experiences. I think Corban has it right when he describes Media Fragments as a mechanism to allow a specific slice of media to be downloaded. This is to be welcomed and will greatly increase efficiency, not to mention facilitate a cool new way to make mashups. To a large extent, though, we can do this anyway in an inefficient manner; we just need to download the whole media file first. I believe YouTube allows this now, for example. Similarly with Media Annotations, I think a common format for storing further information about media would be very useful indeed.
I worked with others at the Drumbeat festival in Barcelona to expose basic metadata using JavaScript, RDFa and the established Dublin Core Metadata Element Set so I know that at least in part, progress in this area is possible. The issue, I suppose, is to ensure that the information is packaged with the media and doesn’t get lost along the way, but this is really up to the web author to decide when they remix / embed or link to it. Perhaps if authors could be persuaded of the benefits of keeping metadata intact (better search engine rankings perhaps or maybe just more visits) we could move forward.
I’m a big fan of the popcorn.js library, which essentially allows you to tightly couple media with other parts of the web experience. I created a simple demo that crosses over to a small extent with what popcorn.js does, but just allows you to synchronise and link text to the appropriate parts of an audio file. I think Universal Subtitles are aiming to do a similar thing. In theory then, you could highlight a piece of synchronised text and get a link to the relevant start and end points of the media.
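To sketch that last idea: if each word of the transcript is stored with its start and end time, then a highlighted run of words maps directly to a pair of time offsets, and from there to a link. The Python below is only an illustration under assumptions. The word timings and the example.org URL are made up, the link uses the #t=start,end fragment form discussed earlier in this thread, and none of this is how popcorn.js or my demo is actually implemented.

```python
from typing import List, Tuple

# (word, start in seconds, end in seconds) -- hypothetical timing data
Word = Tuple[str, float, float]


def link_for_selection(media_url: str, words: List[Word],
                       first: int, last: int) -> str:
    """Build a link to the audio spanning words[first] through words[last]."""
    if not 0 <= first <= last < len(words):
        raise IndexError("selection is outside the transcript")
    start = words[first][1]   # start time of the first highlighted word
    end = words[last][2]      # end time of the last highlighted word
    return f"{media_url}#t={start:g},{end:g}"


if __name__ == "__main__":
    words = [("We", 0.0, 0.3), ("love", 0.3, 0.7),
             ("the", 0.7, 0.8), ("radio", 0.8, 1.4)]
    # Highlighting "love the radio" selects words 1..3.
    print(link_for_selection("http://example.org/show.ogg", words, 1, 3))
```

In a browser the indices would come from the text selection, for instance by tagging each word element with its position in the transcript.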
Regarding media, and specifically audio, going forward, there was an interesting article published yesterday in The New York Times, “Radio for the YouTube Era?” There certainly seem to be some interesting startups working in the audio area. One of my favourites is said.fm, who are curators of audio from around the web. I think there’s definitely a role for good curators considering the wealth of media that is out there.
The biggest challenge I see to a free and open media-verse (yes I just made that up – I’m getting to the end of my beer) is the issue of copyrighted media. Nobody really knows how this is going to play out, but I suspect the sheer volume of quality open material will eventually outweigh the copyrighted.
I just wanted to clarify a couple of points in my post above.
To say that with HTML5 media we need to download the whole file before playing it is not strictly true. Most HTML5 capable browsers do not have this requirement and allow you to ‘seek’ to a certain part of the file without having to download the part of the file that precedes it.
I mentioned that YouTube allows you to specify a set part of a file to be played. This is not strictly true either: it allows you to specify a start point by appending #t=1m09 to the URL (for example). Third-party services such as http://www.splicd.com/ and http://www.tubechop.com/ allow you to set both a start and an end point – but not if the file’s embedding has been disabled.
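For anyone who wants to handle this style of start offset in their own player, here is a small Python sketch that turns a value like 1m09 into a number of seconds. The accepted forms (optional hours, minutes and seconds, such as 1h2m3s, 2m, or a bare 69) are my assumption for the sake of the example. I am not claiming this matches exactly what YouTube accepts, and VIDEO_ID in the example URL is a placeholder.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Optional hours, minutes and seconds, e.g. "1h2m3s", "1m09", "2m", "69".
_OFFSET = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?$")


def offset_to_seconds(text: str) -> int:
    """Convert an offset such as '1m09' into whole seconds."""
    match = _OFFSET.match(text)
    if not text or match is None:
        raise ValueError(f"unrecognised time offset: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def start_offset(url: str) -> int:
    """Read the t=... value from a URL fragment and return it in seconds."""
    params = dict(parse_qsl(urlsplit(url).fragment))
    return offset_to_seconds(params["t"])


if __name__ == "__main__":
    print(start_offset("http://www.youtube.com/watch?v=VIDEO_ID#t=1m09"))
```

An end point could be handled the same way by accepting a second value after a comma, which is roughly what the third-party services add.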
So much to discuss here. For now I just want to show you this: http://www.3playmedia.com/interactive/video-clipping/#p3s:50350&p3e:76040&p3v:Z1-LPzMpElU – could this be coded open web style, adding our other layers (popcorn etc)?
[...] to be much more, it can be a driver of much richer interactions, something Henrik Moltke has dubbed Hyper Audio. The remit of the project was to take various media elements of a radio interview broadcast by [...]
Hi Henrik,
thank you for your very interesting future visions…
I am just testing automated speech analysis software (German, Italian, English and Spanish language models) to set up online services for video SEO and live subtitling (the results are not so bad…).
Of course, today we are not reaching the accuracy of manually transcribed audio – but we are catching up, and we often hear people say “good enough…”.
From my point of view, a combination of different transcription techniques would make sense to push Hyper Audio…
Have a great time!
Jasper
[...] are part of a broader concept dubbed Hyperaudio by Henrik Moltke, they are transcripts hyper-linked to the media they represent, a type of [...]
[...] few months later I was introduced to and became involved with something known as Hyperaudio and was given the opportunity to create a couple of proof-of-concept demos : Denmark Radio’s [...]
[...] with synchronizing text and audio in HTML5 about a year ago. A few months later, he stumbled across a blog post by Henrik Molte about the use of open web technologies to advance radio, an idea Molte coined Hyperaudio. The two [...]