Google AI research scientist announces Dataset Search

Google, from Day One, got big by getting into the business of finding information. Years later, Google is talking serious business about datasets. Google is launching a new search engine to help scientists find the datasets they need.
 
On Wednesday, Google AI research scientist Natasha Noy announced the launch of Dataset Search. The tool offers easy access to datasets, whether you are a scientist or a data "geek" in some other pursuit, looking for data for your work, your stories or your own intellectual curiosity.
The goal is to give you a single interface for finding data. Jon Fingas in Engadget looked at how it can benefit data searching.
"The tool provides more direct access to data presented in an open standard that makes it clear who created the info, how it was collected and how you're allowed to use it. You could not only track down data for a report, but make sure that it's relevant and legal to use."
This is a global (as in international) push: the tool works in multiple languages, with support for additional languages coming soon. James Vincent in The Verge quoted Noy: "I do think in the last several years the number of repositories has exploded."
"Simply enter what you are looking for and we will help guide you to the published dataset on the repository provider's site," she said. Currently, datasets and related data tend to be spread across multiple data repositories, and information about these datasets is often neither linked nor indexed by search engines. For the person doing a search, data discovery becomes tedious at best.
Google wants to support an ecosystem in which dataset providers are encouraged, via guidelines that Google developed, to describe their data "in a way that Google (and other search engines) can better understand the content of their pages," she said.
The approach relies on the open standard schema.org. On Noy's wish list: that all dataset providers get behind this common standard and use it to describe their datasets. That way, said Noy, datasets become part of a "robust ecosystem."
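To make the idea of describing a dataset with schema.org concrete, here is a minimal sketch in Python. It builds a schema.org `Dataset` description and wraps it as JSON-LD for embedding in a web page. The property names (`name`, `description`, `url`, `creator`, `license`, `distribution`) come from the schema.org vocabulary. The helper functions and every example value (the dataset name, the example.org URLs, the organization) are hypothetical and are not taken from Google's announcement.

```python
import json


def build_dataset_jsonld(name, description, url, creator, license_url,
                         download_url, encoding_format="text/csv"):
    """Return a schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration; only the property names
    come from the schema.org vocabulary.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the data.
        "creator": {"@type": "Organization", "name": creator},
        # How the data may be used.
        "license": license_url,
        # Where and in what format the data can be downloaded.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": encoding_format,
            "contentUrl": download_url,
        }],
    }


def to_script_tag(metadata):
    """Serialize the description as a JSON-LD script tag for a web page."""
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(metadata, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # All values below are made up for the example.
    example = build_dataset_jsonld(
        name="Example daily rainfall measurements",
        description="Hypothetical daily rainfall totals from one station.",
        url="https://example.org/datasets/rainfall",
        creator="Example Weather Institute",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        download_url="https://example.org/datasets/rainfall.csv",
    )
    print(to_script_tag(example))
```

A publisher would place the resulting script tag in the HTML of the page that describes the dataset. The creator and license fields correspond to the "who created the info" and "how you're allowed to use it" points in the Engadget quote above.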
"A search tool like this one is only as good as the metadata that data publishers are willing to provide. We hope to see many of you use the open standards to describe your data, enabling our users to find the data that they are looking for."
Jon Fingas in Engadget: "It's far from a definitive resource at the moment. It's a start, however, and Google is no doubt hoping that this will encourage others to make their public data more searchable."
And if all this were not enough, there is also talk of making the most of data about data about data.
According to The Verge, Jeni Tennison, chief of the Open Data Institute, said that ideally Google will publish its own dataset on how Dataset Search gets used. Vincent quoted her:
"Simply understanding how people search is important... what kind of terms they use, how they express them," says Tennison. "If we want to get to grips with how people search for data and make it more accessible, it would be great if Google opened up its own data on this." In other words, Vincent added, Google should publish a dataset about dataset search that would be indexed by Dataset Search.
 
More information: www.blog.google/products/searc … r-discover-datasets/
toolbox.google.com/datasetsearch
