The internet remains massive in scope, but a good portion of it is steadily disappearing, according to new research shared by the Pew Research Center this month. The report finds that 38% of webpages that existed in 2013 are no longer accessible today. A line chart, which Pew shared as part of its report, also points to a steady decline of content from the year 2019.
Highlights from Pew include:
- “A quarter of all webpages that existed at one point between 2013 and 2023 are no longer accessible.”
- “Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.”
- “23% of news webpages contain at least one broken link, as do 21% of webpages from government sites.”
- “54% of Wikipedia pages contain at least one link in their ‘References’ section that points to a page that no longer exists.”
- “Nearly one-in-five tweets are no longer publicly visible on the site just months after being posted.”
- “Certain types of tweets tend to go away more often than others. More than 40% of tweets written in Turkish or Arabic are no longer visible on the site within three months of being posted.”
Pew's line chart shows how content has been disappearing over the last decade.
Pew on how it gathered some of these numbers:
…we collected a random sample of just under 1 million webpages from the archives of Common Crawl, an internet archive service that periodically collects snapshots of the internet as it exists at different points in time. We sampled pages collected by Common Crawl each year from 2013 through 2023 (approximately 90,000 pages per year) and checked to see if those pages still exist today.
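To make that method concrete, here is a minimal Python sketch of the same idea: draw a fixed-size random sample of URLs for each year, check whether each one still resolves, and report the share that does not. This is not Pew's code. The function names (`http_status`, `is_accessible`, `sample_per_year`, `inaccessible_share`) are hypothetical, and treating a 404, a 410, or an unreachable host as "no longer accessible" is an assumption; Pew's actual criteria may differ.

```python
import random
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional

# Assumption: these status codes mean the page is gone.
# Pew's report may use a different or longer list.
DEAD_STATUSES = {404, 410}


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a URL, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None


def is_accessible(url: str, fetch_status: Callable[[str], Optional[int]]) -> bool:
    """A page counts as accessible if the host answers with a non-dead status."""
    status = fetch_status(url)
    return status is not None and status not in DEAD_STATUSES


def sample_per_year(
    pages_by_year: Dict[int, List[str]], per_year: int, seed: int = 0
) -> Dict[int, List[str]]:
    """Draw up to `per_year` URLs at random from each year's crawl."""
    rng = random.Random(seed)
    return {
        year: rng.sample(list(urls), min(per_year, len(urls)))
        for year, urls in pages_by_year.items()
    }


def inaccessible_share(
    sample: Dict[int, List[str]], fetch_status: Callable[[str], Optional[int]]
) -> Dict[int, float]:
    """Fraction of sampled pages per year that no longer resolve."""
    shares = {}
    for year, urls in sample.items():
        dead = sum(not is_accessible(url, fetch_status) for url in urls)
        shares[year] = dead / len(urls) if urls else 0.0
    return shares
```

With real data, `pages_by_year` would hold the URLs captured in each year's crawl, `per_year` would be roughly 90,000, and `fetch_status` would be `http_status`. Across the 11 years from 2013 through 2023, that works out to about 990,000 pages, which matches Pew's "just under 1 million." Passing the status lookup in as a parameter keeps the counting logic testable without touching the network.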