This is Misinfo Weekly, a somewhat weekly program about misinformation in our time. Misinfo Weekly is made by the Unit for Data Science and Analytics at Arizona State University Library.


Hello, today is February 5, 2021, and we have a guest today, Dr. Jessica Ogden. She's a senior research associate at the University of Bristol, an ESRC postdoctoral fellow, and a fellow at the Bristol Digital Futures initiative. Hi, Jessica, how are you today?


Hi, Shawn, I'm doing fine. Thanks.


We're super excited to have you. So today we're going to talk a bit about web archives: what web archives are, how they're created, lots of issues around that, as well as misinformation and its connection to web archives. I know we've talked a little bit about web archives in previous episodes and mentioned them in passing. But Jessica, what is a web archive?


Yeah, that's probably a great place to start. Maybe we can back up a little bit and think about why web archives are a thing before we discuss what they are. I would start with the fact that the web is really ephemeral. From the beginnings of the web, there has been no built-in preservation mechanism. Things can be put online and taken offline for lots of different reasons, some purposeful and some not. Sometimes things just become obsolete because your servers fall over, or somebody doesn't renew their domain name; sometimes they're deliberately removed. We see that, at least in the present climate, across different social media platforms, where either algorithms or moderators remove things, or indeed individuals themselves do. If you put something online and then think, "I actually don't want that online anymore," you end up deleting it. So ephemerality manifests in lots of different ways online, and that's why different web archiving initiatives have come in to archive it. Well, not just recently: they've been archiving the web for quite a long time, actually.


I think it might be easy to think that the internet is a kind of lossless format. If you're a consumer using some high-profile or really well-established services, it can feel like you can go all the way back to the beginning of whatever kind of content you want. And it is still possible to find old webpages. But as a starting place for talking about web archiving and misinformation: how much of the web just evaporates?


Yeah, that's a really good question. As you were talking, I was thinking about the ways that certain platforms reinforce this idea that your data and things are there for all time. One example is how Facebook shows you the things you posted 10 years ago. It almost falsely reinforces the notion that your posts will be there forever. But to answer your question about the lifespan of stuff online, most people refer to a study which is actually quite old now, from the mid-2000s, which I think documented that the average lifespan of certain kinds of social media content at that time was around 100 days, which is really not that long. There are definitely other studies that are probably more up to date than that one now, and I would be curious to see how it plays out, because it plays differently on different platforms and in different domains. But that's always the benchmark people go to when they're trying to explain how much of the web actually isn't online for that long.


Right. The phenomenon is so big and dispersed that it's not like we can put a percentage on how much attrition we experience. But that lifespan figure, where you can actually clock the lifespan of a given item and it lives less than, say, a set of tires on a car, causes us to reevaluate when people say, "Well, why do we need libraries anymore? We have Google."


I just screamed in my head a little bit when you said that. Why do we need libraries anymore? Ah, we need libraries. So a question then, Jess: who are the players in these spaces? Because most people think, oh, I click on links in Google, so Google must be the one doing all of this. But who's really doing this?


Yeah, that's a great question. More or less as an extension of the digital preservation movement of the late 80s and early 90s, what I would classify as conventional memory institutions came into this space when they started thinking about what's called born-digital information and media, and extended that to the web. They started thinking first about the internet, and then, with the arrival of the web in the early 1990s, about how we need to be capturing this. This is the new mode of communication. This is where history is taking place, as it were, so we need a mechanism for preserving it through time. So you get institutions, mostly certain national libraries and archives, that start playing in this field, reusing the tools around web scraping to develop certain kinds of standards and extended tools, and to develop a field of practice around archival standards for the web. One of the big players that also emerged in the late 90s was the Internet Archive, which I would assume at least some percentage of your listeners will have heard of. They're probably the biggest and most famous web archive, I would argue, which manifests through things like the Wayback Machine, where you can go


Synonymous with the Wayback Machine. Yeah.


Yep. So you can go there and visit, because they've been archiving the web for a really long time. There's a lot of content there where you can go and view the web of the past. But just to round it off, and this is where some of my research has been focused, there are other disruptors in the field. Beyond libraries and archives, the Internet Archive is definitely, I would classify, a disruptor, within libraries and archives as well, which we could unpack if you're interested. There are also community groups, hacker organizations, and volunteer collectives that get together online and archive the web in various ways and formats, and that have different interests in different parts of the web. A lot of my research is centered on that. So there are lots of different players out there archiving the web for different reasons, and of course industry as well: let's not forget the various companies and corporations who are archiving the web for their own purposes, too.


So rather than thinking about the internet as something you record, like recording a TV show, where you just want to make sure that someone's recording at all, there are multiple parties with different interests and different philosophies. No one can record the entire internet, so they're each recording different fractions of it, and all of those assembled together form a kind of aggregate archive, while each party keeps its own individual archive as well.


Yeah, that's right. It's a really important point, too, because some of my research is around this selectivity of web archiving, and how that's determined by who's doing it and what they want to archive. That's a basic premise of archival theory and library practice: selection decisions need to be made. But it manifests really interestingly on the web, because you get these national web archives that are only interested in archiving the UK web, for example. But what is the UK web? Most people don't think about the web in terms of their own country's domain; they think about the web as this internet-connected, international thing. And when we start thinking in terms of information ecosystems, issues around misinformation, and other ways in which things travel online, we think about them as interconnected, not as country-specific. So I think it raises really interesting questions and problems around both how you track the different forms of information you might be interested in as a scholar or researcher, and how you record them for posterity for different kinds of uses.


So history is written by the person with the most capable servers.


[Laughing] Yeah, yeah. 


The biggest hard drive wins, basically.


The most bandwidth. Yeah. That's right.


Well, there's also a time component, correct? When we talk about ephemerality, it's not just things disappearing; ephemerality is also that the web and the content on it are constantly changing. So what role does time play? It's not like we can just archive a website once in 1998 and then, poof, we're done, we have it forever.


Yeah, that's a really good point, Shawn, and I think it speaks to some of the work. One of the case studies I did as part of my PhD research looked at how activists came together after the election of Donald Trump to archive climate science data. What they tried to do was build a series of tools for monitoring changes online in specific contexts. They were really interested in monitoring the government's activities and information around the climate science agenda, because there were real expectations that the new administration would be anti-climate science and would probably go online and remove access to open government data, as well as educational resources around climate science. So they were really concerned about how you prove that. If you only took a snapshot here and there, it would be really difficult to know when those things disappeared or were removed, and then to infer some kind of intentionality. So they had to archive quite a lot, and then develop what were called diffing tools to look at the changes on those pages and see how the rhetoric around the climate science data was changing over time. Then they would issue these rapid-fire policy briefs and documents about what was happening. So, as just one example, you can see how that temporal component you're asking about, Shawn, is really important to what you can say about web archives and what you can say using them as resources. But it also becomes a really powerful tool for monitoring change over time, which brings with it some kinds of risks, depending on who you're monitoring, I guess.
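The diffing idea can be illustrated with a short sketch. This is not the activists' actual tooling; it assumes two plain-text captures of the same page taken at different times and uses Python's standard-library difflib to list which lines disappeared and which appeared. The page text and capture labels are hypothetical.

```python
import difflib


def snapshot_changes(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Compare two captures of the same page line by line.

    Returns (removed, added): lines found only in the older capture,
    and lines found only in the newer one.
    """
    removed: list[str] = []
    added: list[str] = []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        # ndiff prefixes: "- " only in old, "+ " only in new,
        # "  " unchanged, "? " intraline hint (ignored here)
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


def unified_report(old_text: str, new_text: str, old_label: str, new_label: str) -> str:
    """A human-readable unified diff of the two captures."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )


if __name__ == "__main__":
    # Hypothetical page text, invented for illustration
    before = "Climate change is driven by human activity.\nDownload the emissions dataset.\n"
    after = "Climate change is a complex issue.\n"
    removed, added = snapshot_changes(before, after)
    print("Removed:", removed)
    print("Added:", added)
    print(unified_report(before, after, "capture-A", "capture-B"), end="")
```

Run against captures taken at regular intervals, a comparison like this is what lets you say when a passage disappeared, which a single snapshot cannot.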


So I think that brings up a good point. You've been talking about this idea of selection, but there's also the technical process of how something gets web archived. Can we break those two things down? First, say you have a webpage, a URL, that you want to archive. Could you give an overview of the flow? That might be helpful for the public, because it's not like you submit a URL to the Internet Archive and then a pigeon types on a keyboard, goes into a web browser, clicks File, Save, and drags that up to an Internet Archive server somewhere, right? So how does this magic work?


Well, yeah, that's awesome, right? We don't want anyone to feel like it's magic. Even in the way we've been talking about it, when we say these groups just archive these pages, what does that mean? A golden rule is that anytime there's a complex technical process, people are likely to assume it's magical and takes a lot less time than it actually does. So what's going on?


And probably that it works a lot better than it actually does, too.


[Laughing] Yeah. Yeah, absolutely. I think those are all really good and important points. It was actually one of the starting points for my research, because there's a lot of rhetoric around web archives, as with all technologies, as you're alluding to, Michael, that black-boxes what these technologies are actually doing. When you have some level of automation or semi-automation in those processes, they become even further black-boxed, and the archive becomes this magical tool for all of your needs for looking at the past web. In terms of how web archives work, generally speaking, there are now a number of standards, in terms of formats and what we could call best practices, around creating archival versions of the web. When I say archival versions, I take a really broad view. But I also try to couch it in terms of preservation standards, because you could go into your normal browser and just download a zip file of a webpage, and that could be considered an archive, which of course it is, a form of archive. But in this sense, the tools are built on standards set by libraries and archives, so that captures can be preserved against various kinds of technological obsolescence over time, and you can deposit them in these preservation repositories for, so-called, all time, which we could unpack, but it's probably best to leave it there. Sorry, I'm rambling a little bit.
But in terms of the technologies, they're based on standard web crawlers, where you can semi-automate a command-line tool, for example, to use the HTTP protocol to go to a website, essentially scrape everything there, and put it into this archival format, which preserves all of the headers and things associated with that page at the moment the request was made. There's a whole series of metadata associated with that. But there are problems with that approach, and Shawn, you know a lot about these problems, which often concern the quality of these archives.
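The archival format most crawler-based web archives write is WARC (ISO 28500). The sketch below assumes the HTTP response has already been fetched and only shows how the raw response, headers included, gets wrapped in a WARC-style record together with capture metadata such as the target URL and capture time. It is deliberately simplified (no request record, no payload digest, no compression), so treat it as an illustration of the idea rather than a conforming writer; in practice, tools such as wget with its --warc-file option, or dedicated WARC libraries, do this work. The function name and example values are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_response_record(target_uri: str, status_line: str,
                         headers: dict[str, str], body: bytes,
                         captured_at: datetime) -> bytes:
    """Wrap one HTTP response in a simplified WARC 'response' record."""
    # The HTTP response as received: status line, headers, blank line, body
    http_block = (
        status_line + "\r\n"
        + "".join(f"{name}: {value}\r\n" for name, value in headers.items())
        + "\r\n"
    ).encode("utf-8") + body

    warc_date = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    warc_headers = [
        "WARC/1.1",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {warc_date}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "Content-Type: application/http; msgtype=response",
        f"Content-Length: {len(http_block)}",
    ]
    # Record layout: header lines, blank line, content block, then two CRLFs
    header_bytes = ("\r\n".join(warc_headers) + "\r\n\r\n").encode("utf-8")
    return header_bytes + http_block + b"\r\n\r\n"


if __name__ == "__main__":
    record = warc_response_record(
        "https://example.com/",  # hypothetical capture target
        "HTTP/1.1 200 OK",
        {"Content-Type": "text/html; charset=utf-8", "Date": "Fri, 05 Feb 2021 12:00:00 GMT"},
        b"<html><body>Hello</body></html>",
        datetime(2021, 2, 5, 12, 0, 0, tzinfo=timezone.utc),
    )
    print(record.decode("utf-8"))
```

The point of the wrapper is that the server's own headers travel with the page, so a later reader can see what the server said, and when, rather than only what the page looked like.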


All right, so if we talk briefly about a webpage, right, it's a complicated beast. Not magical and mythical, so I'll stay away from magic, Michael, but it's a complicated beast. It's not just text. A webpage includes images, and some webpages are very dynamic: there's JavaScript, code running in the browser, presenting things to the user based on their interaction. So how does that cause issues, or not, for web archives?


Yeah, well, I think a rule of thumb is that the more dynamic a page is, the more difficult it is to archive it and to create a so-called authentic representation of what that page looked like. As you alluded to, JavaScript is a major problem for most archives. Although I should say that there's been a lot of rapid development in these tools, especially over the last few years, which is improving the quality of these representations. But one additional contextual thing to throw in is that we all interact with the web in different ways. Not everybody is sitting at a Chrome browser on a desktop viewing the web. We view the web in lots of different ways: in mobile applications, on different operating systems, in different types of browsers. And web archives can't really capture all those different elements, the contextual stuff that happens around the web. There's often a singular representation of something: what the computer, the command line, sees and can interpret through the protocol, not necessarily what you might see as a human being on the screen when you interact with the web.
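A small sketch can show why JavaScript is such a problem for crawlers that store raw HTML. It assumes a hypothetical page whose feed is filled in by a script, and uses Python's standard-library html.parser to collect only the text present in the HTML as delivered, which is roughly what a capture that never runs the script has to work with. Real archiving tools differ; some now drive actual browsers precisely to get around this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text present in raw HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a script or style block
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> list[str]:
    """Text a non-JavaScript crawler can see in the delivered HTML."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    # Hypothetical page: the feed text only exists after a browser runs the script
    page = (
        "<html><body><h1>Latest posts</h1><div id=\"feed\"></div>"
        "<script>document.getElementById(\"feed\").textContent = "
        "\"Post loaded by JavaScript\";</script></body></html>"
    )
    print(visible_text(page))  # the script-generated post is not in the list
```

A human in a browser would see the post; the raw capture only contains the heading and an empty container.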


Yeah, that's an interesting point about what you're actually archiving, because it's not like anyone has access to the server and can get that level of fidelity for whatever is on there. All of that, I think, is going to sound painstaking and exhausting to lay people. And I bet people are thinking, "The internet is incredibly lossy and ephemeral? Well, I'm glad someone's doing this." Right. But I think it's important to highlight exactly how much hard work this is. It's not like there's a new TV drama about web archiving that just came out, right? It's not the kind of work that tends to be glamorized. There is so much consequential stuff going on in the world of web archiving, but I think people tend to file it away in their heads as something that simply has to happen, without registering how complex it is.


Yeah, I think it's interesting. It makes me think of the research recently around what could be called the maintenance work of data work: the maintenance and repair work that goes into information work and work with data, and how that relies on large-scale information infrastructures that underlie the web, but also sit on top of it, to create these levels of preservation. I think it's also important for listeners to see the other side. I'm a great champion of libraries and archives, and I think they're absolutely doing the work that we need them to do, especially in the current moment and the contemporary climate we live in, politics and all the rest. But I do think there are some really big questions, not just about what's happening in libraries and archives, but in the other collectives and community groups that are archiving, as well as at the corporate and platform level. The question is whether web archiving is always an inherently good thing. There are some real risks that come with archiving the web, especially at scale: for individuals who may not want some of the things they put online to be archived for all time in the Internet Archive, and for specific groups and populations for which the risks are even higher. We see that in some of the work around protest movements, and how web archives can be used as surveillance tools in certain communities to identify people who are exercising their rights, essentially, to protest and to free speech.
Now, of course, this gets into really interesting territory when we start thinking about what happened at the Capitol, the insurrection, and how similar tools are now being used to identify what happened there. There's a whole host of issues to be unpacked there. But I just want to posit to the listeners, and to you all, that web archives aren't necessarily always a good thing. As complex as the processes that underlie the collection of web archives are, we should treat the use of those web archives with equal critical scrutiny.


I mean, we also know that some folks use web archives for devious purposes, or basically to keep problematic content online. We know white supremacist groups, for example, create webpages, then ask the Internet Archive to save them. Their website is then taken down because it's hate speech, but the Internet Archive copy still lives on, and they circulate that. So it's also complicated in how we choose what to keep, even after we've collected it, right?


Yeah, that's right. This is where my attention has turned recently, with what's happening on Parler, but elsewhere too: content moderation doesn't stop at the act of removing something from a platform. We really need to be looking at how these different web archives (and it's not just the Internet Archive, it's also other open-source and open-access web archives) are being used across platforms to circulate archival links to content that is no longer available online, because it's been removed as hate speech or whatever else. There are some really open questions about the social processes that surround how these links and archives are used, despite the high-level content moderation policies and algorithms being developed to remove things. How do we start understanding how they're being used on different platforms, and then how do we go about intervening in or mitigating that use in certain circumstances? It becomes extremely complex with web archives, because they are meant to be there, if we believe the Internet Archive's mission is to keep these archives online forever. That's not to say they never remove anything: they do have so-called dark archives, and they do remove things such as child exploitation images. And there were some big cases around images circulating around terrorism, and live streams associated with that, which were archived and then removed. But the picture becomes more complex when you insert these web archives into that ecosystem.


Yeah, so it sounds like what's going on all the time is contestation over what the history of the internet looks like and will look like. Events like the storming of the Capitol, like entire apps being taken offline, and like entire communities being deplatformed put a lot of pressure on the folks who are doing this kind of work. And they put a lot of pressure on us collectively, in terms of the kind of history we're trying to stitch together about what is and was online.


Yeah, that's right. I would love to speak with someone at the Internet Archive and some of these other organizations to understand this. I think maybe, Shawn, you said this earlier, but it does seem that web archives are coming more into the public consciousness, to a certain extent. Recognition of both the need and the work that's happening is rising to the fore, given these issues, and especially given the events of the last couple of months, which feels like forever. I don't know if it's worth talking about some of those activities as well: how archivists were intervening collectively on Parler, but also some of the other activities around misinformation and the insurrection.


Yeah, I think this is a great time to wheel the conversation toward misinformation. I don't think it's obvious why web archiving matters so much, not just to the study of misinformation, but also to any attempts to practically counteract it. Even for understanding and behaving around misinformation, web archiving matters. Why? What are some of the high-level reasons you think web archiving applies so squarely to folks interested in misinformation?


Yeah, I think you sort of touched on it. The most obvious reason, I guess, is that if you're studying misinformation, or studying communities who are propagating misinformation, web archives are arguably a key resource and tool, or at least they could be, and arguably should be. If some of the interventions in misinformation, or at least some of the advocacy around them, involve deplatforming, then if you're going to study those communities in some way, you need to have an archive of what's happening there. So there's a tension between the interventions and the research: if you don't do the archiving before content is removed from online, you often don't have the source community you're aiming to study. Additionally, there's been some recent work on how web archives are actually also being used by people circulating mis- and disinformation as a tool to circumvent a lot of these interventions. And it becomes very meta when you start talking about web archives, because you also need web archives to study how archives are being used in those contexts. That content is often really ephemeral, so if you haven't captured it in some way, you often don't know how the archives are being used. So the argument usually comes down to, "Well, we should archive all the things," which is an easy argument to make, but not terribly realistic, because the web is a really, really big place, and decisions have to be made about what to collect and what to archive.
So a lot of libraries, archives, and community groups are really advocating for specialists, experts, and researchers, especially in this space, to start intervening: to assist with the creation of these collections and to target some of that collecting, so that we have them as research collections as well.


So, to make those examples somewhat concrete, this idea of how archives may be used to subvert other things: one example of the role of web archives is that former President Donald Trump's Twitter account has been suspended, so those tweets are no longer accessible on twitter.com, but they are somewhat accessible via the Internet Archive and other web archives. Or we can think of QAnon, which often uses direct links into the Internet Archive, archive.is, and other web archives to refer to, say, news articles at specific moments in time, before they've been updated, basically to prove their argument, or to circumvent labels that might be put on content saying, "This is misinformation. This is problematic." As Michael was saying, they circumvent that whole process by linking into the archives. So they preserve their argument at a very specific moment in time, and can pick a copy of the content that says what they want it to say and presents a specific version of reality. Are there other specific examples you were thinking of when you mentioned that?


Yeah, no, those are really good examples. There's also something emerging in our own work, and this is kind of hot off the press, so I'm not sure how much I can actually say about it: there's a social practice of sharing archived links in anticipation that a lot of that content will be removed. That's something we don't really fully understand yet. I think it speaks to practices in other hacker subcultures around self-archiving. For people who study these communities, there's a real push toward self-archiving as a way of creating your own history, or the history of your own community, by archiving it in real time. Often that manifests itself in different subreddits and QAnon communities, where they're doing it as a reflective social practice: creating those archives and sharing them within their own profiles. So, your top 10 favorite QAnon theories are already archived, here they are, and they're linked in my profile, as just one small example. But I would argue it's part of a larger social practice online, where these archives form a central component of how these communities see and share information, and create specific kinds of community identity around that information sharing. QAnon is probably the "best" contemporary example (the "best" is in air quotes) of how powerful that really is for a community's sense of the role of information, and of how sharing that information creates social identity.
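Shared archive links like the ones described here pin a page to a moment in time. Wayback Machine replay URLs embed a 14-digit YYYYMMDDhhmmss timestamp (in UTC) before the original URL, and the service generally redirects to the capture closest to that timestamp. The helper names below are hypothetical; this is a minimal sketch that builds and parses such links so you can see which moment a circulating archive link actually points at.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Replay URLs look like https://web.archive.org/web/<14-digit timestamp>[modifier]/<original URL>,
# where the optional modifier is something like "id_"
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})[a-z_]*/(.+)$")


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback Machine link pinned to a moment in time (UTC)."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{original_url}"


def parse_wayback_url(link: str) -> Optional[tuple[datetime, str]]:
    """Return (requested capture time, original URL) for a Wayback link, or None."""
    match = WAYBACK_PATTERN.match(link)
    if match is None:
        return None
    stamp, original = match.groups()
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return captured, original


if __name__ == "__main__":
    # Hypothetical news article URL
    link = wayback_url("https://example.com/story", datetime(2020, 9, 10, 12, 0, tzinfo=timezone.utc))
    print(link)
    print(parse_wayback_url(link))
```

Because the timestamp is part of the link itself, the copy being shared stays frozen at that moment, regardless of any later correction or label added to the live page.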


Yeah, this reminds me of some things we saw when we looked at Twitter misinformation about the wildfires this summer, when people were alleging that Antifa was starting the wildfires. One thing we did see was people sharing personal archives of newspaper articles that were still online. And this practice you're speaking about is very interesting, right? It can be easy to say, "Well, archiving is the side of light, and misinformation is the side of dark, and that's just how it goes." But as soon as I archive something, I am signaling some kind of preemption, and preemption is a form of conflict. If I am taking a preemptive strategy toward this information, then it already means I'm anticipating some kind of attack. So these archives sometimes have that flavor. I don't have a ton of exposure to them, but looking through the ones on Twitter about the alleged Antifa fires, they had a lot of the flavor of the conspiracy theorist's red yarn board. [It’s Always Sunny in Philadelphia sound bite] These personal archives are kind of the equivalent. But I think it shows that the practice of archiving, not just the archive itself, can actually further somebody being misinformed or disinforming others. It helps underscore that misinformation is more than just the content: the situation you create around the information can make it feel more devious if you're trying to spin a conspiracy theory. Putting everything in an archive, handing it to somebody, and saying, "Hey, I had to archive this because it's really dangerous" positions that content in a different way than if you were just to link to it on NBCNews.com. That part's really interesting.
Shawn, this part of the conversation also reminds me of your interest in ephemerality and misinformation: how ephemerality is an important tool for those running misinformation campaigns, either knowingly or unknowingly.


Right. In some of the work we've collaborated on, and also with Dr. Marco Bastos at University College Dublin, we've seen that in some misinformation campaigns a large percentage of the links disappear within about 48 hours. Content appears, we see multiple copies of an article saying the same thing that look like they're from different legitimate news outlets, and then all of it disappears within 48 hours. That means it's very difficult for us to go back and have a discussion about the record; we just have what we remember, versus the actual record. Jessica, do you think that could also be a challenge for web archives: content that doesn't stick around? We can think of the White House website, or our university's website, ASU's website. There are lots of archives of those, and they're not going anywhere; the website is always there. Every time the crawler says hello, ASU gives its new, innovative website to the crawler. But what happens with content that appears, sticks around for a couple of minutes, and then disappears? How do things actually get into the archive? We talked about one way, people creating their personal archives, but that's not the only way. And I imagine these archives aren't omniscient; they don't know about every webpage as soon as it pops up. How does this work?


Yeah, well, it kind of depends on where you look. There are lots of different mechanisms, and maybe I could speak to one group. One of the groups that I studied as part of my PhD research is called Archive Team. They're a self-described loose collective of hackers, archivists, librarians, hobbyists, writers, and so-called loudmouths who go out and archive the web, and they do it in lots of different ways. Part of my study was to try to understand those different mechanisms you're alluding to: the selection practices, how they actually monitor and collect these different sites. And they're really creative about it in several ways, I have to say. One way they do it, which doesn't speak to your ephemerality question (I can come back to that), is through Wikipedia. They have various bots that monitor the addition of links on Wikipedia, especially on pages about people and events. They use those robots, which sit there waiting for new links to be added, as cues to send those links to the infrastructure they've built. That infrastructure sends the crawler out, archives the site, and puts it through a complicated pipeline that packages it up in different ways, creates all the metadata around it, and then deposits it in the Internet Archive. They also have other tools. There's a community that sits in an Internet Relay Chat room, and people come in and suggest things like, "Hey, I heard this platform is going down, you might want to go and archive it."
They send requests to the robots, but they also have other mechanisms, like social media and other bullhorns, to mobilize collectives of people to connect to the tools they have, in order to essentially crowdsource what's being archived. Often that's driven by major platforms going offline: when Vine announced that it was going offline, or Google Plus, if we can remember Google Plus. They archived GeoCities way back in 2009 as well, and they mobilized again for Parler a few weeks ago. That's just to say that a lot of what they select to archive is driven by the idea that all things online are created equal. They at least profess not to select what they think is important, or what they think is politically salient in that moment; they try to demonstrate that everything should be archived. Part of my research was unpacking how that actually manifested. And it turns out you always make selection decisions because, as we said, you can't archive it all. Often that shows up in what parts of the platform you archive. When they were archiving Tumblr, do you collect the notes and the comments that are attached to posts, or do you just collect the post? Well, it turns out we only have three days to archive this before Tumblr removes this whole community. So it's not just about what you think is valuable; it's often also a relationship between that and the time available before something is going to be removed, either by the platform itself or by other forms of content moderation. So it's just kind of complex, I guess.


Yeah, it sounds like a complex organizational and technological process. And it sounds very likely that if I post some content on a platform and then delete it, or my account goes away, then unless someone has access to the servers of the service itself, which might be holding on to my deleted post, there's no copy of what I've done in the Internet Archive. If I tweet something and it misleads 100 people, and then I delete that tweet within 24 hours, and nobody says, "Hey, we need to archive that tweet," it's just gone.


Yeah, I think for most people that's definitely the case. And I think it's also worth saying that it's platform specific. Facebook is notoriously under-archived. Part of this has to do with the technical complexities around archiving Facebook, but also because it is, as we know, a sort of walled garden and does everything possible to keep people from archiving it as a so-called public platform. At least in its early days, it was supposed to be based on friendship and networks, and you only showed yourself to people you knew. We know there are lots of holes in that rhetoric and in how it's technically manifested these days. But nevertheless, Facebook is very under-archived and doesn't necessarily exist in most web archives. Which, again, poses really big challenges for how we understand what's going on in some of these communities and the propagation of information. And back to your question, Shawn, about other examples of what's happening: there's been some work recently around COVID-19 and the vaccine rollout, and how web archives are being used, again, to circulate misinformation about the vaccine rollout, especially on Facebook, from what I've seen. And it becomes really difficult to understand exactly what's happening there unless you're archiving it. I think there are still some really big challenges there for misinfo researchers.


Is it fair to say that Parler is kind of the opposite of Facebook when it comes to walled-garden countermeasures against archiving? Or have I got it all wrong? You know, Parler. Yeah, let me set it up this way: the archive of Parler. And I know listeners are really interested in Parler; I think our most popular podcast episode was the one about Parler.


Can I just say something real quick? I think the idea that there is "the" archive of Parler is interesting. As someone who's collecting a lot of Parler data around the Capitol, there isn't actually one archive of Parler; there are, like, hundreds of archives of Parler.

