I have several tables of data that contain various non-UTF8 characters, and I would like to know which tables and which columns contain them. The tables are stored in one large list, where each element is itself a list of columns. I am trying to write a loop over this list, but it does not work. The idea is: if any of these unknown characters exist in a column, filter out the affected values.

  for (table in names(tables)) {                 # this gives the table names
    for (item in names(tables[[table]])) {       # this gives the variables within each table
      if (grepl("[^\x01-\x7F]", tables[[table]][[item]])) {
        filter(grepl("[^\x01-\x7F]", tables[[table_name]][[name]]))
      }
    }
  }

Error: the condition has length > 1

Can you advise me on it please?
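
For context on the error message: `grepl()` is vectorised and returns one logical value per element of the column, while `if()` expects a single `TRUE` or `FALSE`. Recent R versions treat a longer condition as an error. A minimal sketch of the difference, using a made-up vector `x` (not the real data) and the same pattern as in the loop:

```r
# Hypothetical sample column; the values are invented for illustration.
x <- c("plain", "caf\u00e9", "also plain")

# grepl() returns one TRUE/FALSE per element of x, not a single value.
hits <- grepl("[^\x01-\x7F]", x)

# if() needs exactly one TRUE/FALSE, so collapse the vector with any().
flagged <- character(0)
if (any(hits)) {
  flagged <- x[hits]   # keep only the elements that matched the pattern
}
```

Wrapping the condition in `any()` only answers whether the column has a match at all; the logical vector `hits` is what identifies the individual elements.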

Rara
  • What exactly do you mean by non-UTF8? Do you mean non-ASCII? You seem to only be grepping in the ASCII range. Or do you mean other encodings like Latin-1? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 11 '23 at 14:10
  • Honestly, I am not sure what exactly I have in my data. I asked for advice before, could you look at my previous question please? https://stackoverflow.com/questions/76208766/filtering-special-characters-that-seem-to-belong-to-different-encoding-systems – Rara May 11 '23 at 14:31
  • As I commented on that last question, that doesn't look like test data. Where are you accessing this data from? You need to know what encoding was used on the data in order to read it correctly. That's not something you can easily guess from the data itself. You need to get the encoding correct at the time when you import the data. It's very hard to fix after the fact. – MrFlick May 11 '23 at 14:38
  • The data were collected a long time ago and stored in a data repository. These characters have appeared since the data were transferred into this repository, and unfortunately the original data are no longer available. My task is to provide the project manager with a complete list of where exactly these characters occur in the data so that he can decide what to do. No more information, unfortunately. – Rara May 11 '23 at 14:47
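
Given the stated goal (a list of where the characters occur), one possible approach is sketched below in base R. It assumes `tables` is a named list of data frames; the sample `tables`, the helper name `find_non_ascii`, and the output column names are all hypothetical. The pattern is the ASCII-range one from the question, so this reports non-ASCII values, and the `valid_utf8` column (from base R's `validUTF8()`) additionally shows whether each flagged value is at least well-formed UTF-8, which touches on the non-ASCII versus non-UTF8 distinction raised in the comments.

```r
# Hypothetical stand-in for the real `tables` list (named list of data frames).
tables <- list(
  patients = data.frame(id = 1:3,
                        name = c("Anna", "Zo\u00eb", "Tom"),
                        stringsAsFactors = FALSE),
  visits = data.frame(site = c("north", "south"),
                      note = c("ok", "fine"),
                      stringsAsFactors = FALSE)
)

# Return one row per (table, column, row) whose value matches `pattern`.
find_non_ascii <- function(tables, pattern = "[^\x01-\x7F]") {
  out <- list()
  for (tbl in names(tables)) {
    for (col in names(tables[[tbl]])) {
      values <- tables[[tbl]][[col]]
      if (is.factor(values)) values <- as.character(values)
      if (!is.character(values)) next        # skip numeric, logical, ... columns
      rows <- which(grepl(pattern, values))  # positions of matching values
      if (length(rows) > 0) {
        out[[length(out) + 1]] <- data.frame(
          table = tbl,
          column = col,
          row = rows,
          value = values[rows],
          valid_utf8 = validUTF8(values[rows]),
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(out) == 0) {
    # Nothing matched: return an empty report with the same columns.
    return(data.frame(table = character(), column = character(),
                      row = integer(), value = character(),
                      valid_utf8 = logical(), stringsAsFactors = FALSE))
  }
  do.call(rbind, out)
}

report <- find_non_ascii(tables)
```

The resulting `report` data frame can be inspected directly or written out, for example with `write.csv()`. Note that this only locates the characters; it does not determine which encoding the original data used.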
