
I have a dataset with key-value pairs that I want to import into R. The keys and values are separated by colons, while the key-value pairs are separated by commas. However, some of the values contain commas or colons, which can cause confusion when importing the data into R. To avoid this issue, I need to replace the commas and colons in the values with a different character before importing the data. For example:

{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}

to

{'AI': 'C3.ai| Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group| Inc.', 'MA': 'Mastercard |Incorporated'}

I have tried this:

replacer<- function(x) {
  str_replace_all(x, "[,:]", "|")
}

clean_lines <- str_replace_all(lines, "(?<=')[^']*[:.][[:space:]]*[^']*[[:space:]]*[^']*(?=')", replacer)
cat(clean_lines)

This works fine for commas but mishandles the colons. Here is the result:

{'AI': 'C3.ai| Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group| Inc.','MA': 'Mastercard :Incor| porated'}

How can I edit this code so that it replaces only the : characters that appear inside ' '?

    The single quote format indicates you may be printing a Python `dict` and copy-pasting into R. If so, as @zx8754 answered, you can read this into R as JSON. You can also [write it as JSON from Python](https://stackoverflow.com/questions/17043860/how-to-dump-a-dict-to-a-json-file). – SamR Mar 31 '23 at 07:03

2 Answers


This is almost JSON, so read it as such. To make it valid JSON, first replace the single quotes (') with double quotes ("), then parse it with the jsonlite package:

library(jsonlite)

# example file
writeLines("{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}", 
           "tmp.txt")

# read from file
x <- readLines("tmp.txt")

x <- gsub("'", "\"", x, fixed = TRUE)

fromJSON(x)
# $AI
# [1] "C3.ai, Inc."
# 
# $BA
# [1] "Boeing Company (The)"
# 
# $AAL
# [1] "American Airlines Group, Inc."
# 
# $MA
# [1] "Mastercard :Incorporated"
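
If the pipe-separated form from the question is still wanted, one option is to do the replacement on the parsed values, where the keys and separators are no longer in the way. This is a minimal sketch rather than part of the approach above; the names `cleaned` and `pairs` are hypothetical, and it assumes every value is a plain string.

```r
library(jsonlite)

# same example string as in the question
x <- "{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}"

# swap single quotes for double quotes so the string is valid JSON
x <- gsub("'", "\"", x, fixed = TRUE)

parsed <- fromJSON(x)  # named list, one element per key

# replace commas and colons in the values only; keys are untouched
cleaned <- lapply(parsed, function(v) gsub("[,:]", "|", v))

# optional: a two-column key/value table (hypothetical layout)
pairs <- data.frame(key = names(cleaned),
                    value = unlist(cleaned, use.names = FALSE),
                    stringsAsFactors = FALSE)
```

Because the separators are consumed by the parser, the pattern `[,:]` can stay simple here and does not need lookarounds.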
zx8754
  • I was actually able to do the replacement, but I find this answer interesting. Could you modify your code to read from a text file? Assume your x is in a text file called data. – mutinda festus Mar 31 '23 at 07:14

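You can use the gsub function: temporarily protect the commas that separate the pairs, replace the remaining commas, then restore the separators.

Example: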

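a <- c("'AI': 'C3.ai, Inc.', 'BA'")
# protect the commas that separate pairs (those right after a closing quote)
b <- gsub("',", "';", a)
# replace the remaining commas, which sit inside values
b <- gsub(",", "|", b)
# restore the separator commas (assumes the data contains no other ";")
a <- gsub(";", ",", b)
a
# [1] "'AI': 'C3.ai| Inc.', 'BA'"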
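
The example above only handles commas. A minimal base R sketch that also covers colons is given here; it uses a different technique from the placeholder swap: it locates every single-quoted string and edits only the ones in value position. It assumes keys and values strictly alternate and that no key or value contains an apostrophe; the names `m`, `quoted`, and `is_value` are hypothetical.

```r
x <- "{'AI': 'C3.ai, Inc.', 'BA': 'Boeing Company (The)', 'AAL': 'American Airlines Group, Inc.', 'MA': 'Mastercard :Incorporated'}"

# locate every single-quoted string (assumes no apostrophes inside them)
m <- gregexpr("'[^']*'", x)
quoted <- regmatches(x, m)[[1]]

# quoted strings alternate key, value, key, value, so values sit at even positions
is_value <- seq_along(quoted) %% 2 == 0
quoted[is_value] <- gsub("[,:]", "|", quoted[is_value])

# write the modified strings back into their original positions
regmatches(x, m) <- list(quoted)
x
```

The separators between pairs are never touched because they lie outside the matched quoted strings.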
VincentP