3

I have a data.frame of house sale transactions. At the moment the Address is in format.

  1 Accacia Avenue,This Town,This City,A10 1AA.

Is there a way I can split this into different columns in the data.frame removing the , at the same time?

I have created a separate vector for now just containing addresses.

The Address in stored in the Address column of the Dataframe - data.

head(data$Address)
[1] 22 Amesbury Road, Feltham (TW13 5HJ)
[2]Flat 11, Gloucester Court, Links Road, London (W3 0EW)

I need to split this into

"Address1", "Address2", "Address3"

and I also need to remove the postcode within () as I already have this in a separate field.

zx8754
  • 52,746
  • 12
  • 114
  • 209
  • welcome to SO. You should provide a reproducible example , show what you hacve so far tried. Please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – agstudy Jun 15 '15 at 09:25
  • How was the data imported? Why not use `read.csv()`? – zx8754 Jun 15 '15 at 10:29

4 Answers4

3
library(tidyr)
df <- data.frame(address = c("1 Accacia Avenue,This Town,This City,A10 1AA"))
separate(df , address , c("country" , "town" , "city" , "street") , ",")



# country             town      city      street
# 1 Accacia Avenue This Town This City   A10 1AA

these are dummy columns you can change column names as per your address

Nader Hisham
  • 5,214
  • 4
  • 19
  • 35
3

you can also try using splitstackshape

# using df from Nader Hisham's answer

library(splitstackshape)
out = setnames(cSplit(df, "address", ","), c("country", "town", "city", "street"))

#> out
#            country      town      city  street
#1: 1 Accacia Avenue This Town This City A10 1AA
Veerendra Gadekar
  • 4,452
  • 19
  • 24
  • 4
    This could be easily done with `read.table` as well. `read.table(text= as.character(df$address), sep=',', stringsAsFactors=FALSE)` – akrun Jun 15 '15 at 10:14
1

For your example,

a  <- c("1 Accacia Avenue,This Town,This City,A10 1AA.")

using strsplit() as follows,

strsplit(a,",")

this gives a list which you can unlist() if you want.

[[1]]
[1] "1 Accacia Avenue" "This Town"        "This City"       
[4] "A10 1AA."

unlist(strsplit(a,","))

[1] "1 Accacia Avenue" "This Town"        "This City"       
[4] "A10 1AA."  

Of course, since you have a data.frame() you will need to use lapply or sapply. If you provide a reproducible example we can show you how the apply functions can give you what you want.

DarrenRhodes
  • 1,431
  • 2
  • 15
  • 29
0

This may not be the best way to accomplish what you need (for several reasons), but it should be fairly easy to understand. This assumes your dataframe is named df, and the address strings are in a variable called Address.

First make sure your Addresses are stored as character strings: if class(df$Address) returns "factor" instead of "character", you can convert them to character strings with

df$Address <- as.character(df$Address)

Split the addresses on the comma character:

df$splitAdd <- strsplit(df$Address, "\\s*,\\s*")

(note this assumes there may or may not be spaces around the commas that you want to drop; if there are definitely no spaces, then you can replace "\\s*,\\s*" with just ",".)

Then put each of the resulting fields into a new variable, by applying to each element of the vector df$splitAdd a function that extracts the element at the appropriate index:

df$streetAdd <- unlist(lapply(df$splitAdd, function(x){x[1]}))
df$town <- unlist(lapply(df$splitAdd, function(x){x[2]}))
df$city <- unlist(lapply(df$splitAdd, function(x){x[3]}))
df$code <- unlist(lapply(df$splitAdd, function(x){x[4]}))

(You can actually probably do without the unlist for what you've described, but it can sometimes be more convenient.)

If you want to drop the splitAdd column from the dataframe:

df$splitAdd <- NULL
tegancp
  • 1,204
  • 6
  • 13