1

I am having trouble splitting columns in my data frame:

2010 census    2014 land area            city
8175133        302.6 sq mi 783.8 km2     New york
3792621        468.7 sq mi 1213.9 km2    Los Angeles
2695598        227.7 sq mi 589.6 km2     Chicago

I want to get:

2010 census    area sq/mi     area sq/km          city
8175133        302.6          783.8               New york
3792621        468.7          1213.9              Los Angeles
2695598        227.7          589.6               Chicago
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Clinton Woods
  • 249
  • 1
  • 2
  • 11
  • Please read [How to make a great reproducible example in R?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – M-- Jun 30 '17 at 20:01
  • *"I am having trouble"* ... what trouble? If you show the code you've tried and the errors or output you are getting, that will help. (Please read the link Masoud provided, it is very helpful and will increase the likelihood of getting an answer, sooner.) – r2evans Jun 30 '17 at 20:16
  • another base R: `with(df, cbind(df, do.call(rbind, regmatches(land.area, gregexpr("[0-9]+\\.*[0-9]+", land.area) ))))` – user20650 Jun 30 '17 at 20:40

4 Answers4

2

Here is a tidyr solution.

library(tidyr)

df <- data.frame(census = c(8175133, 3792621, 2695598),
                 land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2", "227.7 sq mi 589.6 km2"),
                 city = c("New york","Los Angeles","Chicago"), stringsAsFactors = FALSE)

df$land.area <- sapply(df$land.area, sub, pattern = " km2", replacement = "")

df <- df %>% separate(col = land.area, into = c("area sq/mi", "area sq/km"), sep = " sq mi ")
Matt Jewett
  • 3,249
  • 1
  • 14
  • 21
1

You can use stringr::str_split_fixed:

 library(stringr)
 splitted <- str_split_fixed(dt$X2014.land.area, " sq mi ", 2)

 splitted[,2] <- gsub( " km2", "", as.character(splitted[,2]))

 colnames(splitted) <- c("area sq. mi", "area sq km")

 splitted <- data.frame(splitted)


 dt.2 <- cbind(dt[,c(1,3)], splitted)

 dt.2 

#   X2010.census        city area.sq..mi area.sq.km 
# 1      8175133    New york       302.6      783.8 
# 2      3792621 Los Angeles       468.7     1213.9 
# 3      2695598     Chicago       227.7      589.6

Data

structure(list(X2010.census = c(8175133L, 3792621L, 2695598L), 
   X2014.land.area = c("302.6 sq mi 783.8 km2", "468.7 sq mi 1213.9 km2", 
   "227.7 sq mi 589.6 km2"), city = c("New york", "Los Angeles", 
   "Chicago")), .Names = c("X2010.census", "X2014.land.area", 
   "city"), row.names = c(NA, -3L), class = "data.frame") -> dt
M--
  • 25,431
  • 8
  • 61
  • 93
1

Using data.table and the tstrsplit function from that package:

dat <- fread("2010 census, 2014 land area, city
              8175133, 302.6 sq mi 783.8 km2, New york
              3792621, 468.7 sq mi 1213.9 km2, Los Angeles
              2695598, 227.7 sq mi 589.6 km2, Chicago")
dat[, c("area sq/mi", "area sq/km") := tstrsplit(`2014 land area`, " ", keep = c(1,4))]
dat[, .(`2010 census`, `area sq/mi`, `area sq/km`, city)]

#    2010 census area sq/mi area sq/km        city
# 1:     8175133      302.6      783.8    New york
# 2:     3792621      468.7     1213.9 Los Angeles
# 3:     2695598      227.7      589.6     Chicago
Eric Watt
  • 3,180
  • 9
  • 21
0

Here is a Base R solution using sub

dt$Area_SqMi = sub("\\s*sq\\s*mi.*", "", dt$X2014.land.area)
dt$Area_km2 = sub(".*mi\\s+(\\S+)\\s+km2.*", "\\1", dt$X2014.land.area)

dt
  X2010.census        X2014.land.area        city Area_SqMi Area_km2
1      8175133  302.6 sq mi 783.8 km2    New york     302.6    783.8
2      3792621 468.7 sq mi 1213.9 km2 Los Angeles     468.7   1213.9
3      2695598  227.7 sq mi 589.6 km2     Chicago     227.7    589.6

Of course, if you want to get rid of the original column, you can add dt = dt[,-2]

G5W
  • 36,531
  • 10
  • 47
  • 80