0

I have a problem with my R code. I have a data frame (df) with one column that contains single numerical values as well as vectors of numerical values. Here are some example rows of the data frame:

1. 60011000
2. 60523000
4. 60490000
5. 60599000
6. c("60741000", "60740000", "60742000")
7. 60647000
8. c("60766000", "60767000")
9. c("60563000", "60652000")

As you can see, some rows (6, 8 and 9) contain vectors. I want to concatenate the elements of each vector into a single element. For example, the result for the vector in row 6 should look like this:

607410006074000060742000

And the result for row 8 should look like this:

6076600060767000

My data frame has more than 30,000 rows, so it is impossible for me to do this manually.

Can you help me solve my problem? It is important that the number of rows does not change. Thank you very much, and please excuse any mistakes I made; I am not a native speaker.

Sven Hohenstein
  • Welcome to StackOverflow. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Perhaps the following tips on [asking a good question](http://stackoverflow.com/help/how-to-ask) may also be worth a read. – lmo Jan 25 '17 at 13:25
  • Hi CM2893, what is the structure of that column? Are those rows characters? I imagine they must be! I don't think you have vectors, I think you have text that reads literally 'c("60741000", "60740000", "60742000")'. – Joy Jan 25 '17 at 13:30
  • Hi Joy, I used the following code to output the structure: sapply(df, class), and the result was list. So you are right that these are not vectors. @Joy –  Jan 25 '17 at 13:40

3 Answers

1

The data:

dat <- read.table(text='60011000
60523000
60490000
60599000
c("60741000", "60740000", "60742000")
60647000
c("60766000", "60767000")
c("60563000", "60652000")', sep = "\t")

dat
#                                V1
# 1                        60011000
# 2                        60523000
# 3                        60490000
# 4                        60599000
# 5 c(60741000, 60740000, 60742000)
# 6                        60647000
# 7           c(60766000, 60767000)
# 8           c(60563000, 60652000)

You can use gsub to replace all non-digit characters with the empty string.

dat$V1 <- gsub("[^0-9]+", "", dat$V1)

dat
#                         V1
# 1                 60011000
# 2                 60523000
# 3                 60490000
# 4                 60599000
# 5 607410006074000060742000
# 6                 60647000
# 7         6076600060767000
# 8         6056300060652000
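
If the column is in fact a list column (the `sapply(df, class)` result reported in the comments suggests this) rather than text, each element can also be collapsed directly with `paste`. The following is only a minimal sketch under that assumption; the data frame and the column name `codes` are hypothetical and not taken from the question.

```r
# Hypothetical data: a list column whose elements are character vectors
df <- data.frame(id = 1:4)
df$codes <- list("60011000",
                 c("60741000", "60740000", "60742000"),
                 "60647000",
                 c("60766000", "60767000"))

# Collapse every list element into one string; vapply guarantees
# exactly one character value per row, so the row count is unchanged
df$codes <- vapply(df$codes, paste, character(1), collapse = "")

df$codes[2]  # "607410006074000060742000"
nrow(df)     # 4
```

Using `vapply` with `character(1)` instead of `sapply` makes the call fail loudly if any element does not reduce to a single string.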
Sven Hohenstein
0

You could do:

df <- data.frame(a = c(1, 2, 3, 4, 'c("60741000", "60740000", "60742000")'),
                 b = c(1, 2, 3, 4, 5),
                 stringsAsFactors = FALSE)
> df
                                      a b
1                                     1 1
2                                     2 2
3                                     3 3
4                                     4 4
5 c("60741000", "60740000", "60742000") 5
# Evaluate each cell as R code, then paste the resulting elements together
df[, "a"] <- sapply(df[, "a"], function(x) paste(eval(parse(text = x)), collapse = ""))
> df
                         a b
1                        1 1
2                        2 2
3                        3 3
4                        4 4
5 607410006074000060742000 5
Haboryme
0

Here you go (looks like someone beat me to the punch):

df <- read.table("df.txt", header = FALSE)
df
# V1
# 1              123
# 2               12
# 3  c("1","55","6")
# 4              356
# 5 c("99","55","3")
df[, 1] <- as.numeric(gsub("[^0-9]", "", df[, 1]))
df
# V1
# 1   123
# 2    12
# 3  1556
# 4   356
# 5 99553
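
One caveat when applying the numeric conversion to the data in the question: doubles represent integers exactly only up to 2^53 (about 9e15), so a 24-digit result such as the one for row 6 cannot be stored without losing digits. The sketch below illustrates this with the value from the question; keeping the column as character avoids the problem.

```r
# 24-digit result from row 6 of the question, kept as a string
x <- "607410006074000060742000"
n <- as.numeric(x)

# At this magnitude neighbouring doubles are far apart,
# so adding 1 does not change the stored value
n + 1 == n                # TRUE

# Printing the double back does not reproduce the original digits
sprintf("%.0f", n) == x   # FALSE
```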
Mandar