0

In R: I have a dataframe of many rows but only one column. Each row has a long string of characters, periodically punctuated with a | mark. I want to split the characters every time there is a | mark, so that there are many columns.

1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 etc.

Each string begins with a date and then has numbers corresponding to cities. The variable names are also listed as one string, and they need to be separated by the "." mark.

date.abilene_tx.akron_oh.albany_ny.albuquerque_nm.allentown_pa.amarillo_tx.anchorage_ak.asheville_nc.atlanta_ga etc.

Any help much appreciated!

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485

2 Answers2

1

Here's a data.frame with one column and 10 rows that might be similar to yours:

dat <- "1995-01-01|33.399999999999999|40.299999999999997|35.399999999999999|35.0|37.200000000000003|23.399999999999999|23.199999999999999|47.399999999999999|49.200000000000003|49.200000000000003|48.100000000000001|42.299999999999997|58.200000000000003|17.399999999999999|50.700000000000003|5.2999999999999998|20.600000000000001|38.5|43.299999999999997 "

df <- data.frame(col1 = rep(dat, 10))

And here's the data.frame that has new columns based on splitting Col1 one:

foo <- data.frame(do.call('rbind', strsplit(as.character(df$col1),'|',fixed=TRUE)))
foo

           X1                 X2                 X3                 X4   X5                 X6
1  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
2  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
3  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
4  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
5  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
6  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
7  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
8  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
9  1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
10 1995-01-01 33.399999999999999 40.299999999999997 35.399999999999999 35.0 37.200000000000003
                   X7                 X8                 X9                X10                X11
1  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
2  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
3  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
4  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
5  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
6  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
7  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
8  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
9  23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
10 23.399999999999999 23.199999999999999 47.399999999999999 49.200000000000003 49.200000000000003
                  X12                X13                X14                X15                X16
1  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
2  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
3  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
4  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
5  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
6  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
7  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
8  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
9  48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
10 48.100000000000001 42.299999999999997 58.200000000000003 17.399999999999999 50.700000000000003
                  X17                X18  X19                 X20
1  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
2  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
3  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
4  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
5  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
6  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
7  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
8  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
9  5.2999999999999998 20.600000000000001 38.5 43.299999999999997 
10 5.2999999999999998 20.600000000000001 38.5 43.299999999999997
Ben
  • 41,615
  • 18
  • 132
  • 227
  • Thanks for your help! But an error message comes up: non-character argument. Is that because it is a data frame? How do I get around this problem? – user3029410 Dec 05 '13 at 06:15
  • I've updated my answer to deal with a data frame – Ben Dec 05 '13 at 06:49
  • This is amazingly helpful. I'm new to R and am learning the ropes. Because my dataset has a few thousand rows with different values, I need to make a for loop (or a function) to repeat this step for all the rows? – user3029410 Dec 05 '13 at 07:07
  • Have you tried it? This function should work for your dataframe, assuming that the long strings in each row can be split into the same number of pieces (what is in those pieces doesn't matter, I just reproduced the first row for convenience). Why don't you accept this answer, then [ask another question](http://stackoverflow.com/questions/ask) that includes a sample of your data frame, (use `dput(head(mydata))` to get a sample to paste into your question). The more closely your question resembles your actual use-case, the more relevant the answers will be. – Ben Dec 05 '13 at 08:02
1

You should have loaded the data from file with this command:

 dat <- read.table(filename, sep="|")

This will handle the lines separated with "|" but you then say "strings" are separated with ".", so if these are somehow mixed in htat text file you may need to do some preprocessing with input first with readLines().

IRTFM
  • 258,963
  • 21
  • 364
  • 487