I am new to R and have been given a homework assignment to do some basic analysis on a set of data, namely IPO data and the effect of Covid-19 on it. Of course, I have not had any classes in R, so this is kind of a "throw him in the pool so he learns to swim" situation.

So I looked up some tutorials, but I just can't figure this out: I only have to compare the first half of each year (2017-2020), and I don't know how to eliminate everything outside that period from my data frame (namely, IPOs in August through December). How would I do that?

This is the head() of my dataset, or at least what R gives me back:

    # A tibble: 3 x 16
      ExchangeName CompanyName CurrencyCode ListingVenue ListingDate  Year Month `Domestic/Forei~ `Sector of Acti~
      <chr>        <chr>       <chr>        <chr>        <chr>       <dbl> <chr> <chr>            <chr>
    1 Hong Kong E~ SH Group (~ HKD          MAIN         2017-01-03   2017 Jan   Domestic         Other
    2 Shanghai St~ Central Ch~ CNY          Shanghai St~ 2017-01-03   2017 Jan   Domestic         other
    3 Shanghai St~ Zhejiang H~ CNY          Shanghai St~ 2017-01-03   2017 Jan   Domestic         other
    # ... with 7 more variables: `ISIN/CUSIP/Other` <chr>, Region <chr>, `Country of Incorporation` <chr>, `Market
    #   Capitalisation on 1st trading day` <chr>, `Capital raised through IPO (Newly issued shares)` <chr>, `Capital raised
    #   through IPO (Already issued shares)` <chr>, `Capital raised through IPO (Total)` <chr>

Thanks in advance for any help. I am really lost on this.

mn2609
  • Obviously I mean July through December, sorry for the mix-up – mn2609 Nov 15 '20 at 18:23
  • Exactly what is the question? Are you trying to extract all January-June rows into one data frame and all July to December rows into a second data frame? If that is the question then `is_first_half <- format(as.Date(DF$ListingDate), "%m") <= "06"; DF[is_first_half, ]; DF[!is_first_half, ]` – G. Grothendieck Nov 15 '20 at 18:38
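The one-liner in the comment above can be unpacked into a small base-R sketch. The data frame `DF` below is made-up toy data standing in for the real IPO table (only `ListingDate` and `CompanyName` mirror the question's columns), and `listing_month`, `first_half`, and `second_half` are names chosen here for illustration. It compares the month as an integer rather than as a string, which is a slight variation on the comment, not what the commenter wrote.

```r
# Toy data standing in for the real IPO table (values are made up)
DF <- data.frame(
  CompanyName = c("A", "B", "C", "D"),
  ListingDate = c("2017-01-03", "2017-06-30", "2017-07-01", "2018-12-15"),
  stringsAsFactors = FALSE
)

# Convert the character dates to Date and pull out the month number (1-12)
listing_month <- as.integer(format(as.Date(DF$ListingDate), "%m"))

# TRUE for January through June, FALSE for July through December
is_first_half <- listing_month <= 6

# Split the rows into the two halves of the year
first_half  <- DF[is_first_half, ]
second_half <- DF[!is_first_half, ]
```

This needs no extra packages; `first_half` keeps only the rows listed in January through June, across all years.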

1 Answer

First, I'd transform your ListingDate into a POSIXct object (this is just a way to store dates and times in their own format instead of as text) and then select the first semester. Using the lubridate package, you can do this (I got the inspiration from this place):

library(lubridate)  # provides semester() and ymd()
data <- data[semester(as.POSIXct(data$ListingDate, format = "%Y-%m-%d")) == 1, ]

Note that this selects the rows that semester() (the part from lubridate) identifies as falling in the first semester, i.e. January through June, after converting your character strings into dates with as.POSIXct. Every part of this could have been done differently; for instance, you could use ymd() instead of as.POSIXct(..., format = "%Y-%m-%d"). Actually, I'm pretty sure ymd(), which is also from lubridate, is the better choice, simply because it's simpler. I left it out of my main answer only because I'm not used to it, but if you prefer it, use this:

data <- data[semester(ymd(data$ListingDate)) == 1,]
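For anyone who wants to try the idea end to end, here is a minimal sketch on made-up data, assuming the lubridate package is installed. The data frame `ipo` and its values are hypothetical stand-ins for the real table; `ymd()`, `semester()`, and `year()` are lubridate functions.

```r
library(lubridate)

# Toy data standing in for the real IPO table (values are made up)
ipo <- data.frame(
  CompanyName = c("A", "B", "C", "D"),
  ListingDate = c("2017-01-03", "2019-06-30", "2020-07-01", "2018-12-15"),
  stringsAsFactors = FALSE
)

# Parse the character dates once and store them as a proper Date column
ipo$ListingDate <- ymd(ipo$ListingDate)

# semester() gives 1 for January-June and 2 for July-December
first_half <- ipo[semester(ipo$ListingDate) == 1, ]

# Number of first-half IPOs per year, as a starting point for the comparison
table(year(first_half$ListingDate))
```

Storing the parsed dates back into the column means later steps, such as counting per year, can reuse them without converting again.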
Érico Patto